ChaCha20 vs AES-256
Table of contents:
ChaCha20 vs AES
When it comes to symmetric encryption, you only have two good choices today: ChaCha20 or any of its derivatives, or AES-256.
AES-256 has special acceleration hardware dedicated to it. ChaCha20 is faster out of the box, if there is no dedicated AES hardware. ChaCha20 works by taking in a nonce, a pseudorandom 96-bit-input, deriving a unique key from your nonce and password and then generating a stream of high-entropy pseudorandom bits that can be XORed with the plaintext. There is just a debate whether you want a bigger nonce, leading to XChaCha20. AES-256 has several different encryption modes. In ECB mode, you can parallelize encryption and decryption per block and search in encrypted data. However, the same plaintext always gets encrypted to the same ciphertext. Thus, it leaks entropy. For most use cases, you should avoid ECB. Here is an example of an ‘encrypted’ image with ECB:
CBC mode is a bit more similar to ChaCha20; you need an initialization vector ( a nonce in case of ChaCha20 ) that is used as a kind of salt for your key, together with a key derivation function to ensure that the same plaintext gets encrypted to a different ciphertext each time. The ciphertext of one block is then used to derive a key for the next block you want to encrypt and thus, encryption can’t be parallelized and you can’t search in encrypted data. On top of that, this mode has some security issues, if the attacker can observe multiple encryption results for plaintexts he chooses. I do not recommend CBC-mode. GCM mode allows for parallelization when encrypting/decrypting, provides authentication and decrypts the same plaintext to a different ciphertext each time. However, its fatal flaw is that you may never reuse the initialization vector.
There are more modes, but we can stop here, because to be honest, I don’t like any of the AES-256 in particular as each and everyone of them has some kind of security issue.
If you are stuck with AES-256, the best you can do is probably use GCM mode in combination with some authenticator like Poly1305, which is also often combined with ChaCha20.
ChaCha20 is just the better algorithm for cybersecurity.
If you want authentication, too, it is generally used in conjunction with the Poly1305 authenticator.
The combination of ChaCha20 with Poly1305 is super secure.
You can find a reference to it with implementation details here:
RFC 7539: ChaCha20 and Poly1305 for IETF Protocols 8439 INFORMATIONAL Errata Exist Internet Research Task Force (IRTF) Y. Nir Request for Comments: 7539 Check Point… www.rfc-editor.org
RFC is by far the best source if you want to build things from scratch; they provide well-written documentation and always provide useful test-vectors to help you with implementation and verify that you did not understand anything wrong.
ChaCha20 comes with some magic constants used for the hashing function:
static List<int> magicConstants = [
0x61707865, 0x3320646e,
0x79622d32, 0x6b206574
];
Each block has 16x32 bit, with one item being the blockCounter, which is clascially at 0, 4 items for the magic constants, 8 items for the 256-bit-key and 3 items for the 96-bit-nonce.
static Uint32List makeChaChaState(Uint32List key, int blockCounter, Uint32List nonce)
{
assert(key.length == 8);
assert(nonce.length == 3);
return Uint32List.fromList([
magicConstants[0], magicConstants[1], magicConstants[2], magicConstants[3],
key[0], key[1], key[2], key[3],
key[4], key[5], key[6], key[7],
blockCounter, nonce[0], nonce[1], nonce[2]
]);
}
Now RFC writes
RFC 7539 ChaCha20 & Poly1305 May 2015
ChaCha20 runs 20 rounds, alternating between “column rounds” and “diagonal rounds”. Each round consists of four quarter-rounds, and they are run as follows. Quarter rounds 1-4 are part of a “column” round, while 5-8 are part of a “diagonal” round:
- QUARTERROUND ( 0, 4, 8,12)
- QUARTERROUND ( 1, 5, 9,13)
- QUARTERROUND ( 2, 6,10,14)
- QUARTERROUND ( 3, 7,11,15)
- QUARTERROUND ( 0, 5,10,15)
- QUARTERROUND ( 1, 6,11,12)
- QUARTERROUND ( 2, 7, 8,13)
- QUARTERROUND ( 3, 4, 9,14)
….
The basic operation of the ChaCha algorithm is the quarter round. It operates on four 32-bit unsigned integers, denoted a, b, c, and d. The operation is as follows (in C-like notation):
- a += b; d ^= a; d <<<= 16;
- c += d; b ^= c; b <<<= 12;
- a += b; d ^= a; d <<<= 8;
- c += d; b ^= c; b <<<= 7;
Nir & Langley Informational [Page 7] which is where the algorithm gets its name from. It provides us with testing vectors of
final Uint32List rfc_7539_232_sample_state = Uint32List.fromList([
0x61707865, 0x3320646e, 0x79622d32, 0x6b206574,
0x03020100, 0x07060504 ,0x0b0a0908 ,0x0f0e0d0c,
0x13121110, 0x17161514, 0x1b1a1918, 0x1f1e1d1c,
0x00000001, 0x09000000, 0x4a000000, 0x00000000
]);
final Uint32List rfc_7539_232_sample_state_expected_state_after_20_cha_cha_quarter_rounds = Uint32List.fromList([
0x837778ab, 0xe238d763, 0xa67ae21e, 0x5950bb2f,
0xc4f2d0c7, 0xfc62bb2f, 0x8fa018fc, 0x3f5ec7b7,
0x335271c2, 0xf29489f3, 0xeabda8fc, 0x82e46ebd,
0xd19c12b4, 0xb04e16de, 0x9e83d0cb, 0x4e3c50a2
]);
We can implement the algorithm in Flutter like
int rotateLeft(int n, int count) {
assert(count >= 0 && count < 32);
if (count == 0) return n;
return (n << count) | ((n >= 0) ? n >> (32 - count) : ~(~n >> (32 - count)));
}
void tenQuarterRounds()
{
for(int i = 0; i < 10; i++)
{
quarterRound(0, 4, 8, 12);
quarterRound(1, 5, 9, 13);
quarterRound(2, 6, 10, 14);
quarterRound(3, 7, 11, 15);
quarterRound(0, 5, 10, 15);
quarterRound(1, 6, 11, 12);
quarterRound(2, 7, 8, 13);
quarterRound(3, 4, 9, 14);
}
}
void cha_cha_20(){
Uint32List initialStateCopy = Uint32List.fromList(cha_cha_state.toList());
tenQuarterRounds();
for(int i = 0; i < cha_cha_state.length; i++)
{
cha_cha_state[i] += initialStateCopy[i];
}
}
Uint32List transformFourInputsByChaChaQuarterRound(Uint32List fourInputs)
{
fourInputs[0] += fourInputs[1];
fourInputs[3] ^= fourInputs[0];
fourInputs[3] = rotateLeft(fourInputs[3],16);
fourInputs[2] += fourInputs[3];
fourInputs[1] ^= fourInputs[2];
fourInputs[1] = rotateLeft(fourInputs[1],12);
fourInputs[0] += fourInputs[1];
fourInputs[3] ^= fourInputs[0];
fourInputs[3] = rotateLeft(fourInputs[3],8);
fourInputs[2] += fourInputs[3];
fourInputs[1] ^= fourInputs[2];
fourInputs[1] = rotateLeft(fourInputs[1],7);
return fourInputs;
}
void quarterRound(int a, int b, int c, int d)
{
Uint32List result = transformFourInputsByChaChaQuarterRound(Uint32List.fromList([
cha_cha_state[a],
cha_cha_state[b],
cha_cha_state[c],
cha_cha_state[d]
]));
cha_cha_state[a] = result[0];
cha_cha_state[b] = result[1];
cha_cha_state[c] = result[2];
cha_cha_state[d] = result[3];
}
and verify our implementation based on the test vectors provided by RFC.
NOTE: ChaCha20 is especially friendly to be used with SIMD. If possible, use SIMD to speed up your ChaCha20 considerably.