wycheproof
ChaCha20-Poly1305 large test vectors
Performance-optimized implementations of ChaCha20-Poly1305 tend to have conditional code paths that use CPU intrinsics to process longer ciphertext streams (e.g. SSSE3 intrinsics for 256-byte blocks, AVX2 intrinsics for 512-byte blocks).
It would be good to have some large test vectors that target these paths: at least one with a length n where 256 <= n < 512 bytes, and at least one with n >= 512 bytes. Such vectors would expose implementation bugs in the optimized paths.
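A minimal sketch of how one might pick plaintext lengths that hit every path. It assumes the path sizes named above (64-byte ChaCha20 blocks, 256-byte SSSE3 and 512-byte AVX2 batches; presumably 4 and 8 blocks processed in parallel, one per 32-bit SIMD lane). The helper `boundary_lengths` is hypothetical: it takes lengths just below, at, and just above each multiple of each batch size, plus one length that forces an implementation to fall through the 512-, 256- and 64-byte paths and then handle a partial block.

```python
CHACHA_BLOCK = 64  # bytes per ChaCha20 keystream block


def boundary_lengths(path_sizes=(CHACHA_BLOCK, 256, 512), max_multiple=3):
    """Hypothetical helper: plaintext lengths around each assumed batch-size boundary."""
    lengths = {0, 1}
    for size in path_sizes:
        for k in range(1, max_multiple + 1):
            for delta in (-1, 0, 1):
                lengths.add(k * size + delta)
    # One input that walks down through every path: 512 + 256 + 64 + 1 partial byte.
    lengths.add(sum(path_sizes) + 1)
    return sorted(lengths)


print(boundary_lengths())
```

The off-by-one neighbours matter because bugs in batched code tend to sit in the hand-off between the wide path and the narrower path or tail loop.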
You mean test vectors containing plaintext longer than 256 or 512 bytes?
It sounds like a good idea.
@bleichen what do you think?
First, I'm a bit surprised that using 512-byte blocks with AVX2 intrinsics would give optimal performance. I wouldn't expect blocks larger than 256 bytes here.
Adding a few longer test vectors is easy. The question that remains is whether this is effective. One likely source of problems is the Poly1305 computation, which can easily suffer from overflows if it is parallelized carelessly. To generate test vectors that check the Poly1305 computation for overflows, I generated a large number of keys and selected those where the Poly1305 subkeys were extreme. This assumes that Horner's method is used. Hence overflows in a parallel Poly1305 implementation would likely remain undetected by the current test vectors. I'd guess that test vectors for Poly1305 alone, with extreme subkeys, could help.
Another source of errors is incremental updates. For example, this paper has some results: https://eprint.iacr.org/2017/891.pdf. Flaws can occur at just a few specific input sizes. I'm not sure which input sizes are most problematic for parallelized implementations.
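To illustrate why keys chosen for Horner's method may not stress a parallel implementation, here is a minimal Python sketch of Poly1305 as specified in RFC 8439, evaluated two ways: with Horner's method (multiply by r per block) and as a hypothetical k-lane evaluation that multiplies each lane by r^k and recombines with r^1..r^k. Both compute the same polynomial, but the parallel form works with r^2, r^4 and so on, so a key picked to make r extreme says little about those powers. Python integers never overflow, so this sketch only checks the algebra, not the limb arithmetic where real overflow bugs live; the function names are illustrative.

```python
P = (1 << 130) - 5
CLAMP = 0x0FFFFFFC0FFFFFFC0FFFFFFC0FFFFFFF


def _split_key(key: bytes):
    r = int.from_bytes(key[:16], "little") & CLAMP  # clamped multiplier r
    s = int.from_bytes(key[16:32], "little")        # final additive key s
    return r, s


def _blocks(msg: bytes):
    # Each 16-byte chunk (the last may be shorter) gets a 0x01 byte appended.
    return [int.from_bytes(msg[i:i + 16] + b"\x01", "little")
            for i in range(0, len(msg), 16)]


def _finish(h: int, s: int) -> bytes:
    return ((h + s) % (1 << 128)).to_bytes(16, "little")


def poly1305_horner(key: bytes, msg: bytes) -> bytes:
    r, s = _split_key(key)
    h = 0
    for c in _blocks(msg):
        h = (h + c) * r % P  # h = sum of c_i * r^(n - i)
    return _finish(h, s)


def poly1305_parallel(key: bytes, msg: bytes, lanes: int = 4) -> bytes:
    r, s = _split_key(key)
    blocks = _blocks(msg)
    n = len(blocks)
    r_lanes = pow(r, lanes, P)  # every lane steps by r^lanes instead of r
    h = 0
    for t in range(lanes):
        acc = 0
        # Lane t holds the blocks whose Horner exponent e = n - i has (e - 1) % lanes == t.
        for i in range(n):
            if (n - i - 1) % lanes == t:
                acc = (acc * r_lanes + blocks[i]) % P
        h = (h + acc * pow(r, t + 1, P)) % P  # shift lane t by r^(t + 1)
    return _finish(h, s)
```

In a limb-based implementation (e.g. 26-bit limbs in SIMD registers), the products with r^lanes and the lane sums before reduction are where carries can be lost, which is why vectors with extreme values of those powers, not just of r, would be needed.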
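Since flaws in incremental updates can appear only at a few input sizes, one cheap check is to feed the same input through every split into three `update` calls and compare against a one-shot computation. A minimal sketch, assuming an object with `update(bytes)` and `digest()` methods; `find_split_bug` is a hypothetical helper, and `hashlib.sha256` is used only as a stand-in for an incremental MAC or AEAD API, since the Python standard library has no Poly1305.

```python
import hashlib
from typing import Callable, Optional, Protocol, Tuple


class Incremental(Protocol):
    def update(self, data: bytes) -> None: ...
    def digest(self) -> bytes: ...


def find_split_bug(new: Callable[[], Incremental], data: bytes) -> Optional[Tuple[int, int]]:
    """Return the first split (i, j) where three updates disagree with one-shot, else None."""
    one_shot = new()
    one_shot.update(data)
    expected = one_shot.digest()
    n = len(data)
    for i in range(n + 1):
        for j in range(i, n + 1):
            ctx = new()
            ctx.update(data[:i])   # first piece
            ctx.update(data[i:j])  # second piece (may be empty)
            ctx.update(data[j:])   # remainder
            if ctx.digest() != expected:
                return i, j
    return None


print(find_split_bug(hashlib.sha256, bytes(64)))  # a correct API yields None
```

The search is quadratic in the input length, so it suits short inputs; for long inputs one would sample split points, biased toward multiples of the block and batch sizes.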
I meant to say ChaCha20 rather than ChaCha20-Poly1305, so that might have confused the issue.
See here for an implementation that has an optimized path for >512 bytes:
https://github.com/jedisct1/libsodium/blob/dcc2e06c93067f421ab549550b89fec45993b7a7/src/libsodium/crypto_stream/chacha20/dolbeau/u8.h#L129
I hope that, on that basis alone, the case for larger test vectors is self-explanatory?
I found I had to create larger test vectors to catch AVX2 bugs in my own implementation. I have no experience with what makes a good test vector, which is why I opened this issue.
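One way to produce such larger ChaCha20 vectors is from a plain block-at-a-time reference implementation. The sketch below follows the RFC 8439 ChaCha20 (32-bit counter, 96-bit nonce); `make_vectors`, its fixed key, nonce, filler pattern and record fields are hypothetical illustration choices and do not follow the Wycheproof JSON schema.

```python
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, n: int) -> int:
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(s: list, a: int, b: int, c: int, d: int) -> None:
    # RFC 8439 quarter round, applied in place to state words a, b, c, d.
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = _rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = _rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK32
    s[d] = _rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK32
    s[b] = _rotl32(s[b] ^ s[c], 7)


def chacha20_block(key: bytes, counter: int, nonce: bytes) -> bytes:
    # State: 4 constant words, 8 key words, a 32-bit block counter, 3 nonce words.
    state = [0x61707865, 0x3320646E, 0x79622D32, 0x6B206574]
    state += struct.unpack("<8I", key)
    state.append(counter)
    state += struct.unpack("<3I", nonce)
    w = list(state)
    for _ in range(10):  # 10 double rounds = 20 rounds
        quarter_round(w, 0, 4, 8, 12)   # column rounds
        quarter_round(w, 1, 5, 9, 13)
        quarter_round(w, 2, 6, 10, 14)
        quarter_round(w, 3, 7, 11, 15)
        quarter_round(w, 0, 5, 10, 15)  # diagonal rounds
        quarter_round(w, 1, 6, 11, 12)
        quarter_round(w, 2, 7, 8, 13)
        quarter_round(w, 3, 4, 9, 14)
    return struct.pack("<16I", *((x + y) & MASK32 for x, y in zip(w, state)))


def chacha20_encrypt(key: bytes, counter: int, nonce: bytes, plaintext: bytes) -> bytes:
    n_blocks = (len(plaintext) + 63) // 64
    if counter + n_blocks > 1 << 32:
        raise ValueError("32-bit block counter would overflow")
    out = bytearray()
    for i in range(n_blocks):  # one 64-byte keystream block at a time
        keystream = chacha20_block(key, counter + i, nonce)
        chunk = plaintext[64 * i:64 * (i + 1)]
        out += bytes(p ^ k for p, k in zip(chunk, keystream))
    return bytes(out)


def make_vectors(lengths, key: bytes = bytes(range(32)), nonce: bytes = bytes(12)):
    # Hypothetical record layout, NOT the Wycheproof JSON schema.
    vectors = []
    for n in lengths:
        msg = bytes(i % 251 for i in range(n))  # deterministic filler pattern
        ct = chacha20_encrypt(key, 1, nonce, msg)
        vectors.append({"counter": 1, "key": key.hex(), "nonce": nonce.hex(),
                        "msg": msg.hex(), "ct": ct.hex()})
    return vectors


for v in make_vectors([300, 600]):
    print(len(v["msg"]) // 2, v["ct"][:32])
```

Because the reference never batches blocks, its output does not share code paths with an SSSE3 or AVX2 implementation, which is what makes it useful as an independent source of expected ciphertexts.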
In reply to Mark's comment (Fri, Nov 29, 2019, 4:44 PM):
Thanks. This implementation does indeed use 512-byte blocks. I would have expected register spills to be a big enough problem that smaller chunks are preferable. I'll add some longer inputs. However, I still think that test vectors are not the best way to detect problems with large inputs. For example, comparing against a reference implementation covers more ground, including things like plaintexts longer than 2**32 blocks, for ciphers that allow this.
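A minimal sketch of such a comparison: run a candidate and a reference on random keys, nonces and plaintexts of chosen lengths, and report the first input on which they disagree. `differential_test` and `toy_reference` are hypothetical; the toy cipher XORs with a SHAKE-256 keystream only so the sketch runs with the standard library. It is not ChaCha20, and in practice the reference would be a simple block-at-a-time ChaCha20.

```python
import hashlib
import os
from typing import Callable, Iterable, Optional, Tuple

# (key, nonce, plaintext) -> ciphertext
Cipher = Callable[[bytes, bytes, bytes], bytes]


def differential_test(candidate: Cipher, reference: Cipher, lengths: Iterable[int],
                      trials: int = 4) -> Optional[Tuple[int, bytes, bytes, bytes]]:
    """Return the first (length, key, nonce, plaintext) where the ciphers disagree, else None."""
    for n in lengths:
        for _ in range(trials):
            key, nonce, pt = os.urandom(32), os.urandom(12), os.urandom(n)
            if candidate(key, nonce, pt) != reference(key, nonce, pt):
                return n, key, nonce, pt
    return None


def toy_reference(key: bytes, nonce: bytes, pt: bytes) -> bytes:
    # Stand-in stream cipher, NOT ChaCha20: keystream = SHAKE-256(key || nonce).
    keystream = hashlib.shake_256(key + nonce).digest(len(pt))
    return bytes(p ^ k for p, k in zip(pt, keystream))
```

For example, a candidate that corrupts only inputs of 512 bytes or more passes at lengths 1 and 100 and is reported at the first length of 512 or more in the list. Random inputs and lengths can be generated indefinitely, which is the advantage over a fixed set of vectors.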