sphincsplus
New f1600x4
Instead of using intrinsics and full unrolling, this uses a four-round unrolled version adapted from the one I wrote for Cloudflare's CIRCL library:
github.com/cloudflare/circl/simd/keccakf1600
This is about 10-20% faster on my notebook. We should check whether that also holds on other systems before merging.
This also means it no longer works on Windows.
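When testing on other systems, it may help to confirm that the new routine matches a plain scalar permutation before comparing timings. Below is a minimal sketch of such a reference, not the code in this PR. The name `f1600x4_ref` is hypothetical, and the interleaved layout (lane `i` of instance `k` stored at `state[4*i + k]`) is an assumption that should be checked against the actual implementation.

```c
#include <stdint.h>

#define ROTL64(x, n) (((x) << (n)) | ((x) >> (64 - (n))))

/* Round constants for Keccak-f[1600]. */
static const uint64_t RC[24] = {
    0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808aULL,
    0x8000000080008000ULL, 0x000000000000808bULL, 0x0000000080000001ULL,
    0x8000000080008081ULL, 0x8000000000008009ULL, 0x000000000000008aULL,
    0x0000000000000088ULL, 0x0000000080008009ULL, 0x000000008000000aULL,
    0x000000008000808bULL, 0x800000000000008bULL, 0x8000000000008089ULL,
    0x8000000000008003ULL, 0x8000000000008002ULL, 0x8000000000000080ULL,
    0x000000000000800aULL, 0x800000008000000aULL, 0x8000000080008081ULL,
    0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL};

/* Rotation offsets and lane order for the combined rho/pi step. */
static const int ROTC[24] = {1,  3,  6,  10, 15, 21, 28, 36, 45, 55, 2,  14,
                             27, 41, 56, 8,  25, 43, 62, 18, 39, 61, 20, 44};
static const int PILN[24] = {10, 7,  11, 17, 18, 3, 5,  16, 8,  21, 24, 4,
                             15, 23, 19, 13, 12, 2, 20, 14, 22, 9,  6,  1};

/* Scalar Keccak-f[1600] on one 25-lane state; no unrolling. */
static void keccak_f1600(uint64_t st[25])
{
    uint64_t bc[5], t;
    int i, j, r;

    for (r = 0; r < 24; r++) {
        /* theta */
        for (i = 0; i < 5; i++)
            bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
        for (i = 0; i < 5; i++) {
            t = bc[(i + 4) % 5] ^ ROTL64(bc[(i + 1) % 5], 1);
            for (j = 0; j < 25; j += 5)
                st[j + i] ^= t;
        }
        /* rho and pi */
        t = st[1];
        for (i = 0; i < 24; i++) {
            j = PILN[i];
            bc[0] = st[j];
            st[j] = ROTL64(t, ROTC[i]);
            t = bc[0];
        }
        /* chi */
        for (j = 0; j < 25; j += 5) {
            for (i = 0; i < 5; i++)
                bc[i] = st[j + i];
            for (i = 0; i < 5; i++)
                st[j + i] ^= (~bc[(i + 1) % 5]) & bc[(i + 2) % 5];
        }
        /* iota */
        st[0] ^= RC[r];
    }
}

/* Hypothetical reference: permute four interleaved states one at a time.
 * Assumes lane i of instance k is stored at state[4*i + k]. */
void f1600x4_ref(uint64_t state[100])
{
    uint64_t tmp[25];
    int i, k;

    for (k = 0; k < 4; k++) {
        for (i = 0; i < 25; i++)
            tmp[i] = state[4 * i + k];
        keccak_f1600(tmp);
        for (i = 0; i < 25; i++)
            state[4 * i + k] = tmp[i];
    }
}
```

Running both this and the new routine on the same random 100-lane input and comparing the results lane by lane would catch layout or round-count mistakes on a new platform.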
@thomwiggers Any suggestions on how to move this forward? (Given that you are facing the same trouble with Kyber and Dilithium.)
The Kyber/Dilithium AVX2 implementations are simply marked as not supported on Windows. It would still be a shame to drop Windows support from this otherwise mostly (bar VLAs) portable plain-C implementation.