libcrux
Optimize HACL* Raw RSA decryption to use CRT
Tested on multiple platforms and compilers.
Some observations:
- compiling with gcc on x64 does not appear to enable HACL_CAN_COMPILE_INTRINSICS in lib_intrinsics.h, leading to a performance degradation. Enabling this flag provides a significant boost.
- optimizing at -O2 (like the Linux Kernel) vs -O3 does not make much difference to this code
- recent GCCs (e.g. 13) are better at optimizing this code than GCC-11
- recent clang is still about 10-15% faster than recent GCC
- the optimizations that work differ between x64 and ARM, likely because of differences in mul instructions and pipelining
Performance
- Our dec code is 10x slower than optimized OpenSSL assembly that uses CRT
- On x64, our dec code (without CRT) appears to already be competitive with Kernel code (with CRT)
Next Steps:
- Send update to Cloudflare
- Implement and verify CRT
It is not yet clear whether CRT is needed in this round (although I would like to do it.)
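For reference, here is a minimal sketch of what the CRT optimization changes, written in Python for readability rather than in HACL*'s C or F*. The function names are hypothetical and are not part of the HACL* or libcrux API. The toy key is the usual textbook example and far too small for real use, and Python's `pow` is not constant-time, so this only illustrates the arithmetic. CRT replaces one exponentiation modulo n with two exponentiations modulo the half-size primes p and q, then recombines the results with Garner's formula.

```python
# Illustrative only: hypothetical helper names, toy key size, not constant-time.

def rsa_crt_params(p, q, d):
    """Precompute the CRT exponents and coefficient from the private key."""
    dp = d % (p - 1)       # exponent used modulo p
    dq = d % (q - 1)       # exponent used modulo q
    q_inv = pow(q, -1, p)  # q^{-1} mod p (needs Python 3.8+)
    return dp, dq, q_inv


def rsa_decrypt_plain(c, d, n):
    """Raw RSA decryption without CRT: one full-size exponentiation."""
    return pow(c, d, n)


def rsa_decrypt_crt(c, p, q, dp, dq, q_inv):
    """Raw RSA decryption with CRT: two half-size exponentiations."""
    m1 = pow(c, dp, p)
    m2 = pow(c, dq, q)
    h = (q_inv * (m1 - m2)) % p  # Garner recombination step
    return m2 + h * q


# Toy textbook key: p = 61, q = 53, n = 3233, e = 17, d = 2753.
p, q, e, d = 61, 53, 17, 2753
n = p * q
dp, dq, q_inv = rsa_crt_params(p, q, d)

m = 65
c = pow(m, e, n)
assert rsa_decrypt_plain(c, d, n) == m
assert rsa_decrypt_crt(c, p, q, dp, dq, q_inv) == m
```

Each of the two CRT exponentiations works on operands half the size of n with a half-length exponent, which is where the speedup comes from. A verified implementation would also need to handle secret-independent timing, which this sketch ignores.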
Next steps: push upstream to HACL and to consumers
Currently, this does not appear to be a priority since our signing code is already faster than Linux. We will do this in the fall as time permits.
This issue has been marked as stale due to a lack of activity for 60 days. If you believe this issue is still relevant, please provide an update or comment to keep it open. Otherwise, it will be closed in 7 days.
This issue has been closed due to a lack of activity since being marked as stale. If you believe this issue is still relevant, please reopen it with an update or comment.