Volodymyr Paprotski
Handcrafted x86_64 assembly for Poly1305. The main optimization is processing 16 message blocks at a time. For more details, see the extensive comments in `macroAssembler_x86_poly.cpp`.
- Added new KAT...
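Why processing several blocks at once is valid can be sketched in a few lines. Poly1305 evaluates a polynomial in `r` modulo 2^130 - 5 using Horner's rule, so a group of `w` blocks can be folded in with precomputed powers `r^w, ..., r^1` instead of `w` dependent multiply-and-reduce steps. The Python below is only an illustration of that algebraic identity using arbitrary-precision integers. The function names and the `width` parameter are hypothetical, and nothing here reflects the register layout, limb representation, or tail handling of the actual assembly in `macroAssembler_x86_poly.cpp`.

```python
P = (1 << 130) - 5  # the Poly1305 prime


def clamp(r: int) -> int:
    # Clear the bits of r that the Poly1305 definition requires to be zero.
    return r & 0x0FFFFFFC0FFFFFFC0FFFFFFC0FFFFFFF


def blocks(msg: bytes):
    # Yield each chunk of up to 16 bytes as a little-endian integer
    # with a single 0x01 byte appended above the chunk.
    for i in range(0, len(msg), 16):
        yield int.from_bytes(msg[i:i + 16] + b"\x01", "little")


def finish(h: int, s: int) -> bytes:
    # Add the second key half and keep the low 128 bits as the tag.
    return ((h + s) & ((1 << 128) - 1)).to_bytes(16, "little")


def poly1305_serial(key: bytes, msg: bytes) -> bytes:
    # Reference form: one multiply-and-reduce per block.
    r = clamp(int.from_bytes(key[:16], "little"))
    s = int.from_bytes(key[16:32], "little")
    h = 0
    for m in blocks(msg):
        h = (h + m) * r % P
    return finish(h, s)


def poly1305_batched(key: bytes, msg: bytes, width: int = 16) -> bytes:
    # Hypothetical batched form: fold `width` blocks per step using
    # precomputed powers of r; the products within a group are independent.
    r = clamp(int.from_bytes(key[:16], "little"))
    s = int.from_bytes(key[16:32], "little")
    powers = [pow(r, width - j, P) for j in range(width)]  # r^width ... r^1
    ms = list(blocks(msg))
    full = len(ms) - len(ms) % width
    h = 0
    for i in range(0, full, width):
        group = ms[i:i + width]
        # (h + m0)*r^w + m1*r^(w-1) + ... + m_(w-1)*r, reduced once.
        acc = (h + group[0]) * powers[0]
        for j in range(1, width):
            acc += group[j] * powers[j]
        h = acc % P
    # Leftover blocks fall back to the one-at-a-time form.
    for m in ms[full:]:
        h = (h + m) * r % P
    return finish(h, s)
```

Both functions return the same 16-byte tag for any key and message, which is the property a vectorized implementation relies on: the per-group products do not depend on each other, so they can be computed in parallel lanes and summed before a single reduction.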
Replace vpblendvp[sd] with macro assembler call and test in:
- `C2_MacroAssembler::vector_cast_float_to_int_special_cases_avx` (insufficient registers for 1 of 2 blends)
- `C2_MacroAssembler::vector_cast_double_to_int_special_cases_avx`
- `C2_MacroAssembler::vector_count_leading_zeros_int_avx`

Functional testing with existing and new tests: `make...
Performance. Before:
```
Benchmark                  (algorithm)      (dataSize)  (keyLength)  (provider)   Mode  Cnt     Score   Error  Units
SignatureBench.ECDSA.sign  SHA256withECDSA        1024          256               thrpt    3  6443.934 ± 6.491  ops/s
SignatureBench.ECDSA.sign  SHA256withECDSA       16384          256               thrpt    3  6152.979...
```