nimcrypto

Optimized SHA2 implementation.

Open cheatfate opened this issue 1 year ago • 4 comments

Should address #36

cheatfate, May 08 '24 01:05

A few benchmarks for hashing the beacon state - this is certainly not an exhaustive benchmark because it only tests 64-byte values, but it's still indicative for that particular input size:

11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz

# best of 3 runs of each

# current nimcrypto
arnetheduck@praeceps:~/status/nimbus-eth2$ ncli/ncli --print-times hashTreeRoot deneb_state state.ssz 
683ed74f8fb7f3322e2b746796d22c1a03023e0aa82299b536f66598bc928407
All time are ms
     Average,       StdDev,          Min,          Max,      Samples,         Test
    1293.098,        0.000,     1293.098,     1293.098,            1, Load file
    6333.745,        0.000,     6333.745,     6333.745,            1, Compute

# new-sha2 with reference implementation - slightly slower
     Average,       StdDev,          Min,          Max,      Samples,         Test
    1328.006,        0.000,     1328.006,     1328.006,            1, Load file
    6780.952,        0.000,     6780.952,     6780.952,            1, Compute

# new-sha2 with cpuid with `shaext` implementation, cpu detection for every new context
     Average,       StdDev,          Min,          Max,      Samples,         Test
    1156.908,        0.000,     1156.908,     1156.908,            1, Load file
    4662.638,        0.000,     4662.638,     4662.638,            1, Compute

# new-sha2 with hardcoded `shaext` implementation
     Average,       StdDev,          Min,          Max,      Samples,         Test
     714.325,        0.000,      714.325,      714.325,            1, Load file
    1512.727,        0.000,     1512.727,     1512.727,            1, Compute

# new-sha2 with hardcoded `avx2`
     Average,       StdDev,          Min,          Max,      Samples,         Test
    1250.886,        0.000,     1250.886,     1250.886,            1, Load file
    5794.621,        0.000,     5794.621,     5794.621,            1, Compute

# new-sha2 with hardcoded `avx` - oddly, this one is a bit faster than avx2
     Average,       StdDev,          Min,          Max,      Samples,         Test
    1225.362,        0.000,     1225.362,     1225.362,            1, Load file
    5662.962,        0.000,     5662.962,     5662.962,            1, Compute


# blst
     Average,       StdDev,          Min,          Max,      Samples,         Test
     747.602,        0.000,      747.602,      747.602,            1, Load file
    1581.679,        0.000,     1581.679,     1581.679,            1, Compute
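
To put the Compute rows above in relative terms, the following sketch divides the current nimcrypto time by each variant's time. The numbers are copied from the runs above; the dictionary labels and the `speedups` helper are shorthand introduced here for illustration. Each run has a single sample, so the ratios are only indicative.

```python
# Compute times in ms, copied from the runs above (one sample each).
# The labels are shorthand introduced for this sketch.
compute_ms = {
    "current nimcrypto": 6333.745,
    "new-sha2 reference": 6780.952,
    "new-sha2 shaext (cpuid per context)": 4662.638,
    "new-sha2 shaext (hardcoded)": 1512.727,
    "new-sha2 avx2 (hardcoded)": 5794.621,
    "new-sha2 avx (hardcoded)": 5662.962,
    "blst": 1581.679,
}

baseline = compute_ms["current nimcrypto"]


def speedups(times, base):
    """Return base / time per entry; values above 1.0 are faster than base."""
    return {name: base / ms for name, ms in times.items()}


ratios = speedups(compute_ms, baseline)

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name:38s} {compute_ms[name]:10.3f} ms  {ratio:5.2f}x")
```

Read this way, the hardcoded `shaext` build is roughly 4.2x faster than current nimcrypto on this input, blst is about 4.0x faster, and the reference implementation lands slightly below 1.0x.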

arnetheduck, Dec 17 '24 18:12

AVX is faster than AVX2 here because of the data size: for 64-byte data, the AVX2 implementation uses the AVX implementation.
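
One way to picture this is an entry point that checks the input size before choosing a code path. The sketch below is an illustration under assumptions, not nimcrypto's code: the function names are hypothetical, the behaviour above 64 bytes is a guess, and both paths use Python's `hashlib` as a stand-in so the digest is correct either way.

```python
import hashlib

SINGLE_BLOCK_BYTES = 64  # one SHA-256 block, and the input size in the benchmark


def sha256_avx(data: bytes):
    """Stand-in for a hypothetical AVX code path; returns (path name, digest)."""
    return "avx", hashlib.sha256(data).digest()


def sha256_avx2(data: bytes):
    """Stand-in for a hypothetical AVX2 entry point.

    Assumption: inputs of up to 64 bytes are handed to the AVX path,
    and only larger inputs take the AVX2 path.
    """
    if len(data) <= SINGLE_BLOCK_BYTES:
        return sha256_avx(data)
    return "avx2", hashlib.sha256(data).digest()


if __name__ == "__main__":
    path, digest = sha256_avx2(bytes(64))
    print(path, digest.hex())
```

With this shape, a 64-byte input runs the same code as the plain AVX build plus one extra size check, which would be consistent with the two builds measuring within a few percent of each other.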

cheatfate, Dec 18 '24 02:12

Note that you can also benchmark against Constantine, which includes OpenSSL:

git clone https://github.com/mratsim/constantine
cd constantine
CC=clang nimble bench_sha256
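
For a quick sanity check of 64-byte SHA-256 throughput without building anything, a rough baseline can be taken with Python's `hashlib`, which on many CPython builds is backed by OpenSSL (an assumption worth verifying for your interpreter). The function name and iteration count below are made up for this sketch, and interpreter overhead means the figure is not directly comparable to the native numbers in this thread.

```python
import hashlib
import time


def bench_sha256_64(iterations: int = 200_000) -> float:
    """Return the average time in nanoseconds to hash one 64-byte input."""
    block = bytes(64)  # first input: 64 zero bytes
    start = time.perf_counter_ns()
    for _ in range(iterations):
        digest = hashlib.sha256(block).digest()
        # Feed the 32-byte digest back twice to form the next 64-byte input,
        # so the loop does not hash the same constant over and over.
        block = digest + digest
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations


if __name__ == "__main__":
    print(f"{bench_sha256_64():.1f} ns per 64-byte SHA-256 (includes Python overhead)")
```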

mratsim, Dec 19 '24 20:12

> includes OpenSSL

from what I remember, openssl == blst more or less

arnetheduck, Dec 19 '24 20:12