blake2_simd

blake2-rfc is slightly faster than the portable implementation

Open oconnor663 opened this issue 5 years ago • 4 comments

https://github.com/cesarb/blake2-rfc

I measure it to be about 2% faster than portable.rs. Not yet sure why, though it might be using some SIMD under the covers, or maybe getting optimized to SSE2 by the compiler.

However, the relationship is reversed if I set RUSTFLAGS="-C target-cpu=native -C target-feature=-avx2". No idea why. Again, still a small difference. Notably, both implementations tank their performance if I allow them to use AVX2.

oconnor663 avatar Nov 01 '18 04:11 oconnor663
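
When comparing runs with and without `-C target-feature=-avx2`, it can help to confirm which features the binary was actually built with. The following is a minimal sketch using only the standard library, not code from either crate; the function names `avx2_compile_time` and `avx2_runtime` are made up for illustration.

```rust
/// True if AVX2 was enabled at compile time, for example through
/// RUSTFLAGS="-C target-cpu=native" on a CPU that supports it.
fn avx2_compile_time() -> bool {
    cfg!(target_feature = "avx2")
}

/// True if the CPU running this binary reports AVX2 support.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn avx2_runtime() -> bool {
    std::is_x86_feature_detected!("avx2")
}

/// On non-x86 targets (such as ARM) there is no AVX2 to detect.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn avx2_runtime() -> bool {
    false
}

fn main() {
    println!("avx2 enabled at compile time: {}", avx2_compile_time());
    println!("avx2 detected at runtime:     {}", avx2_runtime());
}
```

Building this once with each RUSTFLAGS setting from the comment above should show the compile-time line flip while the runtime line stays the same on a given machine.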

I thought it might be because blake2-rfc was getting autovectorized, but looking at the output of cargo asm that doesn't seem to be the case. So I'm still not sure where the difference comes from.

oconnor663 avatar May 24 '19 15:05 oconnor663

When I try it on ARM I get the opposite result. Should look at 32-bit ARM at some point.

oconnor663 avatar Aug 08 '19 14:08 oconnor663

@oconnor663 I got the same performance (vs. blake2-rfc)

# code copied from https://github.com/shadowsocks/crypto2/tree/dev/src/hash/blake2b
git clone https://github.com/LuoZijun/test_blake2b/
cd test_blake2b
cargo bench

LuoZijun avatar Aug 28 '21 12:08 LuoZijun

@oconnor663 I tried it on an ARM Neoverse N1, and blake2-rfc is slightly faster.

The code: https://github.com/gemtek-indonesia/blake2b256-bench/blob/249cac1bf8788c224f45990d607c4b510a92c862/src/main.rs#L103-L134

And compiled it with:

RUSTFLAGS="-C target-cpu=native -C codegen-units=1" cargo build --release

Ujang360 avatar Jan 01 '23 14:01 Ujang360
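
Differences of around 2%, like those reported in this thread, are easy to lose in measurement noise, so taking the median of several runs is one way to make comparisons more stable. The sketch below is a standard-library-only harness and is not the benchmark code used by any commenter. `median_nanos`, `relative_diff`, and `stand_in_hash` are hypothetical names, and `stand_in_hash` is a placeholder workload, not BLAKE2; swap in calls to the implementations you want to compare.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Runs `f` `runs` times and returns the median wall-clock time in nanoseconds.
fn median_nanos<F: FnMut()>(mut f: F, runs: usize) -> u128 {
    assert!(runs > 0, "need at least one run");
    let mut samples: Vec<u128> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed().as_nanos()
        })
        .collect();
    samples.sort_unstable();
    samples[samples.len() / 2]
}

/// Relative difference of `a` against baseline `b`; 0.02 means `a` is 2% larger.
fn relative_diff(a: f64, b: f64) -> f64 {
    (a - b) / b
}

/// Placeholder workload (NOT BLAKE2): a simple multiply-and-add over the bytes.
/// Replace with the hash functions under comparison.
fn stand_in_hash(input: &[u8]) -> u64 {
    input
        .iter()
        .fold(0u64, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u64))
}

fn main() {
    let input = vec![0x42u8; 1 << 20]; // 1 MiB of input

    // black_box keeps the compiler from optimizing the workload away.
    let a = median_nanos(|| { black_box(stand_in_hash(black_box(&input))); }, 31);
    let b = median_nanos(|| { black_box(stand_in_hash(black_box(&input))); }, 31);

    println!(
        "a = {a} ns, b = {b} ns, relative difference = {:+.2}%",
        relative_diff(a as f64, b as f64) * 100.0
    );
}
```

Because both closures here run the same placeholder, the printed difference gives a rough sense of the noise floor on a given machine; a measured gap between two real implementations is only meaningful if it is clearly larger than that.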