plonky2
plonky2 copied to clipboard
Add blake3 hash
keccak256 time: [300.05 ns 300.94 ns 301.96 ns]
change: [+0.1833% +0.4828% +0.7742%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
blake3 time: [94.998 ns 95.208 ns 95.444 ns]
change: [+0.5058% +0.8687% +1.2147%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe
poseidon time: [1.3084 us 1.3115 us 1.3151 us]
change: [-1.0031% -0.6878% -0.3738%] (p = 0.00 < 0.05)
Change within noise threshold.
Benchmarking poseidon<GoldilocksField, 12>: Collecting 100 samples in estimated 5.0071 s (3.4 poseidon<GoldilocksField, 12>
time: [1.3116 us 1.3146 us 1.3185 us]
change: [-0.6405% -0.2693% +0.1246%] (p = 0.17 > 0.05)
No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
3x faster than keccak is a bit suspecious, something must be off with the keccak impl.
I changed KeccakHasher to avoid allocations and use the tiny-keccak library directly. The keccak-hash crate is a thin wrapper around tiny-keccak with a less flexible interface that also pulls in some Ethereum specific dependencies.
the hash bench is slightly improved by the direct usage:
keccak256 time: [285.47 ns 286.14 ns 286.90 ns]
change: [-3.3365% -3.0242% -2.7179%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
The tiny-keccak implementation is a straightforward translation of the reference implementation, with only loop unrolling. The blake3 crate is a well optimized one, this could explain the perf difference.
Or maybe Keccak in general with its huge state is not great at tiny inputs like benchmarked above.
The merkle tree benchmarks (which is dominated by hashing the 135 elm. leafs) benefit ~20% from avoiding allocations:
Benchmarking merkle-tree<GoldilocksField, KeccakHash>/8192: Collecting 10 samples in estimate merkle-tree<GoldilocksField, KeccakHash>/8192
time: [4.3198 ms 4.3874 ms 4.4808 ms]
change: [-17.398% -15.211% -13.018%] (p = 0.00 < 0.05)
Performance has improved.
Benchmarking merkle-tree<GoldilocksField, KeccakHash>/16384: Collecting 10 samples in estimat merkle-tree<GoldilocksField, KeccakHash>/16384
time: [8.7242 ms 8.8855 ms 9.0729 ms]
change: [-24.327% -20.635% -16.591%] (p = 0.00 < 0.05)
Performance has improved.
Benchmarking merkle-tree<GoldilocksField, KeccakHash>/32768: Collecting 10 samples in estimat merkle-tree<GoldilocksField, KeccakHash>/32768
time: [18.198 ms 37.847 ms 64.170 ms]
change: [-47.292% +1.5083% +86.937%] (p = 0.97 > 0.05)
No change in performance detected.
Found 2 outliers among 10 measurements (20.00%)
1 (10.00%) high mild
1 (10.00%) high severe
I suspect using the keccak sponge construction directly or a XOF would be faster than iterated hashing, but that would break Ethereum compatibility.
Not sure why the last bench has such variability. It's not a fluke and happens every time.
Blake3 trees. Looks about 10% faster. I was expecting more so will dig into it.
Benchmarking merkle-tree<GoldilocksField, Blake3Hash>/8192: Collecting 10 samples in estimate merkle-tree<GoldilocksField, Blake3Hash>/8192
time: [4.0060 ms 4.0256 ms 4.0615 ms]
change: [-1.1688% +0.6354% +2.7404%] (p = 0.56 > 0.05)
No change in performance detected.
Found 2 outliers among 10 measurements (20.00%)
2 (20.00%) high severe
Benchmarking merkle-tree<GoldilocksField, Blake3Hash>/16384: Collecting 10 samples in estimat merkle-tree<GoldilocksField, Blake3Hash>/16384
time: [7.9700 ms 8.0613 ms 8.1478 ms]
change: [+0.9682% +3.5005% +5.9757%] (p = 0.02 < 0.05)
Change within noise threshold.
Found 2 outliers among 10 measurements (20.00%)
1 (10.00%) low severe
1 (10.00%) high mild
Benchmarking merkle-tree<GoldilocksField, Blake3Hash>/32768: Collecting 10 samples in estimat merkle-tree<GoldilocksField, Blake3Hash>/32768
time: [14.985 ms 15.714 ms 16.575 ms]
change: [-0.2254% +4.3129% +9.1328%] (p = 0.12 > 0.05)
No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
1 (10.00%) high mild
So the keccak/blake3 gap looks mostly due to keccak's large state size, which amortizes for larger inputs. For >= 128 input elements the blake3 advantage is a more reasonable ~20%.
Benchmarking PoseidonHash/hash_no_pad/128: Collecting 100 samples in estimated 5.0021 s (227k PoseidonHash/hash_no_pad/128
time: [20.348 us 20.369 us 20.397 us]
change: [+0.0097% +0.4055% +0.8672%] (p = 0.07 > 0.05)
No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) high mild
11 (11.00%) high severe
Benchmarking KeccakHash/hash_no_pad/128: Collecting 100 samples in estimated 5.0061 s (1.2M i KeccakHash/hash_no_pad/128
time: [2.6121 us 2.6214 us 2.6309 us]
change: [+0.6240% +1.0450% +1.4290%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
10 (10.00%) high mild
1 (1.00%) high severe
Benchmarking Blake3Hash/hash_no_pad/128: Collecting 100 samples in estimated 5.0124 s (1.3M i Blake3Hash/hash_no_pad/128
time: [2.1946 us 2.1991 us 2.2036 us]
change: [+2.0914% +2.4045% +2.7058%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
Or maybe Keccak in general with its huge state is not great at tiny inputs like benchmarked above.
Yeah that makes sense. I think that amortized cost is most important for us, at least for recursion, since our main (witness) Merkle tree has large leaves which are more of a bottleneck than upper layers.
It's also a bit annoying that we can't leverage Keccak's large rate in the pseudo-permutation, since we want compatibility with EVM's KEC which only returns 256 bits. Though not a big deal since it's not used in the Merkle tree.
Out of curiosity, if you're just exploring fast hash functions, why not trying TurboShake? It's Keccak with 12 rounds instead of 24, so expect 2x faster: https://eprint.iacr.org/2023/342
Hello!
This PR has been open for a long time, please review and update as following:
- If this PR is no l longer relevant, close it.
- If this PR is relevant but not read, mark it as Draft and make a comment of what else needs to be done.
- Fix any issues and get it merged / reviewed by 10/27/2023.
If there's no update by 10/27/2023 this PR will be closed.
Closing per last message