plonky2 Add blake3 hash

keccak256               time:   [300.05 ns 300.94 ns 301.96 ns]                      
                        change: [+0.1833% +0.4828% +0.7742%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

blake3                  time:   [94.998 ns 95.208 ns 95.444 ns]                   
                        change: [+0.5058% +0.8687% +1.2147%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

poseidon                time:   [1.3084 us 1.3115 us 1.3151 us]                      
                        change: [-1.0031% -0.6878% -0.3738%] (p = 0.00 < 0.05)
                        Change within noise threshold.

Benchmarking poseidon<GoldilocksField, 12>: Collecting 100 samples in estimated 5.0071 s (3.4                                                                                             poseidon<GoldilocksField, 12>                        
                        time:   [1.3116 us 1.3146 us 1.3185 us]
                        change: [-0.6405% -0.2693% +0.1246%] (p = 0.17 > 0.05)
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

Apr 16 '22 09:04 recmo

3x faster than keccak is a bit suspecious, something must be off with the keccak impl.

Apr 16 '22 09:04 recmo

I changed KeccakHasher to avoid allocations and use the tiny-keccak library directly. The keccak-hash crate is a thin wrapper around tiny-keccak with a less flexible interface that also pulls in some Ethereum specific dependencies.

the hash bench is slightly improved by the direct usage:

keccak256               time:   [285.47 ns 286.14 ns 286.90 ns]                      
                        change: [-3.3365% -3.0242% -2.7179%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

The tiny-keccak implementation is a straightforward translation of the reference implementation, with only loop unrolling. The blake3 crate is a well optimized one, this could explain the perf difference.

Or maybe Keccak in general with its huge state is not great at tiny inputs like benchmarked above.

The merkle tree benchmarks (which is dominated by hashing the 135 elm. leafs) benefit ~20% from avoiding allocations:

Benchmarking merkle-tree<GoldilocksField, KeccakHash>/8192: Collecting 10 samples in estimate                                                                                             merkle-tree<GoldilocksField, KeccakHash>/8192                        
                        time:   [4.3198 ms 4.3874 ms 4.4808 ms]
                        change: [-17.398% -15.211% -13.018%] (p = 0.00 < 0.05)
                        Performance has improved.
Benchmarking merkle-tree<GoldilocksField, KeccakHash>/16384: Collecting 10 samples in estimat                                                                                             merkle-tree<GoldilocksField, KeccakHash>/16384                        
                        time:   [8.7242 ms 8.8855 ms 9.0729 ms]
                        change: [-24.327% -20.635% -16.591%] (p = 0.00 < 0.05)
                        Performance has improved.
Benchmarking merkle-tree<GoldilocksField, KeccakHash>/32768: Collecting 10 samples in estimat                                                                                             merkle-tree<GoldilocksField, KeccakHash>/32768                        
                        time:   [18.198 ms 37.847 ms 64.170 ms]
                        change: [-47.292% +1.5083% +86.937%] (p = 0.97 > 0.05)
                        No change in performance detected.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe

I suspect using the keccak sponge construction directly or a XOF would be faster than iterated hashing, but that would break Ethereum compatibility.

Not sure why the last bench has such variability. It's not a fluke and happens every time.

Apr 16 '22 10:04 recmo

Blake3 trees. Looks about 10% faster. I was expecting more so will dig into it.

Benchmarking merkle-tree<GoldilocksField, Blake3Hash>/8192: Collecting 10 samples in estimate                                                                                             merkle-tree<GoldilocksField, Blake3Hash>/8192                        
                        time:   [4.0060 ms 4.0256 ms 4.0615 ms]
                        change: [-1.1688% +0.6354% +2.7404%] (p = 0.56 > 0.05)
                        No change in performance detected.
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high severe
Benchmarking merkle-tree<GoldilocksField, Blake3Hash>/16384: Collecting 10 samples in estimat                                                                                             merkle-tree<GoldilocksField, Blake3Hash>/16384                        
                        time:   [7.9700 ms 8.0613 ms 8.1478 ms]
                        change: [+0.9682% +3.5005% +5.9757%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) low severe
  1 (10.00%) high mild
Benchmarking merkle-tree<GoldilocksField, Blake3Hash>/32768: Collecting 10 samples in estimat                                                                                             merkle-tree<GoldilocksField, Blake3Hash>/32768                        
                        time:   [14.985 ms 15.714 ms 16.575 ms]
                        change: [-0.2254% +4.3129% +9.1328%] (p = 0.12 > 0.05)
                        No change in performance detected.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

Apr 16 '22 10:04 recmo

So the keccak/blake3 gap looks mostly due to keccak's large state size, which amortizes for larger inputs. For >= 128 input elements the blake3 advantage is a more reasonable ~20%.

Benchmarking PoseidonHash/hash_no_pad/128: Collecting 100 samples in estimated 5.0021 s (227k                                                                                             PoseidonHash/hash_no_pad/128                        
                        time:   [20.348 us 20.369 us 20.397 us]
                        change: [+0.0097% +0.4055% +0.8672%] (p = 0.07 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) high mild
  11 (11.00%) high severe

Benchmarking KeccakHash/hash_no_pad/128: Collecting 100 samples in estimated 5.0061 s (1.2M i                                                                                             KeccakHash/hash_no_pad/128                        
                        time:   [2.6121 us 2.6214 us 2.6309 us]
                        change: [+0.6240% +1.0450% +1.4290%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  10 (10.00%) high mild
  1 (1.00%) high severe

Benchmarking Blake3Hash/hash_no_pad/128: Collecting 100 samples in estimated 5.0124 s (1.3M i                                                                                             Blake3Hash/hash_no_pad/128                        
                        time:   [2.1946 us 2.1991 us 2.2036 us]
                        change: [+2.0914% +2.4045% +2.7058%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

Apr 17 '22 13:04 recmo

Or maybe Keccak in general with its huge state is not great at tiny inputs like benchmarked above.

Yeah that makes sense. I think that amortized cost is most important for us, at least for recursion, since our main (witness) Merkle tree has large leaves which are more of a bottleneck than upper layers.

It's also a bit annoying that we can't leverage Keccak's large rate in the pseudo-permutation, since we want compatibility with EVM's KEC which only returns 256 bits. Though not a big deal since it's not used in the Merkle tree.

Apr 25 '22 08:04 dlubarov

Out of curiosity, if you're just exploring fast hash functions, why not trying TurboShake? It's Keccak with 12 rounds instead of 24, so expect 2x faster: https://eprint.iacr.org/2023/342

May 30 '23 22:05 0x0ece

Hello!

This PR has been open for a long time, please review and update as following:

If this PR is no l longer relevant, close it.
If this PR is relevant but not read, mark it as Draft and make a comment of what else needs to be done.
Fix any issues and get it merged / reviewed by 10/27/2023.

If there's no update by 10/27/2023 this PR will be closed.

Oct 23 '23 17:10 pgebheim

Closing per last message

Nov 20 '23 18:11 pgebheim

plonky2 plonky2 copied to clipboard

Add blake3 hash

plonky2
plonky2 copied to clipboard