perf: change zstd compression level to 1
Hi. I've noticed you're using a compression level of 0 for the zstd compressors.
I've put together this benchmark: https://github.com/pawurb/zstd-tx-bench. I've tried to recreate the way compression is used in the reth codebase.
It shows that compression level 1 is ~40% faster than level 0 while producing (at least for the sample tx) only ~2% larger payloads.
The official docs mention that level 0 is in practice equivalent to level 3 (the default). But given the significant speed increase, maybe it's worth changing?
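For reference, here's a minimal sketch of the kind of loop the benchmark times. It's not the exact code from the linked repo: it uses the `zstd` crate without a dictionary and a synthetic payload, while reth's compressors use per-table dictionaries, so take it only as an illustration of the level 0 vs level 1 comparison:

```rust
// Minimal sketch: time zstd levels 0 and 1 on one payload with the `zstd` crate.
// Payload, iteration count, and the lack of a dictionary are simplifications.
use std::time::Instant;

fn main() -> std::io::Result<()> {
    // Stand-in for an encoded transaction; the benchmark repo uses real mainnet txs.
    let payload = vec![0xABu8; 16 * 1024];
    for level in [0, 1] {
        let start = Instant::now();
        let mut compressed = Vec::new();
        for _ in 0..10_000 {
            compressed = zstd::bulk::compress(&payload, level)?;
        }
        println!(
            "level {level}: {:?} total, {} compressed bytes",
            start.elapsed(),
            compressed.len()
        );
    }
    Ok(())
}
```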
interesting, thanks! i see that in the repo the transaction data file is ~16KB. Have you tried running a larger dataset? Is this only one tx or more?
this wouldn't be a breaking change per se, but it would mean that the static files themselves would have a different checksum
I've tried the benchmark with 10 different recent mainnet txs, from 1kb to 29kb, and the performance improvement seems consistent, even better for bigger txs.
I've added a script for checking compression: https://github.com/pawurb/zstd-tx-bench/blob/main/src/compare_compression.rs. The size difference is within ~2% in both directions, except for one tx where level 1 is 44% better; no idea why. It might be dictionary-specific.
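The gist of that size check, as a rough sketch (hypothetical `txs/` directory with one raw encoded payload per file; compare_compression.rs in the repo is the actual script):

```rust
// Sketch of comparing compressed sizes at levels 0 and 1 across payload files.
use std::fs;

fn main() -> std::io::Result<()> {
    // Assumes one raw encoded transaction payload per file in ./txs (made-up layout).
    for entry in fs::read_dir("txs")? {
        let path = entry?.path();
        let payload = fs::read(&path)?;
        let l0 = zstd::bulk::compress(&payload, 0)?.len();
        let l1 = zstd::bulk::compress(&payload, 1)?.len();
        println!(
            "{}: level 0 = {l0} B, level 1 = {l1} B, ratio = {:.3}",
            path.display(),
            l1 as f64 / l0 as f64
        );
    }
    Ok(())
}
```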
That's a great find, thank you. @joshieDo confirming this is not breaking?
Correct, at the compression/decompression level it's not. Decompression speed should also be independent of the compression level.
The only thing that comes to mind: if any user expects a specific checksum for certain static file ranges, then this would break it. i don't particularly mind, just mentioning it.
I feel like we should run it over all transactions/receipts and see what comes out. I'll try it out
Ping @joshieDo for a decision :)
i ran some numbers a couple of weeks ago, and they lean towards closing this.
op-reth, op-mainnet, partially synced node
2,662,251 txes, 10 runs each
level 0
Encoding times (ms) - Avg: 3646.268, Median: 3648.831, Min: 3580.706, Max: 3768.547
Decoding times (ms) - Avg: 826.728, Median: 815.104, Min: 803.284, Max: 906.936
level 1
Encoding times (ms) - Avg: 3647.427, Median: 3647.995, Min: 3594.371, Max: 3696.799
Decoding times (ms) - Avg: 837.368, Median: 835.975, Min: 821.720, Max: 852.423
some potential causes/differences:
- different dataset origin: i'm using op-mainnet while the OP used mainnet
- dataset size: 10 vs 2,662,251 txes (by default we only compress txes whose tx.input length is bigger than 32; see the rough sketch after this list)
- the OP's txes are RLP-encoded while ours are compacted before the compression happens. still, i'm not sure this explains it, since when txes are big the Data field is not compacted, so it should not matter.
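rough sketch of that threshold behaviour (not our actual code, names made up for illustration):

```rust
// Hypothetical illustration of "only compress when tx.input is longer than 32 bytes".
const INPUT_LEN_THRESHOLD: usize = 32;

fn maybe_compress(encoded_tx: &[u8], input_len: usize) -> std::io::Result<Vec<u8>> {
    if input_len > INPUT_LEN_THRESHOLD {
        // The level here is the value under discussion: 0 today, 1 in the proposal.
        zstd::bulk::compress(encoded_tx, 0)
    } else {
        // Small transactions are stored as-is.
        Ok(encoded_tx.to_vec())
    }
}
```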
It's possible i did something wrong in the benchmarks, but for now i'd close it.