getLogs optimizations
According to some benchmarks, `eth_getLogs` in geth is an order of magnitude slower than in Nethermind and Erigon. These are some ideas we can try for speeding it up:
- Push bloom aggregates to the freezer instead of LevelDB
- Bloom aggregates of different sizes
Another idea for improving bloom filter efficiency, a sparse bloom filter, was raised by zsolt: https://gist.github.com/zsfelfoldi/e27487259bea871fefe398a1e964bece
TL;DR:
The bloom filter used in Geth is too small (2048 bits), which results in an extremely high false-positive rate; in practice the filter is close to useless.
There are two ways to decrease the false-positive rate: (1) increase the size of the bloom filter, or (2) increase the number of bloom filters. If the bloom filter actually filters effectively, `eth_getLogs` performance can be improved significantly.
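For a sense of scale, here is the standard bloom filter false-positive estimate applied to geth's per-block bloom ($m = 2048$ bits, $k = 3$ bit positions per entry); the $n = 1000$ figure below is just an illustrative entry count for a busy block, not a measured number:

$$p \approx \left(1 - e^{-kn/m}\right)^k$$

With $n = 1000$ logged entries (addresses plus topics), $kn/m \approx 1.46$ and $p \approx (1 - e^{-1.46})^3 \approx 0.45$, i.e. a single-term query spuriously matches roughly every other busy block.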
However, instead of making a super-sparse bloom filter, I think we can just increase the number of bloom filters. Say each bloom filter can hold at most 100 items; once a filter is "full", we create a new one for the following items.
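A minimal sketch of that rollover scheme, under some assumptions: the `chain` type and the 100-item cap are hypothetical and this is not geth's actual bloombits index, though the per-entry hashing mirrors the 3-bits-per-entry scheme of `types.Bloom`:

```go
// Minimal sketch of the rollover idea (hypothetical types, not geth's
// actual bloombits index). Hashing mirrors types.Bloom in go-ethereum.
package bloomchain

import "golang.org/x/crypto/sha3"

const (
	filterBits = 2048 // same size as geth's per-block bloom
	maxItems   = 100  // per-filter cap, per the proposal above
)

type bloom [filterBits / 8]byte

// bitIndexes derives three bit positions from the keccak256 hash of an
// entry, analogous to how types.Bloom picks its bits.
func bitIndexes(entry []byte) [3]uint {
	h := sha3.NewLegacyKeccak256()
	h.Write(entry)
	d := h.Sum(nil)
	var idx [3]uint
	for i := range idx {
		idx[i] = (uint(d[2*i])<<8 | uint(d[2*i+1])) & (filterBits - 1)
	}
	return idx
}

func (b *bloom) add(entry []byte) {
	for _, i := range bitIndexes(entry) {
		b[i/8] |= 1 << (i % 8)
	}
}

func (b *bloom) test(entry []byte) bool {
	for _, i := range bitIndexes(entry) {
		if b[i/8]&(1<<(i%8)) == 0 {
			return false
		}
	}
	return true
}

// chain starts a fresh filter whenever the current one is full, keeping
// the per-filter false-positive rate low at the cost of linear growth.
type chain struct {
	filters []*bloom
	count   int // items in the last filter
}

func (c *chain) add(entry []byte) {
	if len(c.filters) == 0 || c.count == maxItems {
		c.filters = append(c.filters, new(bloom))
		c.count = 0
	}
	c.filters[len(c.filters)-1].add(entry)
	c.count++
}

// test reports whether entry may be present in any filter of the chain.
func (c *chain) test(entry []byte) bool {
	for _, f := range c.filters {
		if f.test(entry) {
			return true
		}
	}
	return false
}
```

A query has to check every filter in the chain, so the trade-off is query cost growing with the number of filters in exchange for each filter staying well below saturation.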
There are use cases where the filters yield results in almost every block, e.g. when following the activity of many Uniswap pools. There is room for improvement here. Imagine the extreme case of a query that yields results for every block. Then:
- We can retrieve receipts and headers en masse from the freezer instead of one by one (see the sketch after this list)
- We can avoid the index and bloom filter checks
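For the first point, a rough sketch of what the batched read could look like, assuming the `AncientRange` freezer API currently exposed through `ethdb` (the function name and the 8 MiB batch cap are illustrative choices):

```go
// Rough sketch of the dense path: fetch raw receipt blobs for blocks
// [from, to) in large batches instead of one freezer lookup per block.
package densefetch

import (
	"github.com/ethereum/go-ethereum/core/rawdb"
	"github.com/ethereum/go-ethereum/ethdb"
)

func readReceiptsDense(db ethdb.AncientReader, from, to uint64) ([][]byte, error) {
	var out [][]byte
	for from < to {
		// AncientRange returns at most count items, stopping early once
		// roughly maxBytes of data has been read.
		blobs, err := db.AncientRange(rawdb.ChainFreezerReceiptTable, from, to-from, 8*1024*1024)
		if err != nil {
			return nil, err
		}
		if len(blobs) == 0 {
			break // defensive: avoid spinning on an empty range
		}
		out = append(out, blobs...)
		from += uint64(len(blobs))
	}
	return out, nil
}
```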
However, it's not easy to tell from the query alone whether its results will be "dense" or "sparse". @fjl suggested a compromise:
We use the index to find runs of sequentially matching blocks, perform a range retrieval over each run, and only then start processing them. In theory this should be a net benefit: in the worst case we don't detect any runs and fall back to fetching and processing blocks individually, which is the status quo.
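For illustration only, the range-detection step could be as simple as the following; `blockRange`, `coalesce` and the fetch callbacks are hypothetical names, not existing geth code:

```go
// Sketch of the compromise: collapse sorted index matches into runs of
// consecutive block numbers, then dispatch each run.
package rangescan

type blockRange struct{ from, to uint64 } // inclusive bounds

// coalesce groups ascending matching block numbers into consecutive runs.
func coalesce(matches []uint64) []blockRange {
	var runs []blockRange
	for _, n := range matches {
		if len(runs) > 0 && runs[len(runs)-1].to+1 == n {
			runs[len(runs)-1].to = n // extend the current run
			continue
		}
		runs = append(runs, blockRange{from: n, to: n})
	}
	return runs
}

// process dispatches each run: isolated blocks take the existing
// one-by-one path, longer runs take a batched freezer retrieval.
func process(runs []blockRange, fetchOne func(uint64), fetchMany func(from, to uint64)) {
	for _, r := range runs {
		if r.from == r.to {
			fetchOne(r.from)
		} else {
			fetchMany(r.from, r.to)
		}
	}
}
```

Runs of length one go through the existing per-block path, so if no consecutive matches are found the behaviour degenerates to exactly the status quo.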