
getLogs optimizations

Open · s1na opened this issue 2 years ago · 2 comments

According to some benchmarks, eth_getLogs in geth is an order of magnitude slower than in Nethermind and Erigon. These are some ideas we could try for speeding it up:

  • Push bloom aggregates to freezer instead of leveldb
  • Bloom aggregates of different sizes

s1na · Jul 19 '22

Another idea for improving the efficiency of the bloom filter (a sparse bloom filter) was raised by zsolt: https://gist.github.com/zsfelfoldi/e27487259bea871fefe398a1e964bece

TL;DR:

The bloom filter used in Geth is too small (2048 bits), which leads to an extremely high false-positive rate. In practice this renders the bloom filter useless.

There are two ways to decrease the false-positive rate: (1) increase the size of the bloom filter, or (2) increase the number of bloom filters. If the bloom filter becomes effective at filtering, eth_getLogs performance can be improved significantly.
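For intuition, the standard approximation for a Bloom filter's false-positive rate with m bits, k hash functions and n inserted items is p ≈ (1 − e^(−kn/m))^k. Geth's per-block bloom has m = 2048 and k = 3, and every log contributes its address plus each of its topics as an item, so n grows quickly in busy blocks. A quick back-of-the-envelope computation (formula only, not measurements):

```go
package main

import (
	"fmt"
	"math"
)

// falsePositiveRate approximates the false-positive probability of a Bloom
// filter with m bits and k hash functions after n insertions:
// p ≈ (1 - e^(-k*n/m))^k.
func falsePositiveRate(m, k, n float64) float64 {
	return math.Pow(1-math.Exp(-k*n/m), k)
}

func main() {
	const (
		m = 2048 // Geth's per-block header bloom is 2048 bits
		k = 3    // each inserted item sets 3 bits
	)
	// Each log contributes its address plus every topic as an item, so a
	// busy block easily inserts hundreds to thousands of items.
	for _, n := range []float64{100, 500, 1000, 2000} {
		fmt.Printf("n = %4.0f items -> p ≈ %.3f\n", n, falsePositiveRate(m, k, n))
	}
}
```

At a thousand items the filter already misfires almost half the time, which is why it is effectively useless at this size.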

rjl493456442 · Jul 21 '22

However, instead of making a super sparse bloom filter, I think we can just increase the number of bloom filters. Say each bloom filter may contain at most 100 items; once a filter is "full", we create another one for the following items.
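A minimal self-contained sketch of that scheme (this is not geth's implementation; the FNV-based hashing and the maxItems = 100 capacity are illustrative assumptions):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const (
	filterBits = 2048 // same size as Geth's per-block bloom
	numHashes  = 3    // bits set per item, as in Geth's bloom
	maxItems   = 100  // hypothetical per-filter capacity
)

type bloom [filterBits / 8]byte

// indices derives numHashes bit positions for an item (illustrative FNV
// hashing; geth derives its bit positions differently).
func indices(item []byte) [numHashes]uint32 {
	var idx [numHashes]uint32
	for i := range idx {
		h := fnv.New32a()
		h.Write([]byte{byte(i)}) // vary the seed per hash function
		h.Write(item)
		idx[i] = h.Sum32() % filterBits
	}
	return idx
}

func (b *bloom) add(item []byte) {
	for _, i := range indices(item) {
		b[i/8] |= 1 << (i % 8)
	}
}

func (b *bloom) test(item []byte) bool {
	for _, i := range indices(item) {
		if b[i/8]&(1<<(i%8)) == 0 {
			return false
		}
	}
	return true
}

// chain is the sequence of capped filters: inserts go into the newest
// filter until it holds maxItems, then a fresh one is appended.
type chain struct {
	filters []*bloom
	count   int // items in the newest filter
}

func (c *chain) add(item []byte) {
	if len(c.filters) == 0 || c.count == maxItems {
		c.filters = append(c.filters, new(bloom))
		c.count = 0
	}
	c.filters[len(c.filters)-1].add(item)
	c.count++
}

// contains must consult every filter in the chain, but each one stays
// sparse (at most 100 items in 2048 bits), so the per-filter
// false-positive rate stays low.
func (c *chain) contains(item []byte) bool {
	for _, f := range c.filters {
		if f.test(item) {
			return true
		}
	}
	return false
}

func main() {
	var c chain
	c.add([]byte("some log topic"))
	fmt.Println(c.contains([]byte("some log topic"))) // true
	fmt.Println(c.contains([]byte("something else"))) // almost certainly false
}
```

The trade-off: a query must now probe every filter in the chain, so the overall false-positive rate grows roughly linearly with the number of filters. Still, 1000 items split across ten sparse filters give about 10 × 0.25% ≈ 2.5%, versus roughly 45% for a single 2048-bit filter holding all 1000 items.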

rjl493456442 · Jul 21 '22

There are use cases where a filter yields results in almost every block, e.g. when following the activity of many Uniswap pools. There is room for improvement here. Imagine the extreme case of a query that yields results for every block. Then:

  • We can retrieve receipts and headers en masse from the freezer instead of one by one
  • We can avoid the index and bloom filter checks

However, given a query, it is not easy to predict whether its results will be "dense" or "sparse". @fjl suggested a compromise:

We use the index to find runs of sequential matching blocks, perform a range retrieval over each run, and only then start processing them. In theory this should be a net benefit: in the worst case we detect no runs and revert to fetching and processing blocks individually, which is the status quo.
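A rough sketch of that strategy, where bloomMatches, fetchRange and fetchOne are hypothetical stand-ins for the index check and for batched vs. individual freezer reads:

```go
package main

import "fmt"

// Hypothetical stand-ins: bloomMatches would consult the bloom index for a
// block (here a toy pattern simulating index hits); fetchRange would read a
// contiguous span of receipts from the freezer in one call; fetchOne reads
// a single block's receipts.
func bloomMatches(num uint64) bool { return num%10 < 3 }

func fetchRange(from, count uint64) []string {
	out := make([]string, count)
	for i := range out {
		out[i] = fmt.Sprintf("receipts[%d]", from+uint64(i))
	}
	return out
}

func fetchOne(num uint64) []string {
	return []string{fmt.Sprintf("receipts[%d]", num)}
}

// filterRange walks [begin, end], using the index to detect runs of
// consecutive matching blocks. Runs are fetched en masse; isolated matches
// fall back to individual retrieval, i.e. the status quo.
func filterRange(begin, end uint64) {
	for num := begin; num <= end; {
		if !bloomMatches(num) {
			num++
			continue
		}
		// Found a match; extend the run while the index keeps matching.
		runEnd := num + 1
		for runEnd <= end && bloomMatches(runEnd) {
			runEnd++
		}
		var batch []string
		if runEnd-num > 1 {
			batch = fetchRange(num, runEnd-num) // dense run: one range read
		} else {
			batch = fetchOne(num) // sparse match: individual read
		}
		fmt.Println(batch)
		num = runEnd
	}
}

func main() { filterRange(0, 25) }
```

In geth the batched path could presumably be backed by the freezer's range-read support (AncientRange), which returns a run of consecutive items in a single call.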

s1na · Mar 09 '23