reth icon indicating copy to clipboard operation
reth copied to clipboard

Save TraceFromIndex and TraceToIndex

Open jsvisa opened this issue 1 year ago • 4 comments

Describe the feature

Erigon stores the trace information, specifically the "From" and "To" addresses, in an indexed format. This intelligent indexing greatly enhances the performance when searching for transactions or traces, eg the rpc like trace_filter, ots_searchTransaction will experience significant speed improvements as a result of this index.

But the format of the index in Erigon2 and Erigon3 are different.

Erigon 2

The db stat as below(the holesky chain on 2024-08-23)

Status of CallFromIndex
  Pagesize: 8192
  Tree depth: 4
  Branch pages: 4490
  Leaf pages: 857155
  Overflow pages: 0
  Entries: 55318038
Status of CallToIndex
  Pagesize: 8192
  Tree depth: 4
  Branch pages: 5570
  Leaf pages: 1061662
  Overflow pages: 0
  Entries: 69128270
Status of CallTraceSet
  Pagesize: 8192
  Tree depth: 3
  Branch pages: 523
  Leaf pages: 45804
  Overflow pages: 0
  Entries: 8279845

For each executed block, it will collect all the from and to addresses touched in each call, and then save into kv.CallTraceSet https://github.com/ledgerwatch/erigon/blob/d24e5d45755d7b23075c507ad9216e1d60ad03de/eth/calltracer/calltracer.go#L63 :

  • The key is the block number.
  • The value contains an address (20 bytes) and a flag byte
  • If the flag's least significant bit is set (v[length.Addr]&1 > 0), it adds the block number to the froms map for that address.
  • If the flag's second least significant bit is set (v[length.Addr]&2 > 0), it adds the block number to the tos map for that address.

After processing all entries, it finalizes the data by writing the collected information to kv.CallFromIndex and kv.CallToIndex, it stores as the key-value pairs:

   address + chunk1_start_block -> bitmap_chunk1
   address + chunk2_start_block -> bitmap_chunk2

and value is a collection of block numbers, so for the same key(in the same range of blocks) the bitmap will be UPSERTED for each block.

And in the meanwhile clean the old entries in kv.CallTraceSet(this table is temporary)

Erigon 3

Similar to Erigon2, but there was no temporary kv.CallTraceSet table, the from and tos are directly created into kv.TblTracesFromIdx and kv.TblTracesToIdx, and the storage format is also different:

  • Erigon2: one bitmap for a range of blocks per address, one bit for one block;
  • Erigon3: Key-Value pairs where key is address+block, and value is empty, data is persisted into disk as inverted index(managed by a B-Tree like struct)

Compared to Erigon2, Erigon3 has improvements:

  1. no need to put/delete the temporary kv.CallTraceSet table;
  2. no need to update for the same bitmap, just insert new records for new blocks;

Potential tradeoffs:

  1. for the active addresses(eg: Exchange Hot Wallet), Erigon3's storage may use more disk than a bitmap

WIP: need to learn more about the Erigon 3's implemation, and propose a suitable method in reth.

Additional context

No response

jsvisa avatar Sep 03 '24 15:09 jsvisa