go-ethereum

Integrating state snapshot into pathdb

Open rjl493456442 opened this issue 1 year ago • 2 comments

The state snapshot and the trie database are distinct components within the system, and state updates are applied to the two of them asynchronously.

As a result, it is quite common for the persistent state of the trie to differ from that of the state snapshot.


Originally, the trie database only supported the hash-based scheme, which is completely asynchronous with respect to the state snapshot. To handle unclean shutdowns, a complicated and nasty recovery mechanism was introduced to ensure that both the trie database and the state snapshot can be recovered after a panic.

It turns out that this recovery mechanism is unnecessary complexity in the path-based scheme and can be removed entirely. Integrating the state snapshot into pathdb is worthwhile for the robustness and simplicity it brings.


Besides, integrating the state snapshot into pathdb is required to build the new archive node on top of the path-based scheme.

The overall idea of the new archive node design is to maintain an extra index for each state, which tracks the block numbers at which that state was mutated.

e.g.

state x => [100, 200, 300]. This index means that state x was mutated in blocks 100, 200 and 300.

Meanwhile, a corresponding state history is persisted for each block, which records the original values (i.e. the values before that block was applied) of the states it modified.

Therefore, to access state x at block 150, state history(200) is resolved: the original value it records for x is the value x had at block 150, since x is not mutated between block 100 and block 200.

If state x at block 350 is accessed, the index contains no later mutation, so the value should be served from the latest state, which is resolved directly from the state snapshot.
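The lookup described above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the actual pathdb implementation: the types `stateIndex`, `stateHistory` and `archive` and the method `valueAt` are hypothetical names, plain in-memory maps stand in for the on-disk index, the state histories and the state snapshot, and the latest state is assumed to be at or beyond the last indexed block.

```go
package main

import (
	"errors"
	"sort"
)

// stateIndex maps a state key to the ascending list of block numbers in
// which that state was mutated, e.g. "x" => [100, 200, 300].
// Hypothetical type, for illustration only.
type stateIndex map[string][]uint64

// stateHistory maps a block number to the original (pre-block) values of
// the states modified in that block. Hypothetical type.
type stateHistory map[uint64]map[string][]byte

// archive bundles the index, the per-block histories and the latest state.
// The latest map stands in for the state snapshot.
type archive struct {
	index   stateIndex
	history stateHistory
	latest  map[string][]byte
}

var errMissingHistory = errors.New("state history missing for indexed block")

// valueAt returns the value of key as of the given block (i.e. after that
// block has been applied).
func (a *archive) valueAt(key string, block uint64) ([]byte, error) {
	blocks := a.index[key]

	// Binary search for the first mutation strictly after the target block.
	i := sort.Search(len(blocks), func(i int) bool { return blocks[i] > block })
	if i == len(blocks) {
		// No later mutation: the latest state still holds the value.
		return a.latest[key], nil
	}
	// The history of the next mutating block records the value that was
	// in place before it, which is the value at the target block.
	orig, ok := a.history[blocks[i]][key]
	if !ok {
		return nil, errMissingHistory
	}
	return orig, nil
}
```

With the index `x => [100, 200, 300]`, calling `valueAt("x", 150)` resolves state history(200), while `valueAt("x", 350)` falls through to the latest state.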

Integrating the state snapshot into pathdb will simplify the implementation of this archive node design.

rjl493456442 • Nov 28 '23 07:11

I just thought about EIP-4788, which modifies the storage of an address on every block. This would mean that the index for this address would have len(blocks) entries. We could probably keep this in mind to make historical lookups faster for this account.

MariusVanDerWijden • Dec 01 '23 09:12

Besides, integration of state snapshot and pathdb is required to build new archive node over path-based scheme.

Is it a must? The state snapshot does not store all the historical changesets in the kvdb, and it holds optional version data in the diff layers. This is the same as pathdb. All the KVs in the diff layers and disk layer of the snapshot are a subset of pathdb, namely the leaf nodes of the MPT.

So I think we just need a better policy for caching the leaf nodes of the trie in pathdb, so that it achieves the same read latency as the snapshot; otherwise it will affect execution performance.

fynnss • Jan 22 '24 12:01