all: implement path based state scheme

Open rjl493456442 opened this issue 2 years ago • 7 comments

rjl493456442 avatar Jun 14 '22 07:06 rjl493456442

Very nice idea! Is there a discord server to discuss the ideas of these Geth improvements?

qizhou avatar Jun 16 '22 00:06 qizhou

[Screenshot, 2022-06-24 11:19:38 AM]

Just posting a few metrics on overall performance (PR: block 11,011,902; Master: block 10,600,000; Master is about 400K blocks behind).

In-memory garbage collection (256MB cache size)

[PR]

  • Total GCed size 3.54TiB
  • Total committed size 939GiB
  • 79% of data can be GCed in memory

**Edit (block 144m):**

  • Total GCed size 6.66TiB
  • Total committed size 2.02TiB
  • 76% of data can be GCed in memory

[MASTER]

  • Total GCed size 3.84TiB
  • Total committed size 415GiB
  • 90% of data can be GCed in memory
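
For readers checking the percentages: they appear to be GCed size divided by GCed plus committed size. That formula is an assumption inferred from the numbers above, not taken from the PR code. A minimal Go sketch (the helper name `gcRatio` is made up for illustration):

```go
package main

import "fmt"

// gcRatio returns the fraction of written trie data that was dropped in
// memory and never reached disk. Both arguments must use the same unit.
// Assumption: ratio = GCed / (GCed + committed).
func gcRatio(gcedGiB, committedGiB float64) float64 {
	return gcedGiB / (gcedGiB + committedGiB)
}

func main() {
	const gibPerTiB = 1024.0
	pr := gcRatio(3.54*gibPerTiB, 939)     // figures from the [PR] list
	master := gcRatio(3.84*gibPerTiB, 415) // figures from the [MASTER] list
	fmt.Printf("PR: %.1f%%, Master: %.1f%%\n", pr*100, master*100)
}
```

Under that assumption the results land at roughly 79% and 90%, in line with the reported figures.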

Cache hit rate (~1.2GB clean cache size)

  • Total hit in diff layers 12.2B
  • Total hit in clean cache 1.98B
  • Total miss 1.62B
  • 89% cache hit rate

**Edit (block 144m):**

  • Total hit in diff layers 22B
  • Total hit in clean cache 3.35B
  • Total miss 3.63B
  • 87% cache hit rate
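
The three counters above suggest a tiered lookup: diff layers first, then the clean cache, then disk. The sketch below is a hypothetical illustration of how such counters could be collected. The type `layeredReader` and its fields are invented for this example and are not the PR's actual types. With the first set of figures, (12.2B + 1.98B) / (12.2B + 1.98B + 1.62B) comes to a bit under 90%, consistent with the reported 89%.

```go
package main

import "fmt"

// layeredReader is a hypothetical three-tier reader: in-memory diff layers
// first, then the clean cache, then disk.
type layeredReader struct {
	diffLayers []map[string][]byte // newest layer first
	cleanCache map[string][]byte
	disk       map[string][]byte

	diffHits, cleanHits, misses uint64
}

// node looks up a trie node by path and records which tier served it.
func (r *layeredReader) node(path string) []byte {
	for _, layer := range r.diffLayers {
		if blob, ok := layer[path]; ok {
			r.diffHits++
			return blob
		}
	}
	if blob, ok := r.cleanCache[path]; ok {
		r.cleanHits++
		return blob
	}
	r.misses++
	blob := r.disk[path]
	if blob != nil {
		r.cleanCache[path] = blob // keep it around for the next read
	}
	return blob
}

// hitRate is the share of lookups answered without touching disk.
func (r *layeredReader) hitRate() float64 {
	total := r.diffHits + r.cleanHits + r.misses
	if total == 0 {
		return 0
	}
	return float64(r.diffHits+r.cleanHits) / float64(total)
}

func main() {
	r := &layeredReader{
		diffLayers: []map[string][]byte{{"a": []byte("new")}},
		cleanCache: map[string][]byte{},
		disk:       map[string][]byte{"a": []byte("old"), "b": []byte("x")},
	}
	r.node("a") // served by a diff layer
	r.node("b") // miss, loaded from disk
	r.node("b") // served by the clean cache
	fmt.Printf("hit rate: %.2f\n", r.hitRate())
}
```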

In-disk garbage collection

  • Master: 484GB
  • PR: 111GB

**Edit (block 144m):**

  • Master: 1.05TB
  • PR: 232GB

Block execution

The PR is slightly faster than Master; the main speedup comes from the trie commit phase. In Master, the trie node reference/dereference algorithm takes some time, whereas it is unnecessary in the PR.

rjl493456442 avatar Jun 24 '22 03:06 rjl493456442

Different disk cache sizes: 128MB vs 512MB (until block 118M)

[128MB]

  • Cache hit rate: 88%
  • Commit time: 3.19s
  • In-memory GC size: 3.92TB
  • Commit size: 1.42TB

[512MB]

  • Cache hit rate: 90%
  • Commit time: 14.7s
  • In-memory GC size: 4.51TB
  • Commit size: 879GB

Switched bench03/04 to 1024MB vs 64MB

[1024MB]

  • Cache hit rate: 87%
  • Commit time: 35.5s
  • In-memory GC size: 3.52TB
  • Commit size: 512GB

[64MB]

  • Cache hit rate: 82%
  • Commit time: 1.32s
  • In-memory GC size: 1.86TB
  • Commit size: 1.46TB

However, 64MB is faster than 1024MB. I'm not sure why; it may be related to the longer commit time or the higher GC pressure.

rjl493456442 avatar Jul 01 '22 08:07 rjl493456442

Thanks for the great work Geth team! We're looking forward to the release of this feature to save our disk space.

I'm curious about how this is implemented. My understanding after reading the code is that it uses a snapshot layer that stores the RLP-encoded content keyed by the account key (and prefix), so nodes are no longer duplicated as before.

However, it seems the RLP-encoded content still retains the legacy format, i.e. an RLP-encoded shortNode still contains the key prefix even though that prefix has already been recorded in the snapshot's NodeSet. Is this redundant information that could also be eliminated in the future?

windycrypto avatar Jul 03 '22 10:07 windycrypto

@cifer76 It's mostly a conversion of the trie node storage scheme. Currently, all nodes are stored on disk by their hash, which makes in-disk pruning extremely hard. In this PR we switch to a path-based scheme, which natively enables in-disk pruning. To survive mini reorgs, deep reorgs, historical tracing and so on, a few auxiliary components are also introduced.
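
To make the contrast concrete, here is a toy sketch of the two keying schemes. It is an illustration only, not the PR's code: the types `hashStore` and `pathStore` are invented, and SHA-256 stands in for the Keccak-256 hash geth uses so the example stays within the standard library.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashStore keys each node by the hash of its content (SHA-256 here as a
// stand-in for Keccak-256).
type hashStore map[[32]byte][]byte

func (s hashStore) put(blob []byte) { s[sha256.Sum256(blob)] = blob }

// pathStore keys each node by its path from the trie root.
type pathStore map[string][]byte

func (s pathStore) put(path string, blob []byte) { s[path] = blob }

func main() {
	hs, ps := hashStore{}, pathStore{}

	// The node at path "de" changes across three blocks.
	for _, version := range []string{"v1", "v2", "v3"} {
		hs.put([]byte(version))
		ps.put("de", []byte(version))
	}

	// Hash scheme: stale versions remain until something proves they are
	// unreferenced. Path scheme: each write replaces the previous version.
	fmt.Println(len(hs), len(ps)) // prints "3 1"
}
```

In the hash-keyed store every new version adds an entry, so removing old ones requires knowing that nothing else references them. In the path-keyed store the new version simply overwrites the old one at the same key.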

rjl493456442 avatar Jul 04 '22 01:07 rjl493456442

@rjl493456442 Thanks for clarifying. Yes, I understand that nodes were previously stored by their hash and are now (in this PR) stored by path. What I meant is: the stored content of each node is the same as before, right? For example, for a shortNode, the content stored on disk still includes the key nibbles, like <key nibbles> | <hash of children content>, where <key nibbles> could be e.g. "de08af".
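
To spell out what I mean, here is a simplified sketch. The struct `shortNodeRecord`, the helper `hexToNibbles`, and the example path are all made up for illustration, and the real on-disk value is RLP-encoded rather than a Go struct.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// hexToNibbles splits each byte of a hex string into two 4-bit nibbles.
func hexToNibbles(s string) ([]byte, error) {
	raw, err := hex.DecodeString(s)
	if err != nil {
		return nil, err
	}
	nibbles := make([]byte, 0, len(raw)*2)
	for _, b := range raw {
		nibbles = append(nibbles, b>>4, b&0x0f)
	}
	return nibbles, nil
}

// shortNodeRecord is a hypothetical view of one database entry under the
// path-based scheme.
type shortNodeRecord struct {
	Path      []byte   // database key: nibbles from the root down to this node
	Key       []byte   // stored in the value: <key nibbles>
	ChildHash [32]byte // stored in the value: <hash of children content>
}

func main() {
	key, err := hexToNibbles("de08af")
	if err != nil {
		panic(err)
	}
	// The path below is an arbitrary example, not taken from real state.
	rec := shortNodeRecord{Path: []byte{0x1, 0x2}, Key: key}
	fmt.Printf("db key (path):        %x\n", rec.Path)
	fmt.Printf("key nibbles in value: %x\n", rec.Key)
}
```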

windycrypto avatar Jul 04 '22 09:07 windycrypto

> @rjl493456442 Thanks for clarifying. Yes, I understand that nodes were previously stored by their hash and are now (in this PR) stored by path. What I meant is: the stored content of each node is the same as before, right? For example, for a shortNode, the content stored on disk still includes the key nibbles, like <key nibbles> | <hash of children content>, where <key nibbles> could be e.g. "de08af".

Exactly. However, as the next step I plan to split the value nodes out and store them separately, so that we can eventually get rid of the state snapshot.

rjl493456442 avatar Jul 07 '22 07:07 rjl493456442