go-ethereum icon indicating copy to clipboard operation
go-ethereum copied to clipboard

all: implement path-based state scheme

Open rjl493456442 opened this issue 3 years ago • 8 comments

This PR adds the path-based implementation, but it's not used yet. The main intention for this PR is reviewers can review the main part but not worrying breaking the live code.

rjl493456442 avatar Oct 11 '22 08:10 rjl493456442

Benchmark results on mainnet

Overall performance:

Finish mainnet full sync in approximately 10 days, which is 11 hours ahead of master branch.

IOWait:

截屏2023-01-28 上午10 23 01

Master branch has a high iowait.

Memory usage:

截屏2023-01-28 上午10 23 36

The memory usage has no big difference between these two.

Allocation:

截屏2023-01-28 上午10 24 04

PBSS has a higher allocation rate

Database

Overall:

  PBSS Master
Database size 261GB(can be compacted to 215GB) 1.37TB
Database write(key-value store) 19.2TB 32.1TB
Database read(key-value store) 250TB 273TB

Compaction overhead:

截屏2023-01-28 上午10 25 22

The compaction overhead of master is obviously larger

rjl493456442 avatar Jan 28 '23 02:01 rjl493456442

Wow, great work Gary!!!

holiman avatar Jan 28 '23 07:01 holiman

INFO [01-28|08:23:21.344] State is complete accounts=197,283,862 slots=908,428,600 codes=29,682,972 elapsed=5h45m42.592s

rjl493456442 avatar Jan 28 '23 13:01 rjl493456442


Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka +------------------------------+---------------------+------------+------------+
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka |           DATABASE           |      CATEGORY       |    SIZE    |   ITEMS    |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka +------------------------------+---------------------+------------+------------+
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Key-Value store              | Headers             | 2.41 MiB   |       4150 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Key-Value store              | Bodies              | 478.43 MiB |       4150 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Key-Value store              | Receipt lists       | 267.74 MiB |       4150 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Key-Value store              | Difficulties        | 214.79 KiB |       4150 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Key-Value store              | Block number->hash  | 169.39 KiB |       4130 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Key-Value store              | Block hash->number  | 683.85 MiB |   17489356 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Key-Value store              | Transaction index   | 12.24 GiB  |  362206976 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Key-Value store              | Bloombit index      | 3.38 GiB   |    8747182 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Key-Value store              | Contract codes      | 6.06 GiB   |     947830 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Key-Value store              | Account trie nodes  | 32.99 GiB  |  283918605 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Key-Value store              | Storage trie nodes  | 135.02 GiB | 1350456903 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Key-Value store              | Legacy trie nodes   | 0.00 B     |          0 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Key-Value store              | State lookups       | 154.75 KiB |       3865 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Key-Value store              | Trie preimages      | 0.00 B     |          0 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Key-Value store              | Account snapshot    | 9.70 GiB   |  210877140 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Key-Value store              | Storage snapshot    | 71.38 GiB  |  997110424 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Key-Value store              | Beacon sync headers | 590.00 B   |          1 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Key-Value store              | Clique snapshots    | 0.00 B     |          0 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Key-Value store              | Singleton metadata  | 232.61 MiB |         18 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Light client                 | CHT trie nodes      | 0.00 B     |          0 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Light client                 | Bloom trie nodes    | 0.00 B     |          0 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Ancient store (Chain)        | Headers             | 7.96 GiB   |   17485207 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Ancient store (Chain)        | Hashes              | 633.66 MiB |   17485207 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Ancient store (Chain)        | Bodies              | 343.17 GiB |   17485207 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Ancient store (Chain)        | Receipts            | 148.79 GiB |   17485207 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Ancient store (Chain)        | Diffs               | 276.48 MiB |   17485207 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Ancient store (Statehistory) | History.Meta        | 267.78 KiB |       3862 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Ancient store (Statehistory) | Account.Index       | 47.18 MiB  |       3862 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Ancient store (Statehistory) | Storage.Index       | 49.55 MiB  |       3862 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Ancient store (Statehistory) | Account.Data        | 35.23 MiB  |       3862 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka | Ancient store (Statehistory) | Storage.Data        | 12.01 MiB  |       3862 |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka +------------------------------+---------------------+------------+------------+
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka |                                       TOTAL        | 773.36 GIB |            |
Jun 16 10:43:30 bench05.ethdevops.io tender_ishizaka +------------------------------+---------------------+------------+------------+

rjl493456442 avatar Jan 29 '23 13:01 rjl493456442

The spec of benchmark machine

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          16
On-line CPU(s) list:             0-15
Thread(s) per core:              2
Core(s) per socket:              8
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           113
Model name:                      AMD Ryzen 7 3800X 8-Core Processor
Stepping:                        0
Frequency boost:                 enabled
CPU MHz:                         2212.605
CPU max MHz:                     5381.5420
CPU min MHz:                     2200.0000

And it has 64GB memory. Just for reference.

rjl493456442 avatar Jan 30 '23 12:01 rjl493456442

Sorry for spreading my comments a bit between tihs PR and the other. But, regarding https://github.com/ethereum/go-ethereum/pull/26665/files#r1102865645 .

Here's an alternative way to do things. Not sure which is cleanest.

diff --git a/trie/committer.go b/trie/committer.go
index d9a63a224a..745cb76839 100644
--- a/trie/committer.go
+++ b/trie/committer.go
@@ -37,18 +37,18 @@ type committer struct {
 }
 
 // newCommitter creates a new committer or picks one from the pool.
-func newCommitter(owner common.Hash, accessList map[string][]byte, collectLeaf bool) *committer {
+func newCommitter(nodes *NodeSet, collectLeaf bool) *committer {
 	return &committer{
-		nodes:       NewNodeSet(owner, accessList),
+		nodes:       nodes,
 		collectLeaf: collectLeaf,
 	}
 }
 
 // Commit collapses a node down into a hash node and returns it along with
 // the modified nodeset.
-func (c *committer) Commit(n node) (hashNode, *NodeSet) {
+func (c *committer) Commit(n node) hashNode {
 	h := c.commit(nil, n)
-	return h.(hashNode), c.nodes
+	return h.(hashNode)
 }
 
 // commit collapses a node down into a hash node and returns it.
diff --git a/trie/nodeset.go b/trie/nodeset.go
index 08121f5c13..698c8237c8 100644
--- a/trie/nodeset.go
+++ b/trie/nodeset.go
@@ -94,21 +94,23 @@ type NodeSet struct {
 
 // NewNodeSet initializes an empty node set to be used for tracking dirty nodes
 // for a specific account or storage trie. The owner is zero for the account trie
-// and the owning account address hash for storage tries. The provided accessList
-// represents the original value of accessed nodes, it can be optional but would
-// be beneficial for speeding up the construction of trie history.
-func NewNodeSet(owner common.Hash, accessList map[string][]byte) *NodeSet {
-	// Don't panic for lazy users.
-	if accessList == nil {
-		accessList = make(map[string][]byte)
-	}
+// and the owning account address hash for storage tries.
+func NewNodeSet(owner common.Hash) *NodeSet {
 	return &NodeSet{
 		owner:      owner,
 		nodes:      make(map[string]*memoryNode),
-		accessList: accessList,
+		accessList: make(map[string][]byte),
 	}
 }
 
+// WithAccessList sets an accessList which represents the
+// original value of accessed nodes. This is used for speeding up the
+// construction of trie history.
+func (set *NodeSet) WithAccessList(accessList map[string][]byte) *NodeSet {
+	set.accessList = accessList
+	return set
+}
+
 // forEachWithOrder iterates the dirty nodes with the specified order.
 // If topToBottom is true:
 //
diff --git a/trie/snap_database_test.go b/trie/snap_database_test.go
index af46af9cbe..f0dbade7fd 100644
--- a/trie/snap_database_test.go
+++ b/trie/snap_database_test.go
@@ -39,7 +39,7 @@ type testEnv struct {
 // fill creates a list of random nodes for simulation.
 func fill(n int, prevPaths [][][]byte, prevBlobs [][][]byte) (common.Hash, *NodeSet) {
 	var (
-		nodes    = NewNodeSet(common.Hash{}, nil)
+		nodes    = NewNodeSet(common.Hash{})
 		checkDup = func(path []byte) bool {
 			if len(path) == 0 {
 				return true
diff --git a/trie/trie.go b/trie/trie.go
index aedde3a502..ccf0080167 100644
--- a/trie/trie.go
+++ b/trie/trie.go
@@ -568,8 +568,9 @@ func (t *Trie) Commit(collectLeaf bool) (common.Hash, *NodeSet) {
 		t.root = hashedNode
 		return rootHash, nil
 	}
-	h := newCommitter(t.owner, t.accessList, collectLeaf)
-	newRoot, nodes := h.Commit(t.root)
+	nodes := NewNodeSet(t.owner).WithAccessList(t.accessList)
+	h := newCommitter(nodes, collectLeaf)
+	newRoot := h.Commit(t.root)
 	t.root = newRoot
 	return rootHash, nodes
 }

EDITED to change the constructor, and pass nodes from the Trie

holiman avatar Feb 10 '23 15:02 holiman

@holiman This change looks good to me.

rjl493456442 avatar Feb 13 '23 03:02 rjl493456442

Oh, you added back the deletions -- is this ready for a review again?

holiman avatar Feb 22 '23 13:02 holiman

Needs a rebase :)

holiman avatar Mar 14 '23 12:03 holiman

i am so excited for this work to land :tada:

kumavis avatar Apr 14 '23 21:04 kumavis

i am so excited for this work to land too 🎉

ylsGit avatar Apr 25 '23 04:04 ylsGit

Will this PR be included in v1.12.0? What is the time plan?

ylsGit avatar Apr 26 '23 07:04 ylsGit

How do I switch to PBSS for already existing hash-based data and must I use PBSS to start sync data from the Genesis block?

ylsGit avatar Apr 26 '23 08:04 ylsGit

You will need to resync from scratch to enable it. There's just no way to meaningfully convert one database into the other and also cleaning up all the junk and whantot.

karalabe avatar Apr 26 '23 08:04 karalabe

Hi, this pr is awesome and will help a lot in trie storage & prune, but I have some uncertain problems:

  1. how to handle contract destructs in pbss, it will delete all its slots;
  2. When there is a power outage or system crash and the diff layers cannot save the journal, what is the impact on the system?
  3. Will pbss support archive history in the future? How to support? Currently, the archive node only run based on hash scheme?

0xbundler avatar May 04 '23 16:05 0xbundler

  1. how to handle contract destructs in pbss, it will delete all its slots;

The storage of destructed account is still left in the disk. Because in PBSS state changes are required to flush in atomic way, however the storage of destructed account is unbounded, may run into a huge batch and cause out-of-memory. Therefore, this part of storage trie nodes are left until SELF-DESTRUCT is totally disabled.

  1. When there is a power outage or system crash and the diff layers cannot save the journal, what is the impact on the system?

We have recovery mechanism to rewind the chain head in next restart.

  1. Will pbss support archive history in the future? How to support? Currently, the archive node only run based on hash scheme?

Yes, i already had some prototypes and it's possible to implement archive node.

rjl493456442 avatar May 05 '23 02:05 rjl493456442

  1. how to handle contract destructs in pbss, it will delete all its slots;

The storage of destructed account is still left in the disk. Because in PBSS state changes are required to flush in atomic way, however the storage of destructed account is unbounded, may run into a huge batch and cause out-of-memory. Therefore, this part of storage trie nodes are left until SELF-DESTRUCT is totally disabled.

  1. When there is a power outage or system crash and the diff layers cannot save the journal, what is the impact on the system?

We have recovery mechanism to rewind the chain head in next restart.

  1. Will pbss support archive history in the future? How to support? Currently, the archive node only run based on hash scheme?

Yes, i already had some prototypes and it's possible to implement archive node.

Thank you for your reply. Looking forward to it.

0xbundler avatar May 05 '23 06:05 0xbundler

How can I use this pr to run pbss on some eth network to test? Do you have some guide doc?

ylsGit avatar May 06 '23 08:05 ylsGit

Use #26274 instead and give it the trie.path-based (or something like that, check help) parameter.

yorickdowne avatar May 06 '23 08:05 yorickdowne

Comparing hbss and pbss, is the state hash the same? Can hbss node and pbss node run in the same eth net?

ylsGit avatar May 08 '23 08:05 ylsGit

Comparing hbss and pbss, is the state hash the same? Can hbss node and pbss node run in the same eth net?

Yes

rjl493456442 avatar May 08 '23 08:05 rjl493456442

Are there any new plans for this feature? When will it be available for merging?

ylsGit avatar Jun 16 '23 10:06 ylsGit