go-ethereum
all: implement path-based state scheme
This PR adds the path-based implementation, but it is not used yet. The main intention is to let reviewers review the main part without worrying about breaking the live code.
Benchmark results on mainnet
Overall performance:
Finishes a mainnet full sync in approximately 10 days, roughly 11 hours faster than the master branch.
IOWait:
The master branch has noticeably higher iowait.
Memory usage:
Memory usage shows no big difference between the two.
Allocation:
PBSS has a higher allocation rate than the master branch.
Database
Overall:
| | PBSS | Master |
|---|---|---|
| Database size | 261 GB (can be compacted to 215 GB) | 1.37 TB |
| Database write (key-value store) | 19.2 TB | 32.1 TB |
| Database read (key-value store) | 250 TB | 273 TB |
Compaction overhead:
The compaction overhead on master is noticeably larger.
Wow, great work Gary!!!
INFO [01-28|08:23:21.344] State is complete accounts=197,283,862 slots=908,428,600 codes=29,682,972 elapsed=5h45m42.592s
+------------------------------+---------------------+------------+------------+
|           DATABASE           |      CATEGORY       |    SIZE    |   ITEMS    |
+------------------------------+---------------------+------------+------------+
| Key-Value store              | Headers             | 2.41 MiB   | 4150       |
| Key-Value store              | Bodies              | 478.43 MiB | 4150       |
| Key-Value store              | Receipt lists       | 267.74 MiB | 4150       |
| Key-Value store              | Difficulties        | 214.79 KiB | 4150       |
| Key-Value store              | Block number->hash  | 169.39 KiB | 4130       |
| Key-Value store              | Block hash->number  | 683.85 MiB | 17489356   |
| Key-Value store              | Transaction index   | 12.24 GiB  | 362206976  |
| Key-Value store              | Bloombit index      | 3.38 GiB   | 8747182    |
| Key-Value store              | Contract codes      | 6.06 GiB   | 947830     |
| Key-Value store              | Account trie nodes  | 32.99 GiB  | 283918605  |
| Key-Value store              | Storage trie nodes  | 135.02 GiB | 1350456903 |
| Key-Value store              | Legacy trie nodes   | 0.00 B     | 0          |
| Key-Value store              | State lookups       | 154.75 KiB | 3865       |
| Key-Value store              | Trie preimages      | 0.00 B     | 0          |
| Key-Value store              | Account snapshot    | 9.70 GiB   | 210877140  |
| Key-Value store              | Storage snapshot    | 71.38 GiB  | 997110424  |
| Key-Value store              | Beacon sync headers | 590.00 B   | 1          |
| Key-Value store              | Clique snapshots    | 0.00 B     | 0          |
| Key-Value store              | Singleton metadata  | 232.61 MiB | 18         |
| Light client                 | CHT trie nodes      | 0.00 B     | 0          |
| Light client                 | Bloom trie nodes    | 0.00 B     | 0          |
| Ancient store (Chain)        | Headers             | 7.96 GiB   | 17485207   |
| Ancient store (Chain)        | Hashes              | 633.66 MiB | 17485207   |
| Ancient store (Chain)        | Bodies              | 343.17 GiB | 17485207   |
| Ancient store (Chain)        | Receipts            | 148.79 GiB | 17485207   |
| Ancient store (Chain)        | Diffs               | 276.48 MiB | 17485207   |
| Ancient store (Statehistory) | History.Meta        | 267.78 KiB | 3862       |
| Ancient store (Statehistory) | Account.Index       | 47.18 MiB  | 3862       |
| Ancient store (Statehistory) | Storage.Index       | 49.55 MiB  | 3862       |
| Ancient store (Statehistory) | Account.Data        | 35.23 MiB  | 3862       |
| Ancient store (Statehistory) | Storage.Data        | 12.01 MiB  | 3862       |
+------------------------------+---------------------+------------+------------+
|                              |        TOTAL        | 773.36 GIB |            |
+------------------------------+---------------------+------------+------------+
The spec of the benchmark machine:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 113
Model name: AMD Ryzen 7 3800X 8-Core Processor
Stepping: 0
Frequency boost: enabled
CPU MHz: 2212.605
CPU max MHz: 5381.5420
CPU min MHz: 2200.0000
It also has 64GB of memory, just for reference.
Sorry for spreading my comments a bit between this PR and the other. But, regarding https://github.com/ethereum/go-ethereum/pull/26665/files#r1102865645 .
Here's an alternative way to do things. Not sure which is cleanest.
diff --git a/trie/committer.go b/trie/committer.go
index d9a63a224a..745cb76839 100644
--- a/trie/committer.go
+++ b/trie/committer.go
@@ -37,18 +37,18 @@ type committer struct {
}
// newCommitter creates a new committer or picks one from the pool.
-func newCommitter(owner common.Hash, accessList map[string][]byte, collectLeaf bool) *committer {
+func newCommitter(nodes *NodeSet, collectLeaf bool) *committer {
return &committer{
- nodes: NewNodeSet(owner, accessList),
+ nodes: nodes,
collectLeaf: collectLeaf,
}
}
// Commit collapses a node down into a hash node and returns it along with
// the modified nodeset.
-func (c *committer) Commit(n node) (hashNode, *NodeSet) {
+func (c *committer) Commit(n node) hashNode {
h := c.commit(nil, n)
- return h.(hashNode), c.nodes
+ return h.(hashNode)
}
// commit collapses a node down into a hash node and returns it.
diff --git a/trie/nodeset.go b/trie/nodeset.go
index 08121f5c13..698c8237c8 100644
--- a/trie/nodeset.go
+++ b/trie/nodeset.go
@@ -94,21 +94,23 @@ type NodeSet struct {
// NewNodeSet initializes an empty node set to be used for tracking dirty nodes
// for a specific account or storage trie. The owner is zero for the account trie
-// and the owning account address hash for storage tries. The provided accessList
-// represents the original value of accessed nodes, it can be optional but would
-// be beneficial for speeding up the construction of trie history.
-func NewNodeSet(owner common.Hash, accessList map[string][]byte) *NodeSet {
- // Don't panic for lazy users.
- if accessList == nil {
- accessList = make(map[string][]byte)
- }
+// and the owning account address hash for storage tries.
+func NewNodeSet(owner common.Hash) *NodeSet {
return &NodeSet{
owner: owner,
nodes: make(map[string]*memoryNode),
- accessList: accessList,
+ accessList: make(map[string][]byte),
}
}
+// WithAccessList sets an accessList which represents the
+// original value of accessed nodes. This is used for speeding up the
+// construction of trie history.
+func (set *NodeSet) WithAccessList(accessList map[string][]byte) *NodeSet {
+ set.accessList = accessList
+ return set
+}
+
// forEachWithOrder iterates the dirty nodes with the specified order.
// If topToBottom is true:
//
diff --git a/trie/snap_database_test.go b/trie/snap_database_test.go
index af46af9cbe..f0dbade7fd 100644
--- a/trie/snap_database_test.go
+++ b/trie/snap_database_test.go
@@ -39,7 +39,7 @@ type testEnv struct {
// fill creates a list of random nodes for simulation.
func fill(n int, prevPaths [][][]byte, prevBlobs [][][]byte) (common.Hash, *NodeSet) {
var (
- nodes = NewNodeSet(common.Hash{}, nil)
+ nodes = NewNodeSet(common.Hash{})
checkDup = func(path []byte) bool {
if len(path) == 0 {
return true
diff --git a/trie/trie.go b/trie/trie.go
index aedde3a502..ccf0080167 100644
--- a/trie/trie.go
+++ b/trie/trie.go
@@ -568,8 +568,9 @@ func (t *Trie) Commit(collectLeaf bool) (common.Hash, *NodeSet) {
t.root = hashedNode
return rootHash, nil
}
- h := newCommitter(t.owner, t.accessList, collectLeaf)
- newRoot, nodes := h.Commit(t.root)
+ nodes := NewNodeSet(t.owner).WithAccessList(t.accessList)
+ h := newCommitter(nodes, collectLeaf)
+ newRoot := h.Commit(t.root)
t.root = newRoot
return rootHash, nodes
}
EDITED to change the constructor, and pass nodes from the Trie
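For readers skimming the diff, here is a minimal sketch of the two construction patterns this split enables (identifiers taken from the diff above; illustrative only, not code from the PR):

```go
// Illustrative only: shows the call patterns enabled by splitting
// NewNodeSet and WithAccessList. Assumes the trie package types
// introduced in this PR.
package trie

import "github.com/ethereum/go-ethereum/common"

func nodeSetPatterns(owner common.Hash, accessList map[string][]byte) {
	// Callers with no original node values (e.g. tests) stay simple:
	plain := NewNodeSet(owner)

	// Callers that track original values chain the setter, so the
	// access list can speed up trie history construction:
	tracked := NewNodeSet(owner).WithAccessList(accessList)

	_, _ = plain, tracked
}
```

The two-step construction keeps the common case terse while still letting history-aware callers attach the access list.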
@holiman This change looks good to me.
Oh, you added back the deletions -- is this ready for a review again?
Needs a rebase :)
i am so excited for this work to land :tada:
i am so excited for this work to land too 🎉
Will this PR be included in v1.12.0? What is the timeline?
How do I switch to PBSS with already existing hash-based data, and must I use PBSS to sync from the genesis block?
You will need to resync from scratch to enable it. There's just no way to meaningfully convert one database into the other while also cleaning up all the junk and whatnot.
Hi, this PR is awesome and will help a lot with trie storage & pruning, but I have some questions:
- How to handle contract destructs in pbss? It will delete all its slots.
- When there is a power outage or system crash and the diff layers cannot save the journal, what is the impact on the system?
- Will pbss support archive history in the future? How to support it? Currently, does the archive node only run on the hash scheme?
- How to handle contract destructs in pbss? It will delete all its slots.
The storage of a destructed account is still left on disk. In PBSS, state changes must be flushed atomically, but the storage of a destructed account is unbounded and could produce a huge batch that runs the node out of memory. Therefore, these storage trie nodes are left in place until SELF-DESTRUCT is fully disabled.
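To illustrate the concern, here is a hypothetical sketch (invented interfaces, not the PR's actual API) of why an inline deletion would be dangerous:

```go
// Hypothetical sketch: all identifiers below are invented for
// illustration; this is not PR code.
package sketch

type Iterator interface {
	Next() bool
	Key() []byte
	Release()
}

type Batch interface {
	Delete(key []byte)
	Write() error // commits the whole batch atomically
}

type KeyValueStore interface {
	NewBatch() Batch
	NewIterator(prefix []byte) Iterator
}

// deleteStorageNaively removes every storage trie node under prefix in
// a single atomic batch. A contract may own an unbounded number of
// slots, so the in-memory batch can grow without limit and exhaust
// memory before Write is ever reached, which is why the PR leaves
// these nodes on disk instead.
func deleteStorageNaively(db KeyValueStore, prefix []byte) error {
	batch := db.NewBatch()
	it := db.NewIterator(prefix)
	defer it.Release()
	for it.Next() {
		batch.Delete(it.Key())
	}
	return batch.Write()
}
```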
- When there is a power outage or system crash and the diff layers cannot save the journal, what is the impact on the system?
We have a recovery mechanism that rewinds the chain head on the next restart.
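Conceptually, the recovery works roughly like this hypothetical sketch (all identifiers invented for illustration; this is not the PR's actual code):

```go
// Hypothetical sketch of the crash-recovery idea. If the diff-layer
// journal was lost, only the state root persisted on disk is
// trustworthy, so the chain head is rewound to the block whose state
// root matches it and the blocks above are re-executed.
package sketch

import "errors"

type Chain interface {
	BlockNumberWithRoot(root [32]byte) (uint64, bool)
	SetHead(number uint64) error
}

func recoverAfterCrash(chain Chain, persistedRoot [32]byte) error {
	number, ok := chain.BlockNumberWithRoot(persistedRoot)
	if !ok {
		return errors.New("no block matches the persisted state root")
	}
	// Rewinding re-executes the blocks above this point on top of the
	// persisted state, rebuilding the lost in-memory diff layers.
	return chain.SetHead(number)
}
```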
- Will pbss support archive history in the future? How to support it? Currently, does the archive node only run on the hash scheme?
Yes, I already have some prototypes, and it's possible to implement an archive node.
Thank you for your reply. Looking forward to it.
How can I use this PR to run pbss on some Ethereum network for testing? Is there a guide doc?
Use #26274 instead and give it the trie.path-based (or something like that, check help) parameter.
Comparing hbss and pbss, is the state hash the same? Can hbss nodes and pbss nodes run in the same Ethereum network?
Comparing hbss and pbss, is the state hash the same? Can hbss nodes and pbss nodes run in the same Ethereum network?
Yes
Are there any new plans for this feature? When will it be available for merging?