ethereumjs-monorepo
ethereumjs-monorepo copied to clipboard
Trie: Internal Refactoring
Trie: Internal Refactoring
Nodes
src/trie/node/
BASE_NODE
abstract class defines basic node structure
raw()- format node for rlp encoding
rlpEncode()- rlp encode to bytes
hash()- hash of rlp encode
NODE_TYPES
- LeafNode
- keyNibbles (unique end of key)
- value
- ExtensionNode
- keyNibbles (shared by subnodes)
- child
- BranchNode
- children: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
- value
- ProofNode
- placeholder for hashed node in a sparse trie
- load()
- attempts to resolve by checking db
- NullNode
- null or deleted
TNode
type TNode = LeafNode | ExtensionNode | BranchNode | ProofNode | NullNode
Operations
src/trie/operations/
Functions in operations directory are used internally by Trie class methods
- Decode
- Decode RLP encoded bytes into
TNode
- Decode RLP encoded bytes into
- Insert
- Update Trie with new key/value pair
- Delete
- Update Trie by removing key/value pair
- Cleanup
- Settle necessary node changes after Trie update
- e.g. Branch with 1 child
- Settle necessary node changes after Trie update
- GetNodePath
- Return path to node by key
- WalkTrie
- Walk the Trie
- Filter nodes with search parameters
- Execute OnFound function on filtered nodes
- ReadStream
- Create a Node Readable Stream for DB values
- GarbageCollection
- Remove unreachable nodes from DB/cache
Trie
MerklePatriciaTrie
trie/merklePatriciaTrie.ts
The MerklePatriciaTrie class implements basic Trie functionality (no DB)
Public
root()- returns the current root hash
lookupNodeByHash()- retrieves a node by hash
setRootByHash- sets the trie root by node hash
resolveProofNode- attempts to resolve a hashed node
getPath- returns stack of nodes from root to node
getNode- returns a node from the trie by key
Internal
_getChildOf- Navigates a parent node to a child by key
_getNodePath- Returns a
WalkResultobject { node, path, remainingNibbles }
- Returns a
_insertAtNode- Inserts a node into Trie by key
_deleteAtNode- Deletes a node from Trie by key
_cleanupNode- Processes a node after an update
TrieWithDB
trie/trieDB.ts
The TrieWithDB class extends the MerklePatriciaTrie class with methods for uisng a database.
TrieWithDB enables
- Checkpointing
- NodePruning
- SecureKeys
- Root Persistence
- TrieDatabase
- Cache
Added Methods:
database()- Returns the database instance
setDatabase()- Replaces the database with new DB
checkpoint()- Adds a root to the checkpoint list
hasCheckpoints()- Returns true if checkpoints in list
storeNode()- Stores a node in database by hash
persistRoot()- Stores the current root in DB with
DB_ROOT_KEY
- Stores the current root in DB with
commit- Removes a checkpoint from the list
revert- Reverts back to previous checkpoint root
pruneCheckpoints- Reduce checkpoings to
maxCheckpoints
- Reduce checkpoings to
flushCheckpoints- Dump all checkpoints
garbageCollect- Delete all unreachable nodes from storage
verifyPrunedIntegrity- Verify that all nodes are reachable
_markReachableNodes- Filter nodes for garbage collection
TrieWrap
trie/trieWrapper.ts
TrieWrap extends TrieWithDB with additional functionality and provides a simple public interface for most uses.
static methods
async create- Create a new Trie instance from options
fromProof- Create a new Trie instance from a proof
verifyProof- Verify a proof against a given root
proof methods
createProof- Create a merkle proof for a key
updateFromProof- Update a (sparse) Trie from a proof
verifyProof- Verify a proof against the root hash
trie methods
putgetdelbatchsetRootNode
stream / walk
createReadStreamwalkTriewalkTrieRecursively
Codecov Report
Merging #2662 (4c90832) into develop-v7 (6bc6a10) will increase coverage by
3.35%. The diff coverage is94.25%.
:exclamation: Current head 4c90832 differs from pull request most recent head 63be09b. Consider uploading reports for the commit 63be09b to get more accurate results
Additional details and impacted files
| Flag | Coverage Δ | |
|---|---|---|
| block | 90.30% <ø> (ø) |
|
| blockchain | 90.79% <93.97%> (+0.24%) |
:arrow_up: |
| client | ? |
|
| common | 96.06% <100.00%> (ø) |
|
| devp2p | 91.82% <ø> (?) |
|
| ethash | ∅ <ø> (∅) |
|
| statemanager | ? |
|
| trie | ? |
|
| tx | 95.50% <ø> (ø) |
|
| util | 81.33% <100.00%> (ø) |
|
| vm | ? |
Flags with carried forward coverage won't be shown. Click here to find out more.
Got all but one of the Trie tests passing with the new findPath().
the failure is weird. it doesn't do anything too different that the other tests around it, so i can't quite tell why it's failing
Got all but one of the Trie tests passing with the new findPath().
the failure is weird. it doesn't do anything too different that the other tests around it, so i can't quite tell why it's failing
🎉
Something is off with CI though, would be great if you can fix as some first next step, so that one gets an overview of the tests still failing.
Note: if we keep this we need to mention in the docs (breaking).
The issues here run deep, and while I can do my best to limit this PR to non-breaking changes, we may find it impossible to fix one issue without also fixing a few others, which makes breaking changes all the more likely.
one helpful non-breaking change i'd love to make right away would be to give the Trie class a debugger. does it not have one be design? like as a performance choice or something?
I have a working solution that does not require the breaking change to trie.copy() -- so I have reverted that change for this PR
Note: if we keep this we need to mention in the docs (breaking).
The issues here run deep, and while I can do my best to limit this PR to non-breaking changes, we may find it impossible to fix one issue without also fixing a few others, which makes breaking changes all the more likely.
Ah, yes, so that was totally not meant as that breaking changes - if necessary or useful - should be avoided, rather a note to myself to mention this in the release notes. 🙂
In the case we discuss here e.g. it makes super much sense (and we (you?) should rather look into the other libraries with a create method if we want to adopt there as well (that would be actually great, but should be a separate PR).
one helpful non-breaking change i'd love to make right away would be to give the Trie class a
debugger. does it not have one be design? like as a performance choice or something?
No, that's not by design or something, it just "is work" to add these debuggers, lol. 😋 Yes, so that would be actually nice. And if you follow the path with this "guard" (if this.DEBUG { debug(...) } or something as we have in SM, VM, devp2p,... it should not impact performance.
Just a general note: when I was starting with this, I was not yet 100% sure if "this is the way to go", this was just started as some (promising, attention: personal judgement 😋) experiment.
If we do it this way, this would throw away a lot of logic, in particular this job queuing thing with this PrioritizedTaskExecutor - or however this is named.
I am currently not seeing why such a task executor is needed, from my understanding Trie node retrieval seems to be a totally deterministic process with no surprises (this findPath name (and other semantics from wording as well) is sometimes putting one on the path thinking that there is some "oh let's look here, no nothing, let's take the other route, ah, nothing as well" process involved. 😜 That's completely not true unless I am missing something.
So anyhow: to confirm that we are not the right track here it would be good if you do substantial mainnet sync trials with the PR on the sideline, just let this running a bit, see how it goes, if the client goes out of memory at some point e.g. with these changes (there is some mention of OOM prevention in "the old code", I am personally not seeing yet where this should happen with max 64 (?) subsequent DB reads).
For the performance perspective it would be good if you take some couple of 100/1000 tx loaded blocks and take this version and then the old one and see how sync times compare (optimally post here).
I do remember this prioritized task executor and I tried to remove it at some point (or at least how I remember is by changing it to an optimized version with binary sort). However this had a side effect why it is still this task executor instead of the binary search one. We should maybe find out what the original motivation for this task executor.
I do remember this prioritized task executor and I tried to remove it at some point (or at least how I remember is by changing it to an optimized version with binary sort). However this had a side effect why it is still this task executor instead of the binary search one. We should maybe find out what the original motivation for this task executor.
Hmm, yeah, this thing is really "interesting". 🤔 🙂
I injected a console log for findPath start and then for priority in pushNodeToQueue(nodeRef: Uint8Array, key: Nibbles = [], priority?: number) along block execution in a client run and this looks like this:
findPath
1
2
3
4
5
6
findPath
1
2
3
4
findPath
1
2
3
4
5
findPath
1
findPath
1
2
3
4
5
findPath
1
2
3
4
findPath
1
2
3
4
5
findPath
So totally linear and simple (and low number, will go a bit higher when in state has gronw, this is in block ~500.000 or so, anyhow).
So this would keep me on the track that this is not doing so much useful.
One theory would be that this was mitigating some flaws of the complicated Promise structrue before (things couldn't execute or something?). But just a theory, does not necessarily need to be true.
I guess we just need to observe this a bit, client sync is likely a good testbed here, solid amount of state + real world scenario. Generally of course also: the higher (block number), the better (on the other hand: testbed before was likely not so ambitious? 🤔)
I agree on the above ^ comment @holgerd77. My gut also says that this prioritized queue thing is a leftover of a very old implementation (it would be somewhat likely this indeed has to do with the old callback structure) and that we can safely remove it, but for completeness just wanted to mention my old experiences. I think if we just let the client run for a while and it does not hiccup we are safe.
I am so glad to hear all of this! @jochem-brouwer and @holgerd77
The git history on the Trie library is a little vague on exactly who wrote what parts of it, and I don't want to offend any current team members by suggesting big changes.
I totally agree that much of the weird complexity to this library can be stripped away, it just requires some finesse, and some basic fundamental changes.
I'm feeling empowered to tear this library apart, and remake it as a more user friendly and "general purpose" set of tools. Should I treat this PR as a vehicle for all of that?
Or should I try to make granular incremental changes in separate PR's?
I think I wrote/rewrote most of the trie library, go ahead refactoring it :)
If you edit a single method, I think it is great if this is a single PR. However if the refactor is "all over the place" this should be self contained in a single PR also, IMO.