iavl icon indicating copy to clipboard operation
iavl copied to clipboard

EPIC: Node & key format

Open tac0turtle opened this issue 3 years ago • 5 comments

Currently IAVL does not take advantage of data locality on disk. The way IAVL store nodes on disk is with a hash that gets stored in a random location on disk and makes it so that IAVL needs to do a random search of disk in order to find the data. With the recent work from osmosis on fast node we saw a large performance improvement with a trade off on writes.

Secondly, there are two key formats, live keys and orphan keys referenced by node hash which also indexes the orphan nodes.

A next step would be to modify IAVL to structure data on disk in a logical manner to reduce random searches and reduce random compaction of disk data.

There are two things that can be done in order to take advantage of this.

Work Scope:

Phase 1:

  • [ ] modify the key format from a hash of the data to be write_version|node_path_in_tree.

Phase 2:

  • [ ] make every node store the version of its children.
  • [ ] remove orphan indexing and hashes

(Proposed by @ValarDragon)

tac0turtle avatar Sep 07 '22 14:09 tac0turtle

Other steps:

  • To modify the key format, you have to update the node struct to add left_child_version and right_child_version in the nodes
  • Probably delete the root key format, and instead just make it a method that returns a key in your main node format space

Can be done later, but needed pre-integration for pruning:

  • Clarify versioning / historic delete API semantics from SDK
  • Make methods to prune (theres a couple approaches, if someones having trouble deriving them, can write up more)

ValarDragon avatar Sep 07 '22 20:09 ValarDragon

modify the key format from a hash of the data to be write_version|node_path_in_tree.

where write_version is the left side and node_path_in_tree is the right side?

Or is it like, both sides have that?

faddat avatar Sep 09 '22 02:09 faddat

NOTE: store/v2 solves that in a way similar to TurboGeth (erigon)

robert-zaremba avatar Sep 09 '22 09:09 robert-zaremba

So this proposal will increase a node (and DB) size, right?

Few important questions

  • What is the definition of write_version?
  • Will that impact state proofs / merkle proofs? This change must be compatible with the existing merkle proof, otherwise we have trouble with IBC. My guess is that it won't, but want to double check.

robert-zaremba avatar Sep 09 '22 10:09 robert-zaremba

NOTE: store/v2 solves that in a way similar to TurboGeth (erigon)

thank you, but as mentioned in a few places store/v2 work is unknown at this time for countless reasons.

tac0turtle avatar Sep 09 '22 11:09 tac0turtle

@marbar3778 It looks like we couldn't modify the key format as write_version|node_path_in_tree. Because when rotating the tree, the path would be changed.

cool-develope avatar Sep 27 '22 20:09 cool-develope

this is expected, do you see a problem with this?

cc @ValarDragon

tac0turtle avatar Oct 01 '22 23:10 tac0turtle

@marbar3778, I'm also working on this, can you assign me as well

catShaark avatar Oct 05 '22 15:10 catShaark