urkel icon indicating copy to clipboard operation
urkel copied to clipboard

What's the purpose of the meta file?

Open davebryson opened this issue 7 years ago • 6 comments

This is really great work! I've started a port in go to get a better understanding of the code base.

What's the purpose of the separate meta file? It appears that it's only opened and closed to read/write a key on start-up. However (if I'm reading the code correctly) any new meta information on commit is written to the actual log file. I don't see where the meta file itself is updated on subsequent commits. Is the purpose of the meta file simply to hold the key? Thanks!

davebryson avatar Aug 06 '18 21:08 davebryson

Is the purpose of the meta file simply to hold the key?

Yes, and the key is there to prevent a very specific kind of attack on the tree. I called it "meta" because maybe it would hold some other information in the future other than just the key.

Full expanation of the attack prevented by having a random key:

The idea is that we write a "meta root" on every commit. This meta root contains a pointer to the previous meta root and a pointer to the latest root node. Attached to it is a 20 byte checksum. The meta root is written as the very last part of the write call. When we boot up, we can: 1. Find the meta root since it's always written on a particular byte boundary with a special 4 byte prefix. 2. Verify that the write succeeded by verifying the checksum. This gives us good crash consistency.

This is where the edge-case comes in: if people are allowed to add arbitrary data to the tree, an attacker may try to construct a piece of data which looks like a meta root. A node which has crashed and needs to reboot by finding a meta root may end up using an attacker's piece of data as the meta root, which could have pointers to anywhere on disk and ultimately cause consensus failures.

So, to combat this, everytime an urkel tree is instantiated for the first time, a random key is generated. The 20 byte checksum/hash for each meta root is calculated by permuting the key with the meta root's data.

This way, everyone on the network computes different checksums for their meta roots, and it becomes impossible for an attacker to try to corrupt the tree.

chjj avatar Aug 07 '18 01:08 chjj

Also note that this could probably also be fixed by ensuring that values are never written on the same boundary that meta roots are (pad them by 1 byte or something). This was just one way of solving the issue.

chjj avatar Aug 07 '18 02:08 chjj

I've started a port in go

That's totally awesome by the way. :)

Are you porting the radix tree or the simplified trie?

chjj avatar Aug 07 '18 02:08 chjj

Ahhh... ok. So the meta key is protecting the meta roots in the log file.

Are you porting the radix tree or the simplified trie?

I'm porting the optimized version of the simplified trie. _get and _insert functional, now working on the store piece.

davebryson avatar Aug 07 '18 12:08 davebryson

I'm porting the optimized version of the simplified trie

Hmmm.... looks like I probably should have started with the radix tree. Is that the future direction of urkel?

davebryson avatar Aug 07 '18 12:08 davebryson

I'm porting the optimized version of the simplified trie. _get and _insert functional, now working on the store piece.

Cool.

Hmmm.... looks like I probably should have started with the radix tree. Is that the future direction of urkel?

Not sure yet whether it's the future direction. But I think once you have the data management working, it shouldn't be too hard to port it to the merklix/radix tree. You just have to store colliding bits on each internal node (if there are any), and also add a new type of absence proof.

chjj avatar Aug 07 '18 21:08 chjj