backer icon indicating copy to clipboard operation
backer copied to clipboard

build merkle tree

Open juliangruber opened this issue 12 years ago • 13 comments

When two or more nodes start replicating (i.e. mirroring, syncing) they need to figure out how their files differ, so they can exchange only what they really have to.

The most efficient way to do this - as far as I know - is to create a merkle tree (also known as hash tree) representing all the files and directories in your sync folder.

Since the file system is a tree this is a perfect fit!

When we're kicking of replicating, both nodes exchange their top hash, the hash of the root node. If that differs, they start going down the tree (i.e. going deeper into directories) and exchange hashes until the know which files are different and which are the same.

The next task would then be to diff files and only send partial changes, but that's for another issue!

Make it work

  • [x] Is there a module that creates merkle trees from directories?
  • if not
    • [x] are there modules, like a merkle tree module, that we should use?
    • [x] any volunteers in writing one?
    • [ ] finish merkle-dir

juliangruber avatar Sep 18 '13 10:09 juliangruber

I've done a lot of work on this. Comparing trees is tough. But I'll give it a go.

heapwolf avatar Sep 18 '13 10:09 heapwolf

Isn't that basically what git does? So far this all sounds like "Let's implement git in Node in a way that can be consumed by users" or am I missing something?

pipobscure avatar Sep 18 '13 10:09 pipobscure

git doesn't handle big files well, github for example limits your files to 100mb. It's not intended for realtime / constant updates, as @dominictarr sure can tell you. Plus git requires both parties to have common histories, but we can't even rely on change histories. Imagine the backer process crashing and you restart it after 1h. There's no way for backer to find out what happened in which order.

juliangruber avatar Sep 18 '13 11:09 juliangruber

I have several parts to this puzzle: this contains a merkle tree exchange protocol: https://github.com/dominictarr/level-merkle (I plan to refactor this to decouple it from leveldb)

generate a merkle tree: https://github.com/dominictarr/merkle

this hashes the files in a directory: https://github.com/dominictarr/nodedrop/blob/master/hash-tree.js

hmm, you probably want a thing to take snapshots of files and put them into a content addressable store.

@phidelta yes, this is a lot like git. but only a subset. Also, in this case, you want to push automatically, which creates some significant differences.

there are quite possibly some parts that can be pulled from https://github.com/creationix/js-git/

dominictarr avatar Sep 18 '13 11:09 dominictarr

large files are not necessarily simple... there may be no way to diff them well, like, if I apply reverb to a wav file, most probably, all the bytes will change, and I'll just have to send the whole thing again.

Probably the best you can do is gc the cache sensibly (don't keep the entire history) and maybe use something more like the bittorrent protocol for replicating large files, especially if many nodes are sharing a particular folder.

dominictarr avatar Sep 18 '13 11:09 dominictarr

ah, the merkle tree module @dominictarr did looks good to go, imo

heapwolf avatar Sep 18 '13 13:09 heapwolf

I just pushed merkle-dir, which is a node modules that creates a merkle tree representation of a given directory. I couldn't really see how I'd integrate other modules, and there might still be critical bugs in this, so reviews are highly appreciated.

There are 2 issues left on the README, that need to be fixed in order for the module to be usable. They're rather small and not too complicated, so, if anyone wants to give it a go :)

juliangruber avatar Sep 19 '13 21:09 juliangruber

@dominictarr It should be binary merkle trees, shouldn't it?

juliangruber avatar Sep 24 '13 15:09 juliangruber

You don't have to have 2 children. I have 16. because then I can branch on the next hex character in the hash, and it means there are less round trips to replicate.

Although, I'm having trouble finding an article that explains how dynamo, riak, etc replicate with merkle trees. (there are plenty on how to use it for verification, though) I implemented something that works, from what that paragraph said. (just search for "merkle" in dynamo paper)

this video explains this works in riak http://coffee.jtuple.com/video/AAE.html

My approach works basically the same way, but focused on a different use case, and has variable size, not fixed size.

dominictarr avatar Sep 24 '13 18:09 dominictarr

@dominictarr am I right that there's no merkle module yet that you can stream entries to?

juliangruber avatar Sep 24 '13 22:09 juliangruber

You can stream to my merkle module. I haven't documented it yet. and I still want to write the bit that tells you when a replication is complete.

I'm gonna be extremely busy over the next week. I'm hanging out a week in lisbon after lxjs though, so I'll certainly get to it then, at least.

dominictarr avatar Sep 25 '13 07:09 dominictarr

I wrote build-merkle.js which uses level-merkle to create a merkle tree from a directory.

@dominictarr this takes >60s to create a merkle tree from my projects folder, which has ~250k files. Is that normal? Also it's way faster when I insert data in one batch vs using a writeStream. Not sure if that's because of leveldb or level-merkle.

juliangruber avatar Sep 25 '13 15:09 juliangruber

When that in-memory version is done this part will just be swapped out. Will try to get the replication done event to work.

juliangruber avatar Sep 25 '13 15:09 juliangruber