ipfs-npm-registry-mirror

What would it look like to use rabin encoded tar files instead of tarballs?

Open mikeal opened this issue 5 years ago • 3 comments

I brought this up on the JS Core meeting today.

The problem with tarballs is that, even if the data inside them is similar, it is never de-duplicated. This led me to explore what it might look like to use rabin to store package tar files instead of the compressed tarballs.

https://github.com/mikeal/ipfs-npm-rabin-test
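Separately from that repo, here is a minimal sketch of why chunked tar files can de-duplicate while compressed tarballs cannot. It is not the code from the repo and not IPFS's chunker: it uses a simple polynomial rolling hash as a stand-in for real Rabin fingerprinting, `zlib` as a stand-in for the tarball's gzip layer, and synthetic "releases" instead of real packages. The names `chunk`, `shared_fraction`, and `fake_release`, and the window/mask parameters, are all assumptions made for illustration.

```python
import hashlib
import random
import zlib

WINDOW = 16                        # bytes covered by the rolling hash
MASK = 0xFF                        # cut when the low 8 bits are zero (~256-byte chunks)
BASE = 257
MOD = (1 << 31) - 1
TOP = pow(BASE, WINDOW - 1, MOD)   # weight of the oldest byte in the window


def chunk(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries (rolling hash, not true Rabin)."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Drop the byte that just slid out of the window.
            h = (h - data[i - WINDOW] * TOP) % MOD
        h = (h * BASE + byte) % MOD
        # Cut only on a hash match, and never below WINDOW bytes per chunk.
        if i + 1 - start >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared_fraction(old: bytes, new: bytes) -> float:
    """Fraction of new's bytes that sit in chunks already present in old."""
    known = {hashlib.sha256(c).digest() for c in chunk(old)}
    reused = sum(len(c) for c in chunk(new) if hashlib.sha256(c).digest() in known)
    return reused / len(new)


def fake_release(seed: int = 0) -> bytes:
    """Synthetic, compressible stand-in for an uncompressed package tar."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    words = ["".join(rng.choice(letters) for _ in range(rng.randint(3, 9)))
             for _ in range(64)]
    return " ".join(rng.choice(words) for _ in range(8000)).encode()


v1 = fake_release()
# "Next version": the same content with a few small insertions spread through it.
step = len(v1) // 6
v2 = b" 2.0.1 ".join(v1[i:i + step] for i in range(0, len(v1), step))

raw = shared_fraction(v1, v2)
packed = shared_fraction(zlib.compress(v1), zlib.compress(v2))
print(f"raw: {raw:.0%} of bytes reused, compressed: {packed:.0%} of bytes reused")
```

On the raw bytes, chunk boundaries resynchronize shortly after each edit, so most chunks of the new version already exist in the old one. After compression, a small edit changes the output stream from that point on, so content-addressed blocks of the two versions have little in common.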

The code here simulates creating both graphs for a single package and then compares them. It uses an implementation of the new unixfsv2, which uses dag-cbor, so it isn't exactly the same as the current IPFS implementation, but it's close enough for investigation.

In short, using rabin-encoded tar files is mostly slower and larger.

The problem is, whatever you save on deduplication is lost in the lack of compression. I ran a test on my request package to get some preliminary numbers.

The average difference between one version and the next is 104749 bytes in rabin, while the average tarball is only 62525 bytes. On average, there are about 8 more blocks in the rabin encoding as well, which right now would make this much slower.

The total size of the rabin graph is 23677255 bytes compared to only 7815709 bytes for the tarball graph. So, even in the aggregate, with all the savings from deduplication in every release, it doesn't make up for the difference in compression.
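To put those figures side by side, here is a small sketch of the arithmetic. It only uses the numbers quoted above (taken to be bytes). The variable names are made up for illustration, and the "break-even" figure is a derived quantity: how much smaller the rabin graph would have to get, by compression or otherwise, just to match the tarball graph in total size.

```python
# Figures quoted above for the request package; taken to be bytes.
rabin_total = 23_677_255       # total size of the rabin graph
tarball_total = 7_815_709      # total size of the tarball graph
rabin_avg_delta = 104_749      # average new data per release with rabin
tarball_avg_size = 62_525      # average size of one compressed tarball

# How much larger the rabin graph is overall.
size_ratio = rabin_total / tarball_total

# How much more new data an average release adds with rabin.
per_release_ratio = rabin_avg_delta / tarball_avg_size

# Reduction the rabin graph would need to reach the tarball graph's size.
break_even_saving = 1 - tarball_total / rabin_total

print(f"total size ratio:        {size_ratio:.2f}x")
print(f"per-release ratio:       {per_release_ratio:.2f}x")
print(f"break-even size saving:  {break_even_saving:.0%}")
```

The rabin graph comes out roughly three times larger in total, so it would have to shrink by about two thirds before it merely ties with the tarballs on size, before counting the cost of the extra blocks.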

That doesn't mean this is a dead end. It just means that several other things would need to happen in order for this to be better/faster.

  • IPFS would need to reduce the existing penalties for grabbing many small blocks compared to a few large blocks.
  • IPFS would need transport layer compression.
  • IPFS would need storage layer compression.

All of this is currently under discussion, but it means that a lot of stuff needs to line up in order for this to be worth it.

mikeal, Dec 03 '18 23:12