ipfs-car

Root CID does not match js/go-ipfs when packing a dir with 10k sub dirs

Open · olizilla opened this issue 3 years ago · 3 comments

Reported by @obo20

Single-file uploads went fine, and then a folder of 100 files went fine, but when I tried a folder of 10k files (a super common use case), I received a different CID when adding to IPFS (go-ipfs/js-ipfs) than from the ipfs-car output.

The CID I get from go-ipfs and js-ipfs is: bafybeihq6az265aar27wuhzltxrgge5ywwllcgux7wui4z3ddq4i2cskky
The CID I get from ipfs-car is: bafybeigww4x6shkc7vbp7c5slmnw3vo6ioj4gnar6ign5eqbkfpijcavk4

large-folder-10k.zip

olizilla · Sep 17 '21 15:09

More context: go-ipfs and js-ipfs diverge in how they implement sharding of large directories.

Yes, they diverge. Currently go-ipfs will either do no sharding (the default) or will shard even small folders (sharding enabled), while js-ipfs has a cutoff. For v0.11.0, go-ipfs should have sharding enabled by default and is planning to shard directories that are close to 1 MiB in size (there's a deterministic function for this). js-ipfs may choose to do the same, but it's not strictly necessary. – @aschmahmann
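To make the three behaviours concrete, here is a minimal sketch of a per-directory layout decision. Everything in it is hypothetical: `decideLayout`, the policy names, and the size estimate (UTF-8 name bytes plus 36 CIDv1 bytes per link) are illustrations only, not the deterministic function go-ipfs or js-ipfs actually uses, which may count bytes differently.

```javascript
// Hypothetical sketch: how a directory might be laid out under three policies
// resembling the go-ipfs behaviours described above. The size estimate is an
// approximation (name bytes + CID bytes per link), NOT the real go/js function.

// CIDv1, dag-pb, sha2-256: 1 (version) + 1 (codec) + 1 (hash code) + 1 (length) + 32 (digest)
const CID_V1_SHA256_BYTES = 36;
const utf8 = new TextEncoder();

// Rough size of the directory node: each link contributes its name and its CID.
function estimateDirectorySize(names, cidBytes = CID_V1_SHA256_BYTES) {
  let total = 0;
  for (const name of names) {
    total += utf8.encode(name).length + cidBytes;
  }
  return total;
}

// policy: "never" (no sharding), "always" (shard even small folders),
// or "size" (shard once the estimate exceeds thresholdBytes, default 1 MiB).
function decideLayout(names, policy, thresholdBytes = 1024 * 1024) {
  switch (policy) {
    case "never":
      return "flat";
    case "always":
      return "hamt";
    case "size":
      return estimateDirectorySize(names) > thresholdBytes ? "hamt" : "flat";
    default:
      throw new Error(`unknown policy: ${policy}`);
  }
}

// 10k entries named file-00000.txt ... file-09999.txt (14 bytes each).
const names = Array.from({ length: 10000 }, (_, i) => `file-${String(i).padStart(5, "0")}.txt`);
console.log(estimateDirectorySize(names));           // 500000
console.log(decideLayout(names, "size"));             // flat (under 1 MiB)
console.log(decideLayout(names, "size", 256 * 1024)); // hamt (over 256 KiB)
```

Under this estimate the same folder stays flat at a 1 MiB cutoff but is sharded at 256 KiB, so the threshold alone changes the DAG layout and therefore the root CID. That is why the tools have to agree on the cutoff, not just on whether sharding is enabled.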

The auto-shard PR for js-ipfs is here: https://github.com/ipfs/js-ipfs-unixfs/pull/171. The size limit is currently 256 KiB, but it'll align with go-ipfs before it's merged and will be overridable by the user. – @achingbrain

ipfs-car will adopt the changes in https://github.com/ipfs/js-ipfs-unixfs/pull/171 once they land, so that its CID derivation stays in sync with js-ipfs.

olizilla · Sep 17 '21 15:09

For more context:

The code I'm using looks like:

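    import { packToFs } from 'ipfs-car/pack/fs'
    import { FsBlockStore } from 'ipfs-car/blockstore/fs'

    // Pack contentFilePath into a CAR file at destinationFolder/data.car
    const results = await packToFs({
      input: contentFilePath,
      output: `${destinationFolder}/data.car`,
      blockstore: new FsBlockStore(),
      wrapWithDirectory: false,
      maxChildrenPerNode: 1024,
      maxChunkSize: 262144
    });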

The npm versions look like:

    "ipfs-car": "^0.5.8",
    "ipfs-core": "^0.10.6",

obo20 · Sep 17 '21 15:09

As you've found, the defaults are different. If sharding is not enabled in js-ipfs (the default), it passes Infinity for shardSplitThreshold. ipfs-car passes nothing, so ipfs-unixfs-importer falls back to its default of 1000. So yes, you just need to expose a shardSplitThreshold option in ipfs-car and pass it on to the importer; then you can adjust the args to get the same CID as js-ipfs and go-ipfs. Keep in mind, though, that the default behaviour is going to change soon(ish) to auto-shard based on final block size rather than the number of entries in a directory. – achingbrain
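The count-based rule can be sketched as below. `isShardedByCount` is a hypothetical helper, not an ipfs-unixfs-importer export, and the exact comparison (`>` rather than `>=`) is an assumption. The only facts taken from the comment above are the fallback of 1000 when nothing is passed and js-ipfs passing Infinity when sharding is disabled.

```javascript
// Hypothetical sketch of the count-based sharding rule discussed above.
// Assumption: a directory becomes a HAMT shard once its entry count exceeds
// shardSplitThreshold. This is NOT the importer's actual source code.
const IMPORTER_DEFAULT_SHARD_SPLIT_THRESHOLD = 1000; // fallback when nothing is passed

// A default parameter mirrors "ipfs-car passes nothing": undefined -> 1000.
function isShardedByCount(entryCount, shardSplitThreshold = IMPORTER_DEFAULT_SHARD_SPLIT_THRESHOLD) {
  return entryCount > shardSplitThreshold;
}

// The 10k-file folder from this issue under each caller's settings.
const entryCount = 10000;
const layouts = {
  "ipfs-car (nothing passed)": isShardedByCount(entryCount),
  "js-ipfs, sharding disabled (Infinity)": isShardedByCount(entryCount, Infinity),
};

for (const [caller, sharded] of Object.entries(layouts)) {
  console.log(`${caller}: ${sharded ? "HAMT-sharded" : "flat"} directory`);
}
```

With 100 entries both settings return `false`, which is consistent with the 100-file folder producing matching CIDs. With 10k entries only the ipfs-car setting shards, which is consistent with the mismatch reported above.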

olizilla · Sep 20 '21 08:09