Improved Reprovider.Strategy for entity DAGs (HAMT/UnixFS dirs, big files)
@aschmahmann @petar I remember we discussed this a while ago as low-hanging fruit for bigger data providers like Pinata, but I was unable to find an issue, so I created this one.
Improving provider strategies was previously discussed in: https://github.com/ipfs/go-ipfs/issues/6221, https://github.com/ipfs/go-ipfs/issues/5774, https://github.com/ipfs-inactive/package-managers/issues/84. In this issue I want to propose a well-scoped improvement of codec-aware strategy that could be shipped without refactoring the entire system.
TLDR
- Add a new (opt-in) strategy: when announcing a big UnixFS directory tree, only announce root blocks of directories and files, and skip all internal file data blocks.
- Leverage full content path for finding providers of root blocks.
Problem statement
Right now, we support three values in Reprovider.Strategy, which tell the reprovider what should be announced. Valid strategies are:
- "all" - announce all stored data (this is also the implicit default)
- "pinned" - only announce pinned data
- "roots" - only announce directly pinned keys and root keys of recursive pins
If the repository gets too big, "all" and "pinned" are too expensive, and folks are forced to use "roots", which is codec-agnostic and will only announce the root block of a UnixFS DAG.
This means that for big UnixFS datasets, the user has to write additional orchestration code to go the extra mile and manually pin every file within a bigger DAG, and make sure those sub-pins are removed when the entire DAG is no longer needed.
Proposed solution: codec-aware (UnixFS) strategy
Depending on the codec, different blocks may have different importance. In the case of UnixFS, the important blocks are the manifest (root) blocks of directories and files. Sub-blocks of individual files, which hold the data itself, are not as critical as those manifest blocks. It is the CID of the manifest block that is looked up on the DHT first.
A big data provider may want to opt in to a codec-aware strategy as a "best-effort" way to provide something on the DHT rather than nothing: in the case of UnixFS, only provide these manifest blocks on the DHT, facilitating the initial lookup without the cost of announcing all the sub-blocks.
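To make the selection rule concrete, here is a minimal sketch in Go of which blocks such a strategy would pick. It is not Kubo code: the Node and Kind types, the entityRoots function, and the string labels used as CIDs are all hypothetical, and the DAG is a toy in-memory tree rather than real UnixFS.

```go
package main

import "fmt"

// Kind is a hypothetical classification of a block in a UnixFS-like DAG.
type Kind int

const (
	Directory Kind = iota // directory block
	FileRoot              // root ("manifest") block of a file
	FileChunk             // internal data block of a file
)

// Node is a toy stand-in for a block; CID is just a label here.
type Node struct {
	CID      string
	Kind     Kind
	Children []*Node
}

// entityRoots returns the CIDs a codec-aware strategy would announce:
// directory blocks and file root blocks, but no internal file chunks.
func entityRoots(root *Node) []string {
	var out []string
	var walk func(n *Node)
	walk = func(n *Node) {
		switch n.Kind {
		case Directory:
			out = append(out, n.CID)
			for _, c := range n.Children {
				walk(c)
			}
		case FileRoot:
			// Announce the file root and do not descend into its chunks.
			out = append(out, n.CID)
		}
	}
	walk(root)
	return out
}

// sampleTree builds a directory with one file and one subdirectory.
func sampleTree() *Node {
	fileA := &Node{CID: "file-a", Kind: FileRoot, Children: []*Node{
		{CID: "chunk-a1", Kind: FileChunk},
		{CID: "chunk-a2", Kind: FileChunk},
	}}
	fileB := &Node{CID: "file-b", Kind: FileRoot, Children: []*Node{
		{CID: "chunk-b1", Kind: FileChunk},
	}}
	sub := &Node{CID: "dir-sub", Kind: Directory, Children: []*Node{fileB}}
	return &Node{CID: "dir-root", Kind: Directory, Children: []*Node{fileA, sub}}
}

func main() {
	// 4 of the 7 blocks in the toy tree are selected for announcement.
	fmt.Println(entityRoots(sampleTree()))
}
```

In this toy tree the savings are small, but for a large file with thousands of chunks only a single CID would be announced for that file.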
Open questions
- Is announcing of those UnixFS root blocks enough?
- Depends. After the manifest block of a big file is fetched, the user is already connected to a peer which most likely has the rest of the blocks, and the transfer can happen over bitswap. But if the transfer gets interrupted and the connection is lost, it is not possible to resume: we already have the root block in the local store, so we only look up the missing sub-blocks, which were never announced on the DHT.
- A potential fix would be to do a DHT lookup not only for a specific sub-block in a file, but also for the first UnixFS root block above it (either the root of a file, or a parent directory). The rationale being, if someone has the root of a file, they most likely have the rest.
- We track this in https://github.com/ipfs/kubo/issues/10251
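The fallback lookup described above could look roughly like the following Go sketch. Everything here is an assumption for illustration: ProviderIndex is a toy map standing in for DHT provider records, findProviders is a hypothetical helper, and the caller is assumed to already know the ancestor entity roots from the content path, ordered nearest first. The sketch tries the sub-block first and only then the ancestors; a real implementation might query them in parallel.

```go
package main

import "fmt"

// ProviderIndex is a toy stand-in for DHT provider records: CID -> peer IDs.
type ProviderIndex map[string][]string

// findProviders looks up providers for the wanted block and, if none are
// found, falls back to the ancestor entity roots (nearest first).
// It returns the peers and the CID whose record produced them.
func findProviders(idx ProviderIndex, want string, ancestors []string) ([]string, string) {
	candidates := append([]string{want}, ancestors...)
	for _, cid := range candidates {
		if peers := idx[cid]; len(peers) > 0 {
			return peers, cid
		}
	}
	return nil, ""
}

func main() {
	// Only the file root was announced, as with the proposed strategy.
	idx := ProviderIndex{"file-a": {"peerA"}}

	// Resuming a download: chunk-a2 is missing and was never announced.
	peers, via := findProviders(idx, "chunk-a2", []string{"file-a", "dir-root"})
	fmt.Println(peers, "found via", via)
}
```

A peer found this way is only a likely holder of the missing sub-block, so the actual block request would still need to handle the case where that peer does not have it.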
> Depends. After the manifest block of a big file is fetched, the user is already connected to a peer which most likely has the rest of the blocks and transfer can happen over bitswap. But if the transfer gets interrupted and connection is lost, then it is not possible to resume because we already have root block in local store and we only lookup for missing sub-blocks which were not announced on DHT.
> Potential fix would be to do DHT lookup not only for a specific sub-block in a file, but also for the first UnixFS root block above them (either a root of a file, or a parent directory). Rationale being, if someone has the root of a file, they most likely have the rest.
This seems reasonable. (I was actually writing this before I had fully read your message.)
The biggest issue with this is that we unofficially create a special case for UnixFS files as single, independent entities, and that makes it harder to create new, interesting cross-file features in the future.
Two features that I would like to have would break under such an assumption:
- Content based chunking. Let's assume I add a .car archive to IPFS (you might think "that's dumb, just add the blocks", but no, this is meant for extra support: my pinning service doesn't support a fancy DAG format that I want to use), so I make a .car archive of my blocks and chunk it perfectly to the block blobs in the .car using raw leaves. Then, when I want to download it, I use multihash addressed requests (v0.12.0 blockstore update). So the downloader thinks it is downloading a dag-turbo-3000 object, the pinning node thinks it serves dag-unixfs->raw-leaf, but both agree because in the end their hashes match. However, with this, the pinning service would announce what it thinks is the true root (the root of the .car) while the downloader would search for the root of the dag-turbo-3000 (which the pinning service does have, it just thinks it's a boring raw-leaf).
- Delta adds. We could add a --delta=<CID> option to add (or make it a standalone thing, the details are not important). This would use a chunking strategy that assumes that all blocks in --delta are free and tries to reuse them as much as possible. This would make for cheap incremental updates (note that this would not be that good because we would be limited to blocks; more advanced deltas that can pick variable sizes and arbitrary offsets are far more efficient, but also more expensive to compute and atrocious if you are trying to unthread a very long chain of deltas). Let's assume I download a new version of my app. 90% of the blocks are actually the same as before, but 10% are new. We can assume that a lot of people already serve the old version, but not many serve the new one. I would have trouble finding nodes serving the old version, even though they hold most of the blocks I need, since they would announce the old root CID and I would search for the new one. (Note: I assume the downloading node doesn't already own the original delta CID.)
What I would like to see.
I would like to see some priority system. Advertising all CIDs is expensive and only useful in certain rare scenarios, or in scenarios that don't even exist yet. I think it would be nice if we could layer strategies. So my node would burn at full speed, at 1200% CPU, until all directories and file roots are published, which would hopefully take a minute. And then it would go into a throttled mode at 200%, where it would publish all CIDs over the next 3 hours or so.
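A layered scheme like this could be sketched as a tiered queue. The Go code below is only an illustration under assumptions: Item, order, and reprovide are hypothetical names, tier 0 stands for directories and file roots, tier 1 for everything else, and a per-tier pause after each announcement stands in for CPU throttling.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Item is a CID with a hypothetical priority tier:
// 0 = directories and file roots, 1 = everything else.
type Item struct {
	CID  string
	Tier int
}

// order returns a copy of items sorted so lower tiers come first.
// The sort is stable, so items keep their relative order within a tier.
func order(items []Item) []Item {
	out := append([]Item(nil), items...)
	sort.SliceStable(out, func(i, j int) bool { return out[i].Tier < out[j].Tier })
	return out
}

// reprovide announces items tier by tier and sleeps pause[tier] after each
// announcement. A tier with no entry in pause is not throttled at all.
func reprovide(items []Item, pause map[int]time.Duration, provide func(cid string)) {
	for _, it := range order(items) {
		provide(it.CID)
		time.Sleep(pause[it.Tier])
	}
}

func main() {
	items := []Item{
		{CID: "chunk-a1", Tier: 1},
		{CID: "dir-root", Tier: 0},
		{CID: "chunk-a2", Tier: 1},
		{CID: "file-a", Tier: 0},
	}
	// Tier 0 runs at full speed; tier 1 is throttled.
	pause := map[int]time.Duration{1: 10 * time.Millisecond}
	reprovide(items, pause, func(cid string) { fmt.Println("provide", cid) })
}
```

With this ordering, directories and file roots become findable quickly, while the remaining blocks are announced in the background at a lower cost.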
- For the announcement problem:
- IPIP-402 introduced the concept of "entity". We could reuse it here, and have Reprovider.Strategy: entities which only announces the minimal set of blocks required for enumeration. For a file or DAG-CBOR document, that would be a single root block. For a HAMT-sharded UnixFS directory, it would be the HAMT blocks.
- For the content lookup / resume problem:
- Clarify docs until we improve implementation: https://github.com/ipfs/kubo/pull/10249
- Improve implementation, make every "get block" operation aware of the content path affinity: https://github.com/ipfs/kubo/issues/10251