Support import of partial CARs without prefetching the full DAG
Is your feature request related to a problem? Please describe.
H: As a user, I would expect that when I import a CAR file, it is retained, including all the links it contains, and that parts are not lost through GC. Many links in my NFTs are of this type, and as their owner I would also like to save them and make them available. I would be interested to know whether all links are retained when I import the CAR file into Storacha or Filebase.
Describe the solution you'd like
We should give users the option of automatically pinning all content when importing a CAR, so NFT owners do not lose content when GC is run, while still letting them import only the content contained in the CAR.
Describe alternatives you've considered
- Auto-pinning all blocks/CIDs in a CAR recursively: bad because sometimes folks don't want this, but then they could also easily remove the pins manually. Which is more likely: importing a CAR and deleting it later, or importing a CAR and always keeping all of it?
- ???
Additional context
From user convo on slack
C: I have downloaded a partial CAR file of collection_cid/image123.png. When I import that CAR file into Kubo with ipfs dag import collection_cid_image123.png.car and then run ipfs files cp /ipfs/collection_cid/image123.png, how do I ensure the parent CID (collection CID) is not garbage collected?
C: I have tested ipfs pin add <collection_cid>/image.png and ipfs pin add -r=false <collection_cid>. Now I can fetch the IPFS link <collection_cid>/image.png. But when I run garbage collection, I can't fetch the link <collection_cid>/image.png anymore; I can only fetch the CID of image.png. How do I prevent the parent CID <collection_cid> of the link from being garbage collected?
C: I am offline and I don't want to run ipfs pin add <collection_cid> online, because it then pins the whole collection directory with all the files inside, which is very large.
C: When offline, I import the CAR and pin the root and parent CID. Then I can fetch the link /image123.png. But after running garbage collection the link is gone.
H: ipfs pin add --recursive=false <collection_cid> will pin only the root CID without pinning the full collection, so that it can resolve <collection_cid>/image.png just fine later. This is all assuming the collection is not huge and the root CID is not a "sharded" repository spanning several blocks.
C: Many thanks Hector. That helps me a lot. :pray: Yes, that's right: when I direct pin the collection_cid with ipfs pin add --recursive=false collection_cid, my link <collection_cid>/image123.png is resolved without any problems, even when I run garbage collection. But for some very large collection CIDs that's not enough, because of the issue you alluded to in your answer. How should I proceed then? :face_with_monocle: I don't want the links, which are actually in the CAR file, to be lost due to garbage collection. Example: https://ipfs.io/ipfs/bafybeih4aaint73mwasanappkamkimwjaoe2m7pm6cagfoamjutxzbt76a/7345.png?format=car
C: Now I understand why the links that are present in the CAR files no longer work after I run garbage collection. I would not have expected that. :cry: Why is there no option for ipfs dag import to pin the root CID (recursive pin) and also all parent CIDs (direct pin)? Even the CAR import in ipfs-desktop via the web GUI does not do this. The CAR import is useless for me if all links that are present in the CAR file are lost after running garbage collection. :cry:
H: so... if the root directory is sharded, you would need to pin at least one level of sub-blocks. You can run ipfs refs -r --max-depth 1 <collection_cid> and then pin each CID individually. That should already be enough for moderately big directories.
C: The complicated process should take place automatically when importing the car file with ipfs dag import carfile.car or in the webui Import CAR file.
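The workaround discussed in the conversation above can be illustrated with a toy Python model of pin-aware garbage collection. This is only a sketch, not Kubo code: the block names, the `links` table and the `gc`, `reachable` and `resolves` helpers are all hypothetical, and a real sharded directory would add more intermediate blocks between the root and the file.

```python
# Toy model of pin-aware garbage collection. Not Kubo code: block names and
# helpers are hypothetical and only illustrate the behaviour discussed above.

# Each block lists the child blocks it links to.
links = {
    "collection": ["image123.png", "image124.png"],  # directory root
    "image123.png": ["chunk-a", "chunk-b"],          # file split into chunks
    "image124.png": [],                              # present locally, not wanted
    "chunk-a": [],
    "chunk-b": [],
}


def reachable(cid, store):
    """Return cid plus every locally stored block reachable from it."""
    seen, stack = set(), [cid]
    while stack:
        current = stack.pop()
        if current in seen or current not in store:
            continue
        seen.add(current)
        stack.extend(store[current])
    return seen


def gc(store, direct=(), recursive=()):
    """Keep directly pinned blocks and the full DAG under recursive pins."""
    keep = set(direct)
    for root in recursive:
        keep |= reachable(root, store)
    return {cid: children for cid, children in store.items() if cid in keep}


def resolves(store, root, name):
    """A path root/name resolves only if both blocks survive locally."""
    return root in store and name in store[root] and name in store


# Recursive pin on the file only: the directory block is collected.
only_file = gc(links, recursive=["image123.png"])
print(resolves(only_file, "collection", "image123.png"))  # False

# Recursive pin on the file plus a direct pin on the directory root.
with_root = gc(links, direct=["collection"], recursive=["image123.png"])
print(resolves(with_root, "collection", "image123.png"))  # True
print(sorted(with_root))  # ['chunk-a', 'chunk-b', 'collection', 'image123.png']
```

In this model the direct pin keeps only the directory block itself, so the unwanted sibling `image124.png` is still collected while the path through the directory keeps resolving.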
cc @lidel @agmap
I think we need to look deeper. Let's explore what the actual user need is here.
Recursive pinning is already implemented and enabled by default. See ipfs dag import --help and --pin-roots and ipfs-webui/src/bundles/files/actions.js#L426-L427.
@agmap are you able to elaborate how "a partial car file of collection_cid/image123.png" was generated? Are you able to provide .car samples? Or use car inspect on them?
Without seeing the CARs, and only reading about your problems, it looks to me more like a partial (or malformed) CAR is causing issues, rather than ipfs-webui or Kubo RPC/CLI ipfs dag import not doing the right thing, but I would like to understand what the perceived bug is here.
Note that ipfs dag import already applies --pin-roots=true by default. It will create a recursive pin to protect the full DAG behind the root CID. As long as the entire DAG is in the CAR, it will work fine. Kubo's ipfs dag export produces full archives, which means import/export to/from Kubo is always safe and works in offline mode.
If you are not able to depend on full CAR archives, it means you have an unstructured or malformed CAR, which is just "a bag of blocks" and does not have the full DAG or the correct root CID in the CAR header.
Can WebUI user import partial CARs? No.
To be able to import those "partial shards / bags of blocks" in a way that won't be GC'd while you do it, and work even when you are not able to find all blocks (and still pin things in best-effort fashion), you want to:
- Protect imported shards from being garbage-collected. You have two options here:
  - (A) use ipfs files cp to create a lazy pointer in MFS to the root CID. It will protect locally cached blocks that belong to the DAG, but won't prefetch anything.
    - 🍏 This is already possible from WebUI via "Import" → "From IPFS" on the Files screen
    - The downside here is that you need to import shards in an order that ensures there are no "unconnected" blocks, and there is always a link from the root CID to a block in a CAR.
  - (B) run Kubo daemon with GC disabled
    - ⚠️ This is not supported by WebUI, and requires you to modify the way ipfs daemon is initialized by running with ipfs daemon --enable-gc=false
    - The upside here is that you don't need to worry about the order in which CARs were imported. As long as GC is disabled, you won't lose any block, even if you can't walk the DAG from the root CID to some of them. (How useful such a broken, untraversable DAG is, I'm not sure. It is probably limited to doing a best-effort mirror of a dataset that someone else is hosting fully, but if there is no full copy elsewhere, those unconnected blocks are effectively dead anyway.)
- Import each .car with a DAG shard with ipfs dag import --pin-roots=false to skip low-level pinning (we depend on MFS instead), and thus allow incomplete DAG import
  - ❌ This is not supported by ipfs-webui v4.7: ipfs.dag.import with pinRoots: true is hardcoded, which means if you import a "partial CAR" that has some root CID in the CAR header, it will trigger an attempt to pin the entire thing, and if the blocks are not provided on public IPFS mainnet, the CAR import will fail.
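The difference between the two protection mechanisms in the list above can be sketched with a toy Python model. It is an illustration under stated assumptions, not Kubo's implementation: `strict_walk` stands in for a recursive pin that needs every block, `best_effort_walk` stands in for the lazy MFS pointer that protects only locally cached blocks, and all block names are hypothetical. It also assumes the missing blocks cannot be fetched (offline, or not provided anywhere).

```python
# Toy model, not Kubo code: names and helpers are hypothetical.

class MissingBlock(Exception):
    """Raised when a strict walk hits a block that is not stored locally."""


def strict_walk(root, store):
    """Walk the whole DAG; fail on the first missing block (recursive pin)."""
    seen, stack = set(), [root]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        if cid not in store:
            raise MissingBlock(cid)
        seen.add(cid)
        stack.extend(store[cid])
    return seen


def best_effort_walk(root, store):
    """Walk only what is stored locally, skipping missing blocks (MFS-style)."""
    seen, stack = set(), [root]
    while stack:
        cid = stack.pop()
        if cid in seen or cid not in store:
            continue
        seen.add(cid)
        stack.extend(store[cid])
    return seen


# A partial CAR: the root links to two files, but only one file was included.
# "orphan" is a block with no path from the root (an "unconnected" block).
partial = {
    "root": ["file-1", "file-2"],
    "file-1": [],
    "orphan": [],
}

try:
    strict_walk("root", partial)
    pin_result = "pinned"
except MissingBlock as err:
    pin_result = f"pin failed: missing {err}"

protected = best_effort_walk("root", partial)
print(pin_result)         # pin failed: missing file-2
print(sorted(protected))  # ['file-1', 'root']
```

Note that `orphan` is not in the protected set, which matches the downside of option (A): a block with no link path from the root is not protected by the MFS pointer.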
What is missing? Ability to do ipfs dag import with --pin-roots=false
I believe what is missing in ipfs-webui to support your use case, @agmap, is a checkbox in the modal displayed in "Import" → "From CAR" that allows controlling the pinRoots parameter.
It should be checked by default, but the labels should explain to user that they may want to uncheck it if they are working with partial CARs or don't want to create an explicit recursive pin. Perhaps:
- [x] Also recursively pin all root CIDs listed in the CAR header (Uncheck if importing partial CARs or to avoid downloading missing blocks)
As for UI, we have prior art in the "Remove" action: if the user tries to remove a directory which was also explicitly pinned recursively, an extra checkbox is displayed.
As I understood it:
- The user imports a CAR file with a full NFT collection in the form <root>/<image>.png
- The user is only interested in some images, but should always be able to reference them as <root>/imageX.png. Possibly this is how they are linked on chain.
- Thus after importing the CAR, the user wants to GC most of the imported content except the images that are of interest, and the intermediary nodes that allow resolving the path to them.
Maybe I didn't get it though.
@hsanjuan got to the heart of the matter. That's exactly my point. I apologize if I haven't always expressed myself in the correct technical terms.
Background: I have bought many NFTs from some big NFT collections, and those NFTs link to their images, mostly as ipfs://<root>/image1432.png. Now I want to back up these image links with a CAR file. I did this with https://trustless-gateway.link/ipfs/<root>/image1432.png?format=car&dag-scope=full.
If, for example, the large NFT collection is no longer available anywhere in the ipfs network, then my link to my NFT can no longer be accessed. Then I would like to import the CAR file so that the link is available again in the ipfs network.
But as soon as I run garbage collection, the IPFS link is no longer available, because apparently the <root_cid> and possibly the sharded CIDs for bigger directories, which are present in the CAR file, are not pinned.
I assumed that if I imported a CAR file, everything would be preserved (including all Links inside the CAR file), but this is not the case.
Hector suggested direct pinning the <root> and some sharded CIDs with ipfs pin add --recursive=false.
I thought something like this should happen automatically when importing a CAR file.
According to @lidel's answer in https://github.com/ipfs/ipfs-webui/issues/2379 I could do Import --> From IPFS (ipfs files cp /ipfs/<root>) if necessary to avoid the GC of the <root_cid>. But this also has some downsides, because I don't want to include GB large NFT collections, where I only own one image, in my node. And also it's not possible to easily create one big CAR file of all my files locally in my node.
For testing you can use this example: https://ipfs.io/ipfs/bafybeih4aaint73mwasanappkamkimwjaoe2m7pm6cagfoamjutxzbt76a/7345.png?format=car It's a 12GB collection where I only own 1 image.
> I don't want to include GB large NFT collections, where I only own one image, in my node
fwiw Import --> From IPFS (backed by ipfs files cp) is "lazy" – it does not prefetch data recursively on its own, only the root block.
If you use WebUI's Files screen and enter a directory imported this way, that will trigger an additional fetch of the root blocks of its children to learn their types and sizes (same as ipfs files ls --help), but if files are bigger than one block, it won't fetch deeper.
> And also it's not possible to easily create one big CAR file of all my files locally in my node.
Indeed, partial DAG export as a CAR is not supported. It will trigger a fetch of the entire DAG.
Perhaps
Missing features
If I understand correctly, missing things identified in this thread are
- There is no standard/convention for signaling that a file's parent directory CID is also present in a CARv1 created with /ipfs/parent-cid/file?format=car&dag-scope=all
  - This is a known problem documented in https://specs.ipfs.tech/http-gateways/trustless-gateway/#car-roots.
  - For historical context, I tried to get multiple stakeholders in 2023 to agree on the spec here, and failed to get agreement, so it is explicitly out of the Gateway spec, and can't be depended on as we know different implementations behave(d) differently here.
  - CARv1 is IMO ossified, and can't be extended/fixed without creating issues across existing tooling. If we want to fix this problem, we need to either create a new archive format (e.g. CARv3), or make a breaking change to how ipfs dag import|export works with CARv1 in Kubo, and potentially break interop with other tooling.
- Files screen in WebUI: ability to do "direct" pin of a directory, without pinning it recursively.
- This could be a checkbox "pin recursively" that can be unchecked
- Ability to export/import MFS as a potentially partial CAR, but without prefetching (only local blocks)
  - this is similar to the --pin-roots UI described in my previous comment. In addition to that, we could add --ignore-missing-blocks to both dag import|export
    - this would allow for ipfs dag export --ignore-missing-blocks --offline $(ipfs files stat / | head -1) that would effectively do the same "local-only" DAG walk as Reprovider.Strategy=mfs and output blocks to a CAR
  - ipfs dag import would error if both --pin-roots and --ignore-missing-blocks are passed.
  - Files → Import → From CAR in WebUI would have checkboxes for controlling --pin-roots and --ignore-missing-blocks before proceeding with import.
  - Files → Export CAR in WebUI would show a modal which would allow the user to uncheck --ignore-missing-blocks --offline.
First one is unlikely, the second one does not fully fix UX, the third one feels like something that effectively improves UX (any "bag of blocks" CAR could be imported via WebUI, and there would be also ability to export partially local DAGs to a "bag of blocks" CAR).
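The third item can be sketched as a toy Python model. This illustrates the proposed behaviour only: --ignore-missing-blocks is a suggestion made in this thread, not an existing Kubo flag, and the `export_blocks` and `check_import_flags` helpers and the block names are hypothetical.

```python
# Sketch of the *proposed* behaviour only. --ignore-missing-blocks is a
# suggestion from this thread; all names here are hypothetical.

def export_blocks(root, store, ignore_missing=False):
    """Return CIDs in depth-first order, as they might be written to a CAR."""
    out, seen, stack = [], set(), [root]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        if cid not in store:
            if ignore_missing:
                continue  # local-only walk: skip blocks we do not have
            raise KeyError(f"block not found locally: {cid}")
        out.append(cid)
        # Reverse so children are visited in their listed order.
        stack.extend(reversed(store[cid]))
    return out


def check_import_flags(pin_roots, ignore_missing):
    """Mirror the proposed rule: the two import flags are mutually exclusive."""
    if pin_roots and ignore_missing:
        raise ValueError(
            "--pin-roots cannot be combined with --ignore-missing-blocks"
        )


# MFS root with one fully local file and one directory whose child is missing.
local = {
    "mfs-root": ["photo", "big-dir"],
    "photo": ["photo-chunk"],
    "photo-chunk": [],
    "big-dir": ["not-local"],
}

exported = export_blocks("mfs-root", local, ignore_missing=True)
print(exported)  # ['mfs-root', 'photo', 'photo-chunk', 'big-dir']
```

Without `ignore_missing=True` the same call raises on `not-local`, which corresponds to today's behaviour where a partial export is not possible without fetching the rest of the DAG.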
Many thanks @lidel and sorry for my stupid user questions, but I'm not a professional.
Regarding 2) Having the ability to direct pin would be a huge enhancement. 👍 An additional second pin symbol would be needed to differentiate.
Too bad there are only two radical pinning strategies (direct and recursive).
Imagine a pinning strategy where only the path blocks that lead to pinned children are directly pinned.
Something like ipfs pin add --recursive-only-to-pinned-children=true
In this way, I could pin a child element (image.png) and then a parent or grandparent element (the collection directory) using the new pinning strategy --recursive-only-to-pinned-children. This would also automatically include the sharded blocks for larger collections. But that's certainly not possible, otherwise @lidel might have suggested it.
Regarding 3)
Upgrading to ipfs dag export --ignore-missing-blocks seems very, very helpful to me. 💯 It would allow me to create archive files of some parts of my local node, or all of it, making backup and transferring to another computer much, much easier. This is currently not possible. I would be particularly pleased about that. 😎
I do not fully understand the ipfs dag import --ignore-missing-blocks part. When I import my CAR file now, the DAG inside the CAR file, including the root CID, is imported but remains subject to garbage collection if I do not add the root CID to the filesystem (Files → Import → From IPFS, i.e. ipfs files cp).
It is not clear to me why I would have to set --pin-roots=false if the root cid is not pinned anyway (at least directly would be possible). But the complete DAG of the CAR file is imported also without --pin-roots=false . 🤔
Yes, the thing you've described for 2) is not possible because ipfs pin add operates on CIDs, not content paths. If you pass a path, it will be resolved to the CID of the path terminus first, and the CID of the terminus element will be pinned. This is not limited to ipfs pin, every command that accepts path or CID will resolve path to final CID before continuing.
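A minimal sketch of that resolution step, as a toy Python model rather than Kubo code (the `dirs` table, the CID names and the `resolve` helper are hypothetical):

```python
# Toy model, not Kubo code: the DAG and helper names are hypothetical.

# Directory blocks map link names to child CIDs.
dirs = {
    "cid-collection": {"images": "cid-images"},
    "cid-images": {"7345.png": "cid-7345"},
}


def resolve(path):
    """Resolve root/seg/... to (terminus CID, intermediate CIDs walked)."""
    root, *segments = path.strip("/").split("/")
    walked, cid = [], root
    for name in segments:
        walked.append(cid)
        cid = dirs[cid][name]
    return cid, walked


terminus, walked = resolve("cid-collection/images/7345.png")
print(terminus)  # cid-7345  <- the only CID a pin on this path would record
print(walked)    # ['cid-collection', 'cid-images']  <- not covered by that pin
```

In this model the intermediate CIDs are known only while the path is being resolved, so keeping them would require pinning each of them separately, for example with direct pins as suggested earlier in the thread.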
For 3) I've filed an issue in Kubo:
- https://github.com/ipfs/kubo/issues/10826
I'm marking this as blocked until Kubo has that feature.