js-ipfs
`ipfs.dag.import` should allow single blocks
- Version:
  version: '0.15.4',
  commit: '965f5a4e00254cee48fca2127d95209deaee09cf',
  repo: '12',
  'ipfs-core': '0.15.4',
  'interface-ipfs-core': '^0.155.2'
- Platform:
- Subsystem: Dag
- Severity: Low
- Description:
When calling `ipfs.dag.import` we can only supply a CAR stream. The Dag interface should allow us to import single blocks and sets of blocks, rather than having to create these CAR streams. Moreover, internally, `importCar()` should call this single-block function rather than putting the blocks directly in the repo.
Blocks may need to be preloaded but the import of a CAR file skips this step.
Is this a feature that a pull request would be useful for?
This would mean that `ipfs.dag.export` should also change, since import/export methods can usually use each other's returns as parameters. Export functions usually return something that can be written to disk easily, like a string or byte array, and this would not do that.

What you could do to 'import' an array of blocks to the repo, instead of changing the import/export methods, is iteration + the `ipfs.dag.put` API. And getting an array of blocks from a root CID can be done using `multiformats/traversal` + `ipfs.dag.get`.
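The iteration approach suggested above might look something like this sketch. `ipfs` is assumed to be an existing js-ipfs instance, and the `dag.put` option names have changed across js-ipfs versions (`format`/`hashAlg` in older releases, `storeCodec` in newer ones), so treat them as illustrative:

```javascript
// Sketch: 'import' an array of plain JS values one at a time via
// ipfs.dag.put, collecting the resulting CIDs. Note that each put
// re-encodes and re-hashes the value internally.
async function putAll (ipfs, values) {
  const cids = []
  for (const value of values) {
    // option names are illustrative; they vary across js-ipfs versions
    cids.push(await ipfs.dag.put(value, { storeCodec: 'dag-cbor', hashAlg: 'sha2-256' }))
  }
  return cids
}
```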
Yes, you're right: export must be consumable by import. The problem is that `ipfs.dag.put` takes a plain object, not a block. There is mention of a test that shows this was sort of considered, but there is no implementation in the code.
Also, when importing a CAR file, should it not go through the same path as adding any other data to ipfs, such as preload and possibly pinning? In the current dag implementation, it just writes straight to the repo.
How is the interface of js-ipfs governed? Is there something higher than this repo, like the golang repo, that defines these APIs, or are they open to be extended directly by js-ipfs?
Should I make a pull request that simply sniffs the input to `ipfs.dag.put` to see if the supplied data is already a `Block` instance and, if so, avoids any of the hashing?
> The problem is that `ipfs.dag.put` takes a plain object, not a block.
I like this, but I'd like the block API to take and return blocks. IIRC it used to work that way? Since starting to use the js-multiformats library I haven't used the dag API. I just took another look at the `block.put` documentation; I guess I've been forgetting to specify the format option, which may be affecting the CID for the data.
> Also, when importing a CAR file, should it not go through the same path as adding any other data to ipfs, such as preload and possibly pinning? In the current dag implementation, it just writes straight to the repo.
I'd expect it to take a similar path to the `dag.put` and `block.put` methods, but I've just started working with CAR files so I don't know enough.
> Should I make a pull request that simply sniffs the input to `ipfs.dag.put` to see if the supplied data was already a `Block` instance, and if so, avoid any of the hashing?
This is one of the reasons I wish the block API took blocks.
Yes, I suppose the blocks API is a more appropriate place than the dag API. The dag API seems to be "JS objects in, JS objects out", while the blocks API seems to be "Uint8Arrays in, Uint8Arrays out", and all I want is "multiformats/block in, multiformats/block out". I just want to remove the double overhead of hashing, since I've already hashed the block myself for my own calculation purposes, and now I want to store it as is.

What to do? Wait for whoever is in charge of the API to say what they prefer and then make a PR? So far, making a CAR file and importing it using the dag API is the closest thing I've got to avoiding double hashing.
In Kubo, the convention / separation of concerns is more or less this (it could be a bit different in js-ipfs, but not much):
- if you don't care about IPLD and work with raw blocks, use `block put|get`
- if you work with IPLD DAGs using the Data Model, and want to put DAG-JSON and have it stored as DAG-CBOR, or you want to read dag-pb in Logical Format, use `dag put|get`
- if you work with entire DAGs, and want to export or import them using CAR transport, use `dag import|export`
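Mapped onto js-ipfs Core API calls, the three levels above might look roughly like this sketch (`ipfs` is assumed to be an existing node and `carStream` an AsyncIterable of CAR bytes; exact option names and return shapes vary by version):

```javascript
// Sketch of the three Kubo-style levels as js-ipfs Core API calls.
async function demoLevels (ipfs, carStream) {
  // 1. raw blocks: bytes in, CID out
  const blockCid = await ipfs.block.put(new TextEncoder().encode('raw bytes'))

  // 2. IPLD Data Model: plain JS values in/out, codec handled internally
  const dagCid = await ipfs.dag.put({ hello: 'world' })
  const { value } = await ipfs.dag.get(dagCid)

  // 3. whole DAGs via CAR transport
  const roots = []
  for await (const { root } of ipfs.dag.import(carStream)) {
    roots.push(root.cid)
  }
  return { blockCid, dagCid, value, roots }
}
```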
> all I want is "multiformats/block in, multiformats/block out". I just want to remove the double overhead of hashing, since I've already hashed the block myself, for my own calculation purposes, and now I want to store it as is.
Sounds like you already have a raw block and have already calculated its CID outside js-ipfs, and now want to do `block put` without calculating the CID a second time.

@achingbrain any concerns around adding an optional `cid` parameter that would disable the hasher in https://github.com/ipfs/js-ipfs/blob/7a7e091c5d7110542ca7ab6eca1c0c9abb19e54b/packages/ipfs-core/src/components/block/put.js#L24-L30 ?
> any concerns around adding an optional cid parameter that would disable the hasher in
We had this before, but we removed it because the only way to verify that the block you are adding is, in fact, related to the CID you want to store it under is to hash it.
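That verification step can be sketched with plain Node.js, assuming a sha2-256 CID that exposes `multihash.digest` as a `Uint8Array` (as multiformats CIDs do); the function name is illustrative:

```javascript
import { createHash } from 'node:crypto'

// Sketch: the only way to check that `bytes` really belong under a
// sha2-256 CID is to re-hash them and compare digests -- exactly the
// work the proposed `cid` option would skip.
function bytesMatchCid (cid, bytes) {
  const actual = createHash('sha256').update(bytes).digest()
  return actual.equals(Buffer.from(cid.multihash.digest))
}
```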
This makes sense to do when the data source is remote and hasn't been verified yet; it seems like this is protecting against accidental misuse by the user and/or malicious actions from a compromised part of a program. The block API accepting a multiformats/block here would protect the user similarly, which makes me think the main concern is a malicious actor that can access the block API but not the storage API the repo is using.
Thanks @lidel - you have captured my intent. @achingbrain, this stance is attempting to protect the node against the developer who has full control of the node, at the computational expense of said developer. If I want to ruin my node by storing mismatched data, that should be detected by other nodes during bitswap and (possibly) on rehydration from the data store. This feels congruent with @tabcat's view.
I know much less about this API than you three, but based on @lidel's summary of Kubo's distinctions, `dag put|get` sounds most like my use case.
For our specific use case, we have moved on to writing to the repo directly now, but it was surprising to us that we could not manage our own `multiformats/block` creation without incurring a double-hash penalty. This is a minor inconvenience compared to the huge assistance provided by having the ipfs and libp2p libraries available, and we are especially grateful for the ESM and TypeScript conversions - that must have been a nightmare to pull off 🙏
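The "write to the repo directly" approach mentioned above might be sketched like this, assuming a repo whose blockstore follows the `ipfs-repo` shape (`repo.blocks.put(cid, bytes)`) and a block that has already been encoded and hashed by the caller:

```javascript
// Sketch: store an already-hashed multiformats Block straight into the
// repo's blockstore, without re-hashing. The caller takes full
// responsibility for cid/bytes consistency -- this is the trade-off the
// thread is about.
async function putVerifiedBlock (repo, block) {
  await repo.blocks.put(block.cid, block.bytes)
  return block.cid
}
```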
Notes from today's JS triage call:
- The high-level Core API (`dag.put` and `block.put` are part of it) abstracts away hashing and content-integrity details for a reason.
- This is not protecting against malicious actors. It is removing footguns from the picture for regular users and developers who are new to content-addressing and think IPFS is a naive key-value store.
- Experienced developers with higher performance needs will most likely write a precomputed CID and Block to the repo directly, skipping the Core API and other intermediate abstractions introduced by js-ipfs.
- We are unsure what the value of a higher-abstraction API would be. If you really care about perf, direct repo puts are the way to go.
- Sounds like @inverted-capital ended up doing just that, so we are closing this issue.
- If you feel this should be documented somehow, consider creating an example of working with the repo directly :pray:
- If you strongly feel there should be a js-ipfs top-level API for direct repo puts, open a new issue with a proposed command design (this issue changed direction a few times, making it hard to triage for others).
  - My personal suggestion would be something like `ipfs.repo.unsafe.put`, to isolate it from the `block` Core API and make people pause before using it.
> My personal suggestion would be something like `ipfs.repo.unsafe.put`
I don't think requiring a multiformats Block to be used with the block API would create a footgun, but I'll just write to the repo directly in the future.