`ipfs.dag.import` should allow single blocks

Open inverted-capital opened this issue 2 years ago • 5 comments

  • Version:

    version: '0.15.4',
    commit: '965f5a4e00254cee48fca2127d95209deaee09cf',
    repo: '12',
    'ipfs-core': '0.15.4',
    'interface-ipfs-core': '^0.155.2'

  • Platform:
  • Subsystem: Dag

Severity:

Low

Description:

When calling `ipfs.dag.import` we can only supply a CAR stream. The Dag interface should allow us to import single blocks, and sets of blocks, without having to create these CAR streams first. Moreover, internally, `importCar()` should call this single-block function rather than putting the blocks into the repo directly.

Blocks may need to be preloaded, but the import of a CAR file skips this step.
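For clarity, a hypothetical sketch of the kind of call this issue asks for (none of these overloads exist today; `ipfs.dag.import` currently only accepts CAR streams):

```js
// Hypothetical API, for illustration only: dag.import accepting
// multiformats Block instances directly, without a CAR wrapper.
await ipfs.dag.import(block)              // a single multiformats/block
await ipfs.dag.import([block1, block2])   // or an iterable of blocks
```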

Is this a feature for which a pull request would be useful?

inverted-capital avatar Aug 02 '22 09:08 inverted-capital

Thank you for submitting your first issue to this repository! A maintainer will be here shortly to triage and review. In the meantime, please double-check that you have provided all the necessary information to make this process easy! Any information that can help save additional round trips is useful! We currently aim to give initial feedback within two business days. If this does not happen, feel free to leave a comment. Please keep an eye on how this issue will be labeled, as labels give an overview of priorities, assignments and additional actions requested by the maintainers:

  • "Priority" labels will show how urgent this is for the team.
  • "Status" labels will show if this is ready to be worked on, blocked, or in progress.
  • "Need" labels will indicate if additional input or analysis is required.

Finally, remember to use https://discuss.ipfs.io if you just need general support.

welcome[bot] avatar Aug 02 '22 09:08 welcome[bot]

This would mean that ipfs.dag.export should also change, since import/export methods can usually consume each other's output.

Export functions usually return something that can easily be written to disk, like a string or byte array, and this would not.

Instead of changing the import/export methods, you could 'import' an array of blocks into the repo by iterating over it and calling the ipfs.dag.put API (see the sketch below). Getting an array of blocks back from a root CID can be done with multiformats/traversal plus ipfs.dag.get.
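A minimal sketch of that iteration approach, assuming `blocks` is an array of multiformats `Block` instances encoded with dag-cbor (option names follow recent js-ipfs dag.put docs and may differ between releases):

```js
import { create } from 'ipfs-core'

// Re-add already-decoded multiformats Blocks via the Core API.
// Note: dag.put re-encodes and re-hashes block.value, so each block gets
// hashed again even though block.cid is already known.
async function putBlocks (blocks) {
  const ipfs = await create()
  const cids = []
  for (const block of blocks) {
    const cid = await ipfs.dag.put(block.value, {
      storeCodec: 'dag-cbor', // assumption: blocks were encoded with dag-cbor
      hashAlg: 'sha2-256'
    })
    cids.push(cid)
  }
  return cids
}
```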

tabcat avatar Aug 02 '22 14:08 tabcat

Yes, you're right, export must be consumable by import. The problem is that ipfs.dag.put takes a plain object, not a block. There is mention of a test suggesting this was considered at some point, but there is no implementation in the code.

Also, when importing a CAR file, should it not take the same path as adding any other data to ipfs, such as preload and possibly pinning? In the current dag implementation, it just writes straight to the repo.

How is the interface of js-ipfs governed? Is there something higher than this repo, like the golang repo that defined these APIs, or are they open to being extended directly by js-ipfs?

Should I make a pull request that simply sniffs the input to ipfs.dag.put to see if the supplied data is already a Block instance and, if so, avoids the hashing?
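For illustration, a hypothetical sketch of what that sniffing could look like (this is not the actual js-ipfs implementation; `repo` and `normalDagPut` are stand-ins for the component's real dependencies):

```js
// Hypothetical sniffing logic, not actual js-ipfs code
async function dagPut (input, options = {}) {
  // a multiformats Block already carries its encoded bytes and its CID
  if (input != null && input.cid != null && input.bytes != null) {
    await repo.blocks.put(input.cid, input.bytes) // store as-is, no re-hash
    return input.cid
  }
  // otherwise fall back to the existing encode + hash path
  return normalDagPut(input, options)
}
```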

inverted-capital avatar Aug 04 '22 03:08 inverted-capital

The problem is that ipfs.dag.put takes a plain object, not a block.

I like this, but I'd like the block API to take and return blocks. IIRC it used to work that way? Since starting to use the js-multiformats library I haven't used the dag API. I just took another look at the block.put documentation; I guess I've been forgetting to specify the format option, which may be affecting the CID for the data.
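For reference, a hedged sketch of the kind of call described above; the option names (`format`, `mhtype`) follow the interface-ipfs-core block.put docs and may vary between releases, and `ipfs` is assumed to be an existing node:

```js
// Passing the codec explicitly so the CID is computed for the intended format;
// omitting it can produce a different CID than expected.
const bytes = new TextEncoder().encode('hello world')
const cid = await ipfs.block.put(bytes, { format: 'raw', mhtype: 'sha2-256' })
console.log(cid.toString())
```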

Also, when importing a CAR file, should it not take the same path as adding any other data to ipfs, such as preload and possibly pinning? In the current dag implementation, it just writes straight to the repo.

I'd expect it to take a similar path to the dag.put and block.put methods, but I've only just started working with CAR files, so I don't know enough.

Should I make a pull request that simply sniffs the input to ipfs.dag.put to see if the supplied data is already a Block instance and, if so, avoids the hashing?

This is one of the reasons I wish the block API took blocks.

tabcat avatar Aug 04 '22 05:08 tabcat

Yes, I suppose the blocks API is a more appropriate place than the dag API. The dag API seems to be "js objects in, js objects out", while the blocks API seems to be "Uint8Arrays in, Uint8Arrays out", and all I want is "multiformat/block in, multiformat/block out". I just want to remove the double overhead of hashing, since I've already hashed the block myself, for my own calculation purposes, and now I want to store it as is.

What to do? Wait for whoever is in charge of the API to say what they prefer and then make a PR? So far, making a CAR file and importing it with the dag API is the closest I've got to avoiding the double hashing.
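For anyone following along, a sketch of that CAR workaround as I understand it, using @ipld/car's CarWriter to wrap pre-built multiformats Blocks and feeding the stream to ipfs.dag.import (details may vary between versions):

```js
import { CarWriter } from '@ipld/car'

// Wrap already-encoded Blocks in a CAR stream and hand it to dag.import,
// avoiding a second encode/hash in userland (dag.import may still hash
// internally while reading the CAR).
async function importBlocks (ipfs, blocks) {
  const { writer, out } = await CarWriter.create([blocks[0].cid])

  // start consuming the import before writing so the stream can drain
  const importing = (async () => {
    for await (const res of ipfs.dag.import(out)) {
      console.log('imported root', res.root.cid.toString())
    }
  })()

  for (const block of blocks) {
    await writer.put(block) // a multiformats Block already has { cid, bytes }
  }
  await writer.close()
  await importing
}
```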

inverted-capital avatar Aug 04 '22 23:08 inverted-capital

In Kubo, the convention / separation of concerns is more or less this (it could be a bit different in js-ipfs, but not by much; a rough js-ipfs mapping is sketched after this list):

  • if you don't care about IPLD and work with raw blocks, use block put|get
  • if you work with IPLD DAGs using Data Model, and want to put DAG-JSON and have it stored as DAG-CBOR, or you want to read dag-pb in Logical Format, use dag put|get
  • if you work with entire DAGs, and want to export or import them using the CAR transport, use dag import|export
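Roughly the same three levels in js-ipfs Core API terms (a sketch only; `ipfs`, `bytes` and `carStream` are assumed to exist):

```js
// raw blocks in, raw blocks out
const blockCid = await ipfs.block.put(bytes)

// IPLD Data Model: plain JS values in, decoded values out
const dagCid = await ipfs.dag.put({ hello: 'world' })
const { value } = await ipfs.dag.get(dagCid)

// whole DAGs, transported as CAR streams
for await (const res of ipfs.dag.import(carStream)) { /* ... */ }
for await (const chunk of ipfs.dag.export(dagCid)) { /* write chunks to a .car file */ }
```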

all I want is "multiformat/block in, multiformat/block out". I just want to remove the double overhead of hashing, since I've already hashed the block myself, for my own calculation purposes, and now I want to store it as is.

Sounds like you already have a raw block, have already calculated the CID for it outside js-ipfs, and now want to do block put without calculating the CID a second time.

@achingbrain any concerns around adding optional cid parameter that would disable hasher in https://github.com/ipfs/js-ipfs/blob/7a7e091c5d7110542ca7ab6eca1c0c9abb19e54b/packages/ipfs-core/src/components/block/put.js#L24-L30 ?
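Purely for illustration, a hypothetical shape such an option could take (not the actual put.js code; `repo` stands in for the component's blockstore):

```js
import { CID } from 'multiformats/cid'
import { sha256 } from 'multiformats/hashes/sha2'
import * as raw from 'multiformats/codecs/raw'

// Hypothetical: an optional, pre-computed CID disables hashing entirely
async function blockPut (bytes, options = {}) {
  if (options.cid != null) {
    await repo.blocks.put(options.cid, bytes) // nothing verifies that cid matches bytes here
    return options.cid
  }
  // current behaviour, heavily simplified: hash the bytes and build a CID
  const hash = await sha256.digest(bytes)
  const cid = CID.createV1(raw.code, hash)
  await repo.blocks.put(cid, bytes)
  return cid
}
```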

lidel avatar Aug 12 '22 14:08 lidel

any concerns around adding optional cid parameter that would disable hasher in

We had this before. But we removed it because the only way to verify that the block you are adding is, in fact, related to the CID you want to store it under is to hash it.
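In other words, verification necessarily means re-hashing; roughly this, using js-multiformats:

```js
import { sha256 } from 'multiformats/hashes/sha2'
import { equals } from 'multiformats/bytes'

// The only way to check that `bytes` really belong under `cid` is to
// recompute the digest and compare it with the one inside the CID.
async function verify (cid, bytes) {
  const hash = await sha256.digest(bytes) // assumes the CID uses sha2-256
  return equals(hash.digest, cid.multihash.digest)
}
```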

achingbrain avatar Aug 12 '22 15:08 achingbrain

This makes sense to do when the data source is remote and hasn't been verified yet; it seems like this is protecting against accidental misuse by the user and/or malicious actions from a compromised part of a program. The block API accepting multiformats/block here would protect the user similarly, which makes me think the main concern is a malicious actor that can access the block API but not the storage API the repo is using.

tabcat avatar Aug 12 '22 16:08 tabcat

Thanks @lidel - you have captured my intent. @achingbrain, this stance attempts to protect the node against a developer who has full control of the node, at the computational expense of said developer. If I want to ruin my node by storing mismatched data, that should be detected by other nodes during bitswap, and (possibly) on rehydration from the data store. This feels congruent with @tabcat's view.

I know much less about this API than you three, but based on @lidel's summary of Kubo's distinctions, dag put|get sounds most like my use case.

For our specific use case, we have moved on to writing to the repo directly, but it was surprising to us that we could not manage our own multiformats/block creation without incurring a double-hash penalty. This is a minor inconvenience compared to the huge assistance we get from having the ipfs and libp2p libraries available, and we are especially grateful for the ESM and TypeScript conversions - that must have been a nightmare to pull off 🙏

inverted-capital avatar Aug 16 '22 21:08 inverted-capital

Notes from today's JS triage call:

  • The high-level Core API (dag.put and block.put are part of it) abstracts away hashing and content-integrity details for a reason.
    • This is not about protecting against malicious actors. It is about removing footguns from the picture for regular users and developers who are new to content-addressing and think IPFS is a naive key-value store.
  • Experienced developers with higher performance needs will most likely write the precomputed CID and Block to the repo directly, skipping the Core API and other intermediate abstractions introduced by js-ipfs.
    • We are unsure what the value of a higher-abstraction API would be. If you really care about performance, direct repo puts are the way to go.
    • Sounds like @inverted-capital ended up doing just that, so we are closing this issue.
      • If you feel this should be documented somehow, consider creating an example of working with the repo directly (a sketch follows after these notes) :pray:
      • If you strongly feel there should be a js-ipfs top-level API for direct repo puts, open a new issue with a proposed command design (this issue changed direction a few times, making it hard to triage for others)
        • My personal suggestion would be something like ipfs.repo.unsafe.put, to isolate it from the block Core API and make people pause before using it.
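For reference, a minimal sketch of the direct approach discussed above (assuming you hold the ipfs-repo instance the node was created with; constructing a repo from scratch with createRepo needs backends and codec loaders that are omitted here):

```js
import * as Block from 'multiformats/block'
import * as dagCbor from '@ipld/dag-cbor'
import { sha256 } from 'multiformats/hashes/sha2'

// Encode and hash once, then write straight into the repo's blockstore,
// bypassing the Core API (and therefore preload, pinning and verification).
async function directPut (repo, value) {
  const block = await Block.encode({ value, codec: dagCbor, hasher: sha256 })
  await repo.blocks.put(block.cid, block.bytes)
  return block.cid
}
```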

lidel avatar Sep 09 '22 16:09 lidel

I don't think requiring a multiformats Block to be used with the block API would create a footgun, but I'll just look at writing to the repo directly in the future.

tabcat avatar Sep 09 '22 18:09 tabcat