
Need a way to standardize IPFS storage write/read (eg. Wrap IPFS interfaces in js-ipfs-fetch?)

Open RangerMauve opened this issue 2 years ago • 17 comments

At the moment there seem to be several ways to interact with IPFS:

  • Initializing a js-ipfs node
  • Using js-ipfs-http-client either to a local gateway or some brave-specific stuff
  • Using web3.storage to publish to IPFS using CAR files
  • Using public gateways to download data from IPFS
  • Built in protocol handlers in Agregore

I propose unifying these under a fetch interface based on js-ipfs-fetch

I'm thinking we could have something along these lines:

  • Detect Agregore, return the global fetch() instance
  • Detect Brave, create a js-ipfs-fetch instance from either the window.ipfs object or by talking to the local gateway
  • Detect config options for a public gateway + web3.storage authentication tokens, wrap the two using a more limited fetch interface (no IPNS?)
  • If all else fails, set up a local js-ipfs node and pass it to js-ipfs-fetch
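The fallback order above could be sketched as a pure function over facts that have already been detected. This is only an illustration: the `env` field names (`isAgregore`, `hasWindowIpfs`, `localGatewayURL`, `publicGatewayURL`, `web3Token`) and the returned labels are hypothetical and not part of js-ipfs-fetch, and the detection itself (user agent checks, probing the gateway) is left out.

```javascript
// Hypothetical sketch: choose a backend from already-detected facts.
// None of these field names or labels come from js-ipfs-fetch.
function chooseBackend (env) {
  if (env.isAgregore) {
    // Agregore: the global fetch() already understands ipfs:// URLs
    return { kind: 'global-fetch', limited: false }
  }
  if (env.hasWindowIpfs || env.localGatewayURL) {
    // Brave: wrap window.ipfs or talk to the local gateway
    return { kind: 'local-node', limited: false }
  }
  if (env.publicGatewayURL && env.web3Token) {
    // Read through a public gateway, write through web3.storage.
    // This is the more limited interface (possibly no IPNS).
    return { kind: 'gateway+web3.storage', limited: true }
  }
  // If all else fails, start a js-ipfs node and hand it to js-ipfs-fetch
  return { kind: 'embedded-js-ipfs', limited: false }
}
```

Keeping the decision separate from the detection makes the ordering easy to test and to change later.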

If I understand correctly, the main things we need are the following:

  • GET requests on IPFS URLs (with range support)
  • POST requests to publish several files into a directory and to get back a CID
  • Some sort of equivalent to pinning (less certain on how to best approach this point)

This post is more to track this idea for an eventual implementation.

RangerMauve avatar Jun 18 '22 00:06 RangerMauve

Also, we could add npm-go-ipfs to the mix here to get a go-ipfs node spun up from JS

RangerMauve avatar Jun 22 '22 22:06 RangerMauve

I'm gonna start by making it easier to load up IPFS-fetch in browsers and to take all these settings into consideration.

RangerMauve avatar Jul 04 '22 19:07 RangerMauve

At the moment there seem to be several ways to interact with IPFS:

  • Initializing a js-ipfs node
  • Using js-ipfs-http-client either to a local gateway or some brave-specific stuff
  • Using web3.storage to publish to IPFS using CAR files
  • Using public gateways to download data from IPFS
  • Built in protocol handlers in Agregore

Yes, this is a good summary. There's also interfacing with a local js-ipfs node when running as a standalone Electron app.

The goal is simply to have the sharing option work reliably, both for pinning the data in IPFS and for generating a shareable link that others can access.

  • If on a browser w/o a local node, just using js-ipfs is probably not going to be reliable enough, so we should try to do remote pinning / use web3.storage
  • If on a browser with a local node, we should ideally have a unified path to access and store data on the local node.

For the shareable link, there are also a few options:

  • Using a gateway link is probably the most cross-platform
  • Using a link to replayweb.page?source=ipfs:// relies on a particular version of replayweb.page
  • Using an ipfs:// link is limited to browsers that support custom protocols.
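To make the three link options concrete, here is a minimal sketch that builds each form from a CID. The helper name `shareLinks`, the default gateway (`https://dweb.link`, used here only as an example of a public gateway), and URL-encoding the `source` parameter are all assumptions for illustration.

```javascript
// Hypothetical helper: build the three kinds of shareable link for a CID.
function shareLinks (cid, path = '', gateway = 'https://dweb.link') {
  const ipfsURL = `ipfs://${cid}${path}`
  return {
    // Works in any browser, but depends on the gateway staying up
    gateway: `${gateway}/ipfs/${cid}${path}`,
    // Depends on the replayweb.page version being served
    replay: `https://replayweb.page/?source=${encodeURIComponent(ipfsURL)}`,
    // Only works in browsers that support the ipfs:// protocol
    native: ipfsURL
  }
}

// 'bafyEXAMPLE' is a placeholder, not a real CID
const links = shareLinks('bafyEXAMPLE', '/archive.wacz')
console.log(links.gateway)
console.log(links.replay)
console.log(links.native)
```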

I propose unifying these under a fetch interface based on js-ipfs-fetch

This definitely makes sense for GET / read access, but it's slightly less clear how to abstract ipfs.addAll() / ipfs.add and other options, as well as the web3.storage HTTP APIs. If connecting to a remote HTTP server, that probably makes sense, but is it still the best option if just sending data to a local node, assuming we may be dealing with GBs of data? I also wonder how the CAR-file abstraction fits in here, since that's how web3.storage handles things.

I'm thinking we could have something along these lines:

  • Detect Agregore, return the global fetch() instance
  • Detect Brave, create a js-ipfs-fetch instance from either the window.ipfs object or by talking to the local gateway
  • Detect config options for a public gateway + web3.storage authentication tokens, wrap the two using a more limited fetch interface (no IPNS?)
  • If all else fails, set up a local js-ipfs node and pass it to js-ipfs-fetch

If I understand correctly, the main things we need are the following:

  • GET requests on IPFS URLs (with range support)

Yes

  • POST requests to publish several files into a directory and to get back a CID
  • Some sort of equivalent to pinning (less certain on how to best approach this point)

Yes, I think those are the basics, but it could become more complex if we want to add to an existing directory or control pinning.

This post is more to track this idea for an eventual implementation.

Thanks for starting this!

ikreymer avatar Jul 06 '22 00:07 ikreymer

Regarding links, would it make sense to provide several options in the UI?

This definitely makes sense for GET / read access, but it's slightly less clear how to abstract ipfs.addAll() / ipfs.add and other options, as well as the web3.storage HTTP APIs.

web3.storage supports doing an upload using FormData which is something Agregore supports via fetch() too. This should be easy to abstract with the addAll thing.

Regarding CAR files, I think we can just sidestep them altogether for now. Using FormData we can potentially skip loading data into JS at all and just pass a File reference in and it'll be serialized within the browser. There might be a bit of funkiness for js-ipfs, but I can work around it I think.
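A rough sketch of the FormData idea, assuming a runtime with global `FormData` and `Blob` (browsers, Node 18+). The helper names, the `file` field name, and the upload URL are assumptions here. The exact endpoint and field each backend expects would need checking against its docs.

```javascript
// Sketch: wrap File/Blob references in FormData so the runtime serializes
// them during upload instead of the app reading them into memory first.
function buildUploadForm (entries, fieldName = 'file') {
  const form = new FormData()
  for (const { name, data } of entries) {
    // The third argument sets the filename used in the multipart body
    form.append(fieldName, data, name)
  }
  return form
}

// Hypothetical usage: `fetchImpl` is whichever fetch() was selected and
// `url` is the backend's upload URL. Neither is specified by this thread.
function upload (fetchImpl, url, entries) {
  return fetchImpl(url, { method: 'POST', body: buildUploadForm(entries) })
}
```

In a browser, `data` can be a `File` taken straight from an `<input type="file">`, so the bytes never pass through application code.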

RangerMauve avatar Jul 08 '22 00:07 RangerMauve

I'm thinking it'd be useful to represent the following in the app somewhere that users can configure:

  • List of public gateways for linking to archives outside of ipfs:// (also for reading when we don't have a local node)
  • List of pinning services to add your data to (auto-detect pinning to local node if possible)
  • List of writable services (web3.storage, etc) (auto-detect writing to local node if possible)

I think we can figure out some defaults here like some cloud stuff run by WebRecorder (or reaching out to web3.storage ourselves), but I think it'd be good to try to expose it to users in general.

RangerMauve avatar Jul 22 '22 18:07 RangerMauve

I'm about to start testing the automatic-ipfs-fetch thing in browsers, and I'm gonna put off the web3.storage stuff for now while we figure out how it's configured (lemme know if I should put focus on it sooner instead)

RangerMauve avatar Jul 22 '22 18:07 RangerMauve

Woof. Just realized that getting FormData to work with js-ipfs-fetch will require a bit of work. Though I think that won't be an issue for our case since we already have a way to PUT data over top of an existing directory and don't necessarily need to upload everything at once.

RangerMauve avatar Jul 22 '22 20:07 RangerMauve

I'm thinking it'd be useful to represent the following in the app somewhere that users can configure:

Yes, it might be time to start adding a settings page for archiveweb.page, though we should have sensible defaults that 'just work' for the most part.

I think the unified API should focus on two modes:

  1. local node support, if available, such as if using Brave or Agregore, or Electron app.
  2. web3.storage support otherwise

I think the portable format that we'll want to use is CAR files, as that's the format that can be uploaded to web3.storage as well as added to a local node. Additionally, CAR files will allow us to support custom chunking/tree building, which we should explore for WACZ files, and keep that separate from the actual uploading. For this reason, and because of the encoding that happens for FormData, I think FormData is probably not the right abstraction here given the flexibility that we'd want.

I think the abstraction that would be most useful is a separate library that can take a CAR file and options like useLocalNode and useRemoteStorage, 'do the right thing' in terms of trying one or more services, and report back what was possible given the current capabilities (eg. added to web3.storage, or added to local node).
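As a sketch of that "take a CAR, try what's enabled, report back" shape: the option names `useLocalNode` and `useRemoteStorage` come from the comment above, while `shareCar`, the `uploaders` map, and the result format are hypothetical. The uploaders themselves (local node, web3.storage) are passed in rather than implemented here.

```javascript
// Hypothetical sketch: try every enabled uploader and report what worked.
// `uploaders` maps an option name to an async function (car) => cid.
async function shareCar (car, options, uploaders) {
  const results = []
  for (const [name, uploadCar] of Object.entries(uploaders)) {
    if (!options[name]) continue // this service was not requested
    try {
      results.push({ service: name, ok: true, cid: await uploadCar(car) })
    } catch (err) {
      // A failing service should not stop the others from being tried
      results.push({ service: name, ok: false, error: String(err) })
    }
  }
  return results
}
```

The caller can then decide what to show in the UI, for example "added to local node" versus "added to web3.storage", based on which entries have `ok: true`.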

ikreymer avatar Jul 23 '22 10:07 ikreymer

Cool, maybe an API that looks something like this:

create({publicGatewayURL, web3Credentials, localGatewayURL})

  • get(url, {start, end}) => Stream of bytes or buffer?
  • update(url, filename, buffer) => URL
  • share(car) => URL
  • pin(url) => Status
  • isPinned(url) => Boolean
  • unpin(url) => Status

Under the hood this can attempt to auto-detect the best method of supporting these APIs. (along with fallbacks for custom web3.storage and pinning service APIs)
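For `get(url, {start, end})`, one possible mapping onto a fetch-style backend is an HTTP `Range` header. This sketch assumes `end` is an inclusive byte offset (as in HTTP ranges), which the API list above doesn't specify, and the helper names are illustrative.

```javascript
// Sketch: translate {start, end} into an HTTP Range header.
// Assumes `end` is inclusive, matching HTTP byte-range semantics.
function rangeHeaders ({ start, end } = {}) {
  if (start === undefined && end === undefined) return {}
  const from = start ?? 0
  const to = end ?? '' // an open-ended range reads to the end of the file
  return { Range: `bytes=${from}-${to}` }
}

// Hypothetical get(): `fetchImpl` is whichever fetch() was auto-detected.
async function get (fetchImpl, url, range) {
  const response = await fetchImpl(url, { headers: rangeHeaders(range) })
  if (!response.ok) throw new Error(`GET ${url} failed: ${response.status}`)
  return response.body // a stream of bytes
}
```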

RangerMauve avatar Jul 25 '22 18:07 RangerMauve

For working with ipfs-car in JavaScript there's this API: https://github.com/web3-storage/ipfs-car

I think we can also use this for generating CAR files from the raw data. I'm not sure if it supports IPFS style chunking of large files, however. Also not clear if we could "extend" an existing car (probably not, but I guess we don't really need to).

RangerMauve avatar Jul 25 '22 18:07 RangerMauve

js-ipfs doesn't have many options for chunking data, sadly. 😅 https://github.com/ipfs/js-ipfs/blob/master/docs/core-api/FILES.md#notes

Also with using a memory blockstore, it's not clear whether all data gets persisted in memory as the CAR file is being generated and how the garbage collection for that works. 🤔 Could be an issue for multi-GB archives.

How are archives currently persisted?

RangerMauve avatar Jul 25 '22 18:07 RangerMauve

We can assemble custom UnixFS nodes with this: https://github.com/ipfs/js-ipfs-unixfs/tree/master/packages/ipfs-unixfs#create-an-unixfs-data-element

Not entirely sure how this could be added to a CAR file. 🤔 Would be good to sit down with some folks that already know the internals to see the best path there. (the ipfs-car importer seems to expect raw data rather than IPFS dag nodes)

RangerMauve avatar Jul 25 '22 19:07 RangerMauve

For chunking we can use this code:

https://github.com/ipfs/js-ipfs-unixfs/blob/master/packages/ipfs-unixfs-importer/src/chunker/fixed-size.js

We can probably use the BufferList to split up the chunks
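For reference, the core of fixed-size chunking is small. This is a simplified sketch over a single in-memory `Uint8Array`, not the linked js-ipfs-unixfs code, which consumes a stream of buffers (hence the BufferList).

```javascript
// Simplified sketch of fixed-size chunking over an in-memory Uint8Array.
function * fixedSizeChunks (bytes, chunkSize) {
  if (!(chunkSize > 0)) throw new RangeError('chunkSize must be positive')
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    // subarray() returns a view, so no bytes are copied;
    // the last chunk may be shorter than chunkSize
    yield bytes.subarray(offset, offset + chunkSize)
  }
}

const sizes = [...fixedSizeChunks(new Uint8Array(10), 4)].map(c => c.length)
console.log(sizes) // [ 4, 4, 2 ]
```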

RangerMauve avatar Aug 03 '22 21:08 RangerMauve

TODO for API:

  • Detect what's available before initializing
  • Alternately auto-enable when initializing

Should we have a config to do "only local node"?

Also, IPFS integration might be opt-in, in which case we shouldn't set up js-ipfs and should instead prompt the user with info that IPFS isn't supported.

Also also, for node.js we should use the go-ipfs package since it's more reliable than js-ipfs https://github.com/ipfs/npm-go-ipfs

Also also also, we should support Estuary's CAR uploads: https://docs.estuary.tech/api-content-add-car

RangerMauve avatar Aug 03 '22 22:08 RangerMauve

Gonna hack on the offset chunker code and maybe publish a module.

Also reached out to the js-ipfs-unixfs team to see if they'd want a PR with it once it's done. https://github.com/ipfs/js-ipfs-unixfs/issues/241

RangerMauve avatar Aug 04 '22 14:08 RangerMauve

This is where I can find where the entry offsets are calculated: https://github.com/webrecorder/wabac.js/blob/main/src/wacz/ziprangereader.js#L50
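A rough sketch of what an offset-based chunker could look like: cut at each known entry offset so chunks never straddle a ZIP entry boundary, then fall back to fixed-size pieces within an entry. The function name, the in-memory `Uint8Array` input, and the 262144-byte default are assumptions for illustration, and this is not the code that was eventually published.

```javascript
// Hypothetical sketch: chunk at known entry offsets, then by size within
// each entry. `offsets` are byte positions where entries start.
function * chunksAtOffsets (bytes, offsets, maxChunkSize = 262144) {
  // Always cut at the start and end; drop duplicates and out-of-range values
  const cuts = [...new Set([0, ...offsets, bytes.length])]
    .filter(o => o >= 0 && o <= bytes.length)
    .sort((a, b) => a - b)
  for (let i = 0; i < cuts.length - 1; i++) {
    const end = cuts[i + 1]
    for (let pos = cuts[i]; pos < end; pos += maxChunkSize) {
      yield bytes.subarray(pos, Math.min(pos + maxChunkSize, end))
    }
  }
}
```

Because identical entries then produce identical chunks regardless of where they sit in the archive, the same content stored in different WACZ files could be deduplicated.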

RangerMauve avatar Aug 04 '22 14:08 RangerMauve

Created https://ranger.mauve.moe/auto-js-ipfs/

source: https://github.com/RangerMauve/auto-js-ipfs/pull/1

Going to make an SPA for loading web archives and uploading them to IPFS. Initially it'll upload raw, but it will also do some content-aware chunking along the lines of what we looked into before.

RangerMauve avatar Sep 13 '22 16:09 RangerMauve

This has now been mostly implemented, including custom chunking, which has been moved to https://github.com/webrecorder/awp-sw. The current implementation provides unified access via auto-js-ipfs (in the renderer process) to detect whether a local IPFS node is running (default kubo, IPFS Desktop, or embedded in Brave) and allows sharing if one is found. No local node is started in the Electron app (though that could be added in the future), and all IPFS-specific code has been removed from the Electron node process. It is also possible (but not currently enabled) to write to web3.storage if a token is provided. We are not including a hard-coded token, and will instead implement web3.storage support along with UCAN support. Closing this issue.

ikreymer avatar Dec 08 '22 00:12 ikreymer