archiveweb.page
Need a way to standardize IPFS storage write/read (eg. Wrap IPFS interfaces in js-ipfs-fetch?)
At the moment there seem to be several ways to interact with IPFS:
- Initializing a js-ipfs node
- Using js-ipfs-http-client either to a local gateway or some brave-specific stuff
- Using web3.storage to publish to IPFS using CAR files
- Using public gateways to download data from IPFS
- Built in protocol handlers in Agregore
I propose unifying these under a fetch interface based on js-ipfs-fetch
I'm thinking we could have something along these lines:
- Detect Agregore, return the global `fetch()` instance
- Detect Brave, create a js-ipfs-fetch instance from either the `window.ipfs` object or by talking to the local gateway
- Detect config options for a public gateway + web3.storage authentication tokens, wrap the two using a more limited fetch interface (no IPNS?)
- If all else fails, set up a local js-ipfs node and pass it to js-ipfs-fetch
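A rough sketch of that detection order, just to make the fallback chain concrete. The `env` object and all of its field names are made up for illustration; real detection would have to inspect the browser, and nothing here is part of js-ipfs-fetch.

```javascript
// Hypothetical sketch: pick an IPFS backend in the order proposed above.
// `env` is an assumed plain object describing what was detected; none of
// these field names come from js-ipfs-fetch or any browser API.
function chooseBackend(env) {
  // Agregore already handles ipfs:// in its global fetch()
  if (env.isAgregore) return 'global-fetch';

  // Brave: prefer an injected window.ipfs, else talk to the local gateway
  if (env.isBrave) {
    return env.hasWindowIpfs ? 'window-ipfs' : 'local-gateway';
  }

  // Public gateway for reads plus web3.storage token for writes
  if (env.publicGatewayURL && env.web3StorageToken) {
    return 'gateway+web3.storage';
  }

  // Last resort: spin up an embedded js-ipfs node
  return 'embedded-js-ipfs';
}
```

Each returned label would then map to whatever actually constructs the fetch-like instance for that environment.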
If I understand correctly, the main things we need are the following:
- GET requests on IPFS URLs (with range support)
- POST requests to publish several files into a directory and to get back a CID
- Some sort of equivalent to pinning (less certain on how to best approach this point)
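For the ranged GET, the request side could be as small as building a standard HTTP `Range` header. This is only a sketch: `rangeHeaders` is a hypothetical helper, and whether a given backend honors the header for `ipfs://` URLs is an assumption that needs checking per backend.

```javascript
// Hypothetical helper: build headers for a byte-range GET.
// HTTP ranges are inclusive on both ends, so (0, 1023) asks for 1024 bytes.
function rangeHeaders(start, end) {
  if (!Number.isInteger(start) || start < 0) {
    throw new RangeError('start must be a non-negative integer');
  }
  if (end !== undefined && (!Number.isInteger(end) || end < start)) {
    throw new RangeError('end must be an integer >= start');
  }
  // An omitted end means "from start to the end of the file"
  const last = end === undefined ? '' : String(end);
  return { Range: `bytes=${start}-${last}` };
}

// Assumed usage with a fetch-like function (the URL is a placeholder):
//   const res = await fetch('ipfs://<cid>/archive.wacz', { headers: rangeHeaders(0, 1023) });
```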
This post is more to track this idea for an eventual implementation.
Also, we could add npm-go-ipfs to the mix here to get a go-ipfs node spun up from JS
I'm gonna start by making it easier to load up IPFS-fetch in browsers and to take all these settings into consideration.
At the moment there seem to be several ways to interact with IPFS:
- Initializing a js-ipfs node
- Using js-ipfs-http-client either to a local gateway or some brave-specific stuff
- Using web3.storage to publish to IPFS using CAR files
- Using public gateways to download data from IPFS
- Built in protocol handlers in Agregore
Yes, this is a good summary + also interfacing with local js-ipfs when running as standalone Electron app.
The goal is simply to have the sharing option work reliably, both for pinning the data in IPFS and for generating a shareable link that others can access.
- If on a browser w/o a local node, probably just using js-ipfs is not going to be reliable enough, so should try to do remote pinning / use web3.storage
- If on a browser with a local node, should ideally have a unified path to access and store data on the local node.
For the shareable link, there are also a few options:
- Using a gateway link is probably the most cross-platform
- Using a link to `replayweb.page?source=ipfs://` relies on a particular version of replayweb.page
- Using an `ipfs://` link is limited to browsers that support custom protocols.
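To make the three link options concrete, here is a small sketch that derives all of them from one CID. `shareLinks` is a hypothetical helper, the gateway host in the usage note is a placeholder, and URL-encoding the `source` parameter is my assumption about what replayweb.page accepts.

```javascript
// Hypothetical helper: build the three kinds of shareable link for a CID.
function shareLinks(cid, gatewayURL) {
  const base = gatewayURL.replace(/\/+$/, ''); // drop trailing slashes
  const ipfsURL = `ipfs://${cid}/`;
  return {
    // Path-style gateway link: works in any browser
    gateway: `${base}/ipfs/${cid}/`,
    // replayweb.page link: depends on the hosted replayweb.page version
    replay: `https://replayweb.page/?source=${encodeURIComponent(ipfsURL)}`,
    // Native link: only for browsers that handle the ipfs:// protocol
    native: ipfsURL
  };
}
```

Calling `shareLinks(cid, 'https://gateway.example')` returns all three, so a UI could offer them side by side.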
I propose unifying these under a fetch interface based on js-ipfs-fetch
This definitely makes sense for GET/read access, but it's slightly less clear how to abstract `ipfs.addAll()` / `ipfs.add` and other options, as well as the web3.storage HTTP APIs.
If connecting to a remote HTTP server, that probably makes sense, but is it still the best option if just sending data to a local node, assuming we may be dealing with GBs of data? I wonder how the CAR-file abstraction fits in here, since that's how web3.storage handles things.
I'm thinking we could have something along these lines:
- Detect Agregore, return the global `fetch()` instance
- Detect Brave, create a js-ipfs-fetch instance from either the `window.ipfs` object or by talking to the local gateway
- Detect config options for a public gateway + web3.storage authentication tokens, wrap the two using a more limited fetch interface (no IPNS?)
- If all else fails, set up a local js-ipfs node and pass it to js-ipfs-fetch
If I understand correctly, the main things we need are the following:
- GET requests on IPFS URLs (with range support)
Yes
- POST requests to publish several files into a directory and to get back a CID
- Some sort of equivalent to pinning (less certain on how to best approach this point)
Yes, I think those are the basics, but it could become more complex if we want to add to an existing directory or control pinning.
This post is more to track this idea for an eventual implementation.
Thanks for starting this!
Regarding links, would it make sense to provide several options in the UI?
This definitely makes sense for GET/read access, but it's slightly less clear how to abstract `ipfs.addAll()` / `ipfs.add` and other options, as well as the web3.storage HTTP APIs.
web3.storage supports doing an upload using `FormData`, which is something Agregore supports via `fetch()` too. This should be easy to abstract with the `addAll` thing.
Regarding CAR files, I think we can just sidestep them altogether for now. Using FormData we can potentially skip loading data into JS at all and just pass a `File` reference in, and it'll be serialized within the browser. There might be a bit of funkiness for js-ipfs, but I can work around it I think.
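Something like the following is what I have in mind for the FormData route. It's a sketch that assumes a runtime with global `FormData` and `Blob` (browsers, Node 18+); `buildUploadForm` and the `'file'` field name are assumptions, and the field name in particular should be checked against whatever service receives the upload.

```javascript
// Hypothetical helper: collect files into one multipart form.
// Each entry is { name, data } where `data` is a File or Blob. The Blob is
// handed to FormData as-is, so its bytes are not read into JS here.
function buildUploadForm(files, fieldName = 'file') {
  const form = new FormData();
  for (const { name, data } of files) {
    form.append(fieldName, data, name);
  }
  return form;
}

// Assumed usage (uploadURL is a placeholder, not a real endpoint):
//   await fetch(uploadURL, { method: 'POST', body: buildUploadForm(files) });
```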
I'm thinking it'd be useful to represent the following in the app somewhere that users can configure:
- List of public gateways for linking to archives outside of `ipfs://` (also for reading when we don't have a local node)
- List of pinning services to add your data to (auto-detect pinning to local node if possible)
- List of writable services (web3.storage, etc) (auto-detect writing to local node if possible)
I think we can figure out some defaults here like some cloud stuff run by WebRecorder (or reaching out to web3.storage ourselves), but I think it'd be good to try to expose it to users in general.
I'm about to start testing the automatic-ipfs-fetch thing in browsers, and I'm gonna put off the web3.storage stuff for now while we figure out how it's configured (lemme know if I should put focus on it sooner instead)
Woof. Just realized that getting FormData to work with js-ipfs-fetch will require a bit of work. Though I think that won't be an issue for our case since we already have a way to `PUT` data over top of an existing directory and don't necessarily need to upload everything at once.
I'm thinking it'd be useful to represent the following in the app somewhere that users can configure:
Yes, it might be time to start adding a settings page for archiveweb.page, though we should have sensible defaults that 'just work' for the most part.
I think the unified API should focus on two modes:
- local node support, if available, such as if using Brave or Agregore, or Electron app.
- web3.storage support otherwise
I think the portable format that we'll want to use is CAR files, as that's the format that can be uploaded to web3.storage as well as added to a local node. Additionally, CAR files will allow us to support custom chunking/tree building, which we should explore for WACZ files, and keep that separate from the actual uploading. For this reason, and because of the encoding that happens for FormData, I think FormData is probably not the right abstraction here given the flexibility that we'd want.
I think the abstraction that would be most useful is a separate library that can take a CAR file and options like `useLocalNode` and `useRemoteStorage`, etc., and will 'do the right thing' in terms of trying one or more services, and report back what is possible given the current capabilities (eg. added to web3.storage, or added to local node).
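A minimal sketch of that "try what's enabled, report what worked" behavior. The option names come from the comment above; `shareCar`, the `backends` map, and the shape of the report are hypothetical, and the upload functions are supplied by the caller rather than being real web3.storage or IPFS calls.

```javascript
// Hypothetical sketch: offer a CAR file to each enabled backend and report
// which ones accepted it. `backends` maps a name to an async function that
// takes the CAR bytes and resolves to a URL; none of this is a library API.
async function shareCar(car, options, backends) {
  const wanted = [];
  if (options.useLocalNode && backends.localNode) wanted.push('localNode');
  if (options.useRemoteStorage && backends.remoteStorage) wanted.push('remoteStorage');

  const report = { added: [], failed: [] };
  for (const name of wanted) {
    try {
      const url = await backends[name](car);
      report.added.push({ name, url });
    } catch (err) {
      // A failing backend is recorded, not fatal, so the others still run
      const reason = err && err.message ? err.message : String(err);
      report.failed.push({ name, reason });
    }
  }
  return report;
}
```

The caller can then decide whether one success is enough to show a share link, or whether to surface the failures to the user.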
Cool, maybe an API that looks something like this:
- `create({publicGatewayURL, web3Credentials, localGatewayURL})`
- `get(url, {start, end})` => Stream of bytes or buffer?
- `update(url, filename, buffer)` => URL
- `share(car)` => URL
- `pin(url)` => Status
- `isPinned(url)` => Boolean
- `unpin(url)` => Status
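To get a feel for the shape, here is an in-memory stand-in for most of that API. It is only a sketch: it stores bytes in a `Map` instead of talking to IPFS, ignores the `create()` options, invents placeholder URLs rather than computing CIDs, treats `end` as inclusive, and leaves out `update()`.

```javascript
// In-memory stand-in for the proposed API shape. Nothing here touches IPFS;
// the "URLs" are made-up counters, and the status strings are assumptions.
function create(options = {}) {
  const store = new Map(); // url -> Uint8Array
  const pins = new Set();  // pinned urls
  let counter = 0;

  return {
    async share(car) {
      const url = `ipfs://example-${++counter}/`;
      store.set(url, car);
      return url;
    },
    async get(url, { start = 0, end } = {}) {
      const bytes = store.get(url);
      if (!bytes) throw new Error(`Not found: ${url}`);
      // `end` is inclusive here, mirroring an HTTP Range header
      return bytes.slice(start, end === undefined ? undefined : end + 1);
    },
    async pin(url) { pins.add(url); return 'pinned'; },
    async isPinned(url) { return pins.has(url); },
    async unpin(url) { pins.delete(url); return 'unpinned'; }
  };
}
```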
Under the hood this can attempt to auto-detect the best method of supporting these APIs. (along with fallbacks for custom web3.storage and pinning service APIs)
For working with ipfs-car in JavaScript there's this API: https://github.com/web3-storage/ipfs-car
I think we can also use this for generating CAR files from the raw data. I'm not sure if it supports IPFS style chunking of large files, however. Also not clear if we could "extend" an existing car (probably not, but I guess we don't really need to).
js-ipfs doesn't have many options for chunking data, sadly. 😅 https://github.com/ipfs/js-ipfs/blob/master/docs/core-api/FILES.md#notes
Also with using a memory blockstore, it's not clear whether all data gets persisted in memory as the CAR file is being generated and how the garbage collection for that works. 🤔 Could be an issue for multi-GB archives.
How are archives currently persisted?
We can assemble custom UnixFS nodes with this: https://github.com/ipfs/js-ipfs-unixfs/tree/master/packages/ipfs-unixfs#create-an-unixfs-data-element
Not entirely sure how this could be added to a CAR file. 🤔 Would be good to sit down with some folks that already know the internals to see the best path there. (the ipfs-car importer seems to expect raw data rather than IPFS dag nodes)
For chunking we can use this code:
https://github.com/ipfs/js-ipfs-unixfs/blob/master/packages/ipfs-unixfs-importer/src/chunker/fixed-size.js
We can probably use the BufferList to split up the chunks
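For reference, the core of fixed-size chunking is tiny. This sketch works on a single in-memory `Uint8Array` rather than the streamed BufferList approach in the linked file, so it illustrates the idea without reproducing that module; the 256 KiB default is just an illustrative value.

```javascript
// Sketch: split a byte array into fixed-size chunks (the last may be shorter).
// subarray() returns views, so no bytes are copied.
function* fixedSizeChunks(bytes, chunkSize = 262144) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  for (let offset = 0; offset < bytes.length; offset += chunkSize) {
    yield bytes.subarray(offset, offset + chunkSize);
  }
}
```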
TODO for API:
- Detect what's available before initializing
- Alternately auto-enable when initializing
Should we have a config to do "only local node"?
Also, IPFS integration might be opt-in and we shouldn't set up js-ipfs and instead prompt the user with info that IPFS isn't supported.
Also also, for node.js we should use the go-ipfs package since it's more reliable than js-ipfs https://github.com/ipfs/npm-go-ipfs
Also also also, we should support Estuary's CAR uploads: https://docs.estuary.tech/api-content-add-car
Gonna hack on the offset chunker code and maybe publish a module.
Also reached out to the js-ipfs-unixfs team to see if they'd want a PR with it once it's done. https://github.com/ipfs/js-ipfs-unixfs/issues/241
This is where I can find where the entry offsets are calculated: https://github.com/webrecorder/wabac.js/blob/main/src/wacz/ziprangereader.js#L50
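The offset chunker idea, roughly: instead of cutting every N bytes, cut at known boundaries (presumably the ZIP entry offsets of a WACZ) so each entry starts its own chunk. A sketch under that assumption; `chunksAtOffsets` is a hypothetical name and this is not the code that ended up in any published module.

```javascript
// Sketch: split a byte array at the given offsets. Offsets that are not
// integers strictly inside the buffer are ignored; duplicates are dropped.
function* chunksAtOffsets(bytes, offsets) {
  const cuts = [...new Set(offsets)]
    .filter((o) => Number.isInteger(o) && o > 0 && o < bytes.length)
    .sort((a, b) => a - b);

  let start = 0;
  for (const cut of cuts) {
    yield bytes.subarray(start, cut);
    start = cut;
  }
  // Emit whatever remains after the last cut
  if (start < bytes.length) yield bytes.subarray(start);
}
```

Large entries would probably still need fixed-size chunking inside each piece, which is where the two approaches combine.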
Created https://ranger.mauve.moe/auto-js-ipfs/
source: https://github.com/RangerMauve/auto-js-ipfs/pull/1
Going to make an SPA for loading web archives and uploading them to IPFS. Initially it'll upload them raw, but it will also do some content-aware chunking along the lines of what we looked into before.
This has now been mostly implemented, including custom chunking, which has been moved to https://github.com/webrecorder/awp-sw

The current implementation now provides unified access via auto-js-ipfs (in the render process) to detect if a local IPFS node is running (default kubo, IPFS Desktop or embedded in Brave) and allows sharing if one is found. No local node is started in the Electron app (though we could add that in the future) and all IPFS-specific code has been removed from the Electron node process.

It is also possible (but not currently enabled) to write to web3.storage if a token is provided. We're not including a hard-coded token, and will instead implement web3.storage support along with UCAN support. Closing this issue.