estuary
Allow barge to add directories to estuary without having to pin them to a local IPFS node first
As a user with a large data set (directories or buckets of hundreds of TiBs), I would like to be able to transfer data to Estuary without having to stage it on a local IPFS node using "ipfs add -r", as that requires provisioning a staging area larger than the dataset in order to DAGify it and then pin it to Estuary.
One of the ideas here is to allow barge to do that.
Is it okay to first create a set of CAR files locally, e.g.
go install github.com/ipld/go-car/cmd/car@latest
car create -f out.car directory
and then transfer those CAR files somewhere, or does it need to be able to stream the data directly off the local node?
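The streaming alternative asked about above can be sketched with Go's io.Pipe: the CAR encoder writes into one end while the upload reads from the other, so no staging file the size of the dataset is needed. Note that produceCAR and upload here are hypothetical stand-ins, not real go-car or Estuary APIs:

```go
package main

import (
	"fmt"
	"io"
)

// produceCAR stands in for the go-car encoder walking a directory;
// upload stands in for the HTTP request that sends the body to Estuary.
// Both names are hypothetical placeholders.
func produceCAR(w io.Writer) error {
	_, err := w.Write([]byte("car-bytes"))
	return err
}

func upload(r io.Reader) ([]byte, error) {
	return io.ReadAll(r)
}

// streamWithoutStaging connects producer and consumer with an io.Pipe,
// so the CAR bytes flow straight into the upload with no on-disk copy.
func streamWithoutStaging() ([]byte, error) {
	pr, pw := io.Pipe()
	go func() {
		// Closing with the producer's error (or nil) ends the read side.
		pw.CloseWithError(produceCAR(pw))
	}()
	return upload(pr)
}

func main() {
	got, err := streamWithoutStaging()
	if err != nil {
		panic(err)
	}
	fmt.Printf("uploaded %d bytes\n", len(got))
}
```

The same pipe pattern works whether the consumer is an HTTP request body or a deal transfer, which is why pre-creating CAR files on disk is not strictly necessary.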
@sheriflouis-FF based on the existing functionality of collections that was added recently, what is still missing to enable this to work fully?
We might say collections satisfies this requirement once we have the commit function in place.
@willscott, we want to avoid users doing additional processing such as directory restructuring, CAR creation, etc.
Makes sense. I think the code in that CAR creation tool can be used to stream a CARv1 efficiently from the original read off disk. The caveat is that if you stream directly, the blocks won't be in online deal order, but we should be able to make that work with the Boost-like deal flows.
The remaining piece of tooling, which I think has been built in another context but should be combined with this one, is handling the flip to a new deal/CAR at the point that you hit a given chunk size.
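The chunk-flipping idea above could be sketched like this, assuming a hypothetical splitIntoChunks helper; real tooling would cut on block/CID boundaries rather than arbitrary byte offsets:

```go
package main

import (
	"fmt"
)

// splitIntoChunks is a hypothetical helper illustrating the idea of
// flipping to a new deal/CAR once a size limit is hit: it cuts a byte
// stream into pieces no larger than maxSize, one per deal.
func splitIntoChunks(data []byte, maxSize int) [][]byte {
	var chunks [][]byte
	for len(data) > 0 {
		n := maxSize
		if len(data) < n {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

func main() {
	// 10 bytes with a 4-byte limit yields chunks of 4, 4, and 2 bytes.
	chunks := splitIntoChunks(make([]byte, 10), 4)
	fmt.Println(len(chunks))
}
```

In a streaming flow the same logic would rotate the output writer mid-stream instead of slicing an in-memory buffer, but the boundary decision is the same.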
@willscott @brendalee It looks like this issue is six months stale. Is there someone on another team who wants to see this really badly?
I think there are tools that have been made for faster imports. I don't know whether this is currently seen as a bottleneck for data onboarding.