estuary
Allow barge to add directories to estuary without having to pin them to a local IPFS node first
As a user with a large data set (directories or buckets of hundreds of TiBs), I would like to be able to transfer data to Estuary without having to stage it on a local IPFS node using "ipfs add -r", as that requires provisioning a staging area larger than the dataset in order to DAGify it and then pin it to Estuary.
One of the ideas here is to allow barge to do that.
Is it okay to first create a set of CAR files locally, e.g.
go install github.com/ipld/go-car/cmd/car@latest
car create -f out.car directory
and then transfer those CAR files somewhere, or does it need to be able to stream the data directly off the local node?
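The streaming alternative asked about above can be sketched with Go's io.Pipe: the CAR encoder writes into one end while the upload reads from the other, so no staging file the size of the dataset is needed. Note that produceCAR and upload here are hypothetical stand-ins, not real go-car or Estuary APIs:

```go
package main

import (
	"fmt"
	"io"
)

// produceCAR stands in for the go-car encoder walking a directory;
// upload stands in for the HTTP request that sends the body to Estuary.
// Both names are hypothetical placeholders.
func produceCAR(w io.Writer) error {
	_, err := w.Write([]byte("car-bytes"))
	return err
}

func upload(r io.Reader) ([]byte, error) {
	return io.ReadAll(r)
}

// streamWithoutStaging connects producer and consumer with an io.Pipe,
// so the CAR bytes flow straight into the upload with no on-disk copy.
func streamWithoutStaging() ([]byte, error) {
	pr, pw := io.Pipe()
	go func() {
		// Closing with the producer's error (or nil) ends the read side.
		pw.CloseWithError(produceCAR(pw))
	}()
	return upload(pr)
}

func main() {
	got, err := streamWithoutStaging()
	if err != nil {
		panic(err)
	}
	fmt.Printf("uploaded %d bytes\n", len(got))
}
```

The same pipe pattern works whether the consumer is an HTTP request body or a deal transfer, which is why pre-creating CAR files on disk is not strictly necessary.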
@sheriflouis-FF based on the existing functionality of collections that was added recently, what is still missing to enable this to work fully?
We might say collections satisfies this requirement once we have the commit function in place.
@willscott, we want to avoid users doing additional processing such as directory restructuring, CAR creation, etc.
Makes sense. I think the code in that CAR creation tool can be used to stream a CARv1 efficiently from the original read off disk. The caveat is that if you stream directly, the blocks won't be in online deal order, but we should be able to make that work with the Boost-like deal flows.
The remaining piece of tooling, which I think has been built in another context but should be combined with this one, is handling the flip to a new deal/CAR at the point that you hit a given chunk size.
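The chunk-flipping idea above could be sketched like this, assuming a hypothetical splitIntoChunks helper; real tooling would cut on block/CID boundaries rather than arbitrary byte offsets:

```go
package main

import (
	"fmt"
)

// splitIntoChunks is a hypothetical helper illustrating the idea of
// flipping to a new deal/CAR once a size limit is hit: it cuts a byte
// stream into pieces no larger than maxSize, one per deal.
func splitIntoChunks(data []byte, maxSize int) [][]byte {
	var chunks [][]byte
	for len(data) > 0 {
		n := maxSize
		if len(data) < n {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

func main() {
	// 10 bytes with a 4-byte limit yields chunks of 4, 4, and 2 bytes.
	chunks := splitIntoChunks(make([]byte, 10), 4)
	fmt.Println(len(chunks))
}
```

In a streaming flow the same logic would rotate the output writer mid-stream instead of slicing an in-memory buffer, but the boundary decision is the same.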
@willscott @brendalee It looks like this issue is six months stale. Is there someone on another team who wants to see this really badly?
I think there are tools that have been made for faster imports. I don't know whether this is currently seen as a bottleneck for data onboarding.