upload-client store add should support taking a link + BlobLike that only reads from stream if needed
Motivation:
- When developing https://github.com/web3-storage/migrate-to-w3up/pull/3, I had to recreate a bit of upload-client's logic for adding individual CAR parts, because a goal of the migration tool is to migrate using only CAR part links if possible, and to avoid copying CAR part bytes (out of w3s.link and then piped back into an S3 presigned URL) unless absolutely necessary. In that case, I already have a CAR part link and can get the size fairly easily, i.e. in `add` I can pass in `link` and `size` so that they don't need to be derived from `bytes`, and I can pass in a (lazily sourced) `ReadableStream` in place of the full bytes.
Problem:
- I can't find a way of using w3up packages to `store/add` a single CAR reference (not value) in a way that only reads CAR bytes when needed.
Goal:
- `add` should support being called with `[link, BlobLike]`, where `BlobLike` has `size` and `.stream()`. `.stream()` should only be called if the `store/add` result has `status=upload`.
- (also probably) have a version of `add` that returns the whole ucanto receipt (but still does retry etc.), e.g. `addReturningReceipt`.
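
The first goal could be sketched roughly as follows. This is a minimal illustration, not upload-client's actual implementation: `addByReference`, `Deps`, `invokeStoreAdd`, and `put` are hypothetical names, and the `StoreAddResult` shape is an assumption modeled on the `status=upload` / `status=done` results described in this issue.

```typescript
// Hypothetical shapes for illustration only; not the real upload-client types.
interface BlobLike {
  size: number
  stream(): ReadableStream<Uint8Array>
}

// Assumed result shape: either the bytes are already stored, or an upload is needed.
type StoreAddResult =
  | { status: 'done' }
  | { status: 'upload'; url: string; headers: Record<string, string> }

// Injected dependencies (hypothetical), so the sketch stays self-contained.
interface Deps {
  // Sends the store/add invocation, including whatever retry policy is wanted.
  invokeStoreAdd(nb: { link: string; size: number }): Promise<StoreAddResult>
  // Writes the bytes to the target returned by store/add.
  put(
    url: string,
    headers: Record<string, string>,
    body: ReadableStream<Uint8Array>
  ): Promise<void>
}

async function addByReference(
  deps: Deps,
  link: string,
  blob: BlobLike
): Promise<StoreAddResult> {
  // link and size come from the caller; no bytes are read to derive them.
  const result = await deps.invokeStoreAdd({ link, size: blob.size })
  if (result.status === 'upload') {
    // Only now is the (possibly remote) byte source touched.
    await deps.put(result.url, result.headers, blob.stream())
  }
  return result
}
```

Returning the full result here loosely mirrors the `addReturningReceipt` idea: the caller can see whether any bytes actually had to be sent.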
Unblocks:
- Use this in https://github.com/web3-storage/migrate-to-w3up/pull/3 in order to remove code that does the same thing as `add` but worse, e.g. I'm hoping that by adding an affordance for this to upload-client, I'll be able to use the retry functionality from upload-client.
Scenario: Use Upload Client to Upload a CAR Part from old.web3.storage
- The user of w3up has access to a JSON object like this describing an upload in old.web3.storage: https://github.com/web3-storage/migrate-to-w3up/blob/w32023-to-w3up/var/sharkdao-upload.json#L1
  - Note: `parts` is an array of CAR CIDs. For each of those, this scenario involves wanting to pass that exact CID as `store/add` `nb.link`.
- The user of w3up can get the value of `store/add` `.nb.size` by sending an HTTP `HEAD` request to `w3s.link/ipfs/{cid}` for the part and using the value of the `Content-Length` header.
- The user of w3up can lazily get a stream of bytes corresponding to that CAR part by sending an HTTP `GET` request to `w3s.link/ipfs/{cid}`.
- The user wants to invoke `store/add` and get a receipt, and ensure that the stream of bytes:
  - is not read from until after the `store/add` invocation, and iff the result has `status=upload`. 99% of the time in the migration scenario, we expect `status=done`, so laziness here is the critical property for making sure migration runs don't use egress unnecessarily.
  - is verified against the CAR part CID: the caller should be able to expect the write target pointed to by the `store/add` response to verify the bytes against the `nb.link` CID. (Regardless, 99% of the time this scenario will not even send bytes to that target, because we expect `status=done`, not `status=upload`.)
  - is not used to calculate `.nb.size`, because that is passed in explicitly from prior knowledge.
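
On the caller side, the `HEAD`/`GET` steps above might look something like the sketch below. `lazyCarPart` and `FetchLike` are hypothetical names, and the gateway URL layout is taken from the scenario as described here rather than checked against the gateway's documentation. The stream uses `highWaterMark: 0` so that the `GET` is only issued once a consumer actually reads.

```typescript
interface BlobLike {
  size: number
  stream(): ReadableStream<Uint8Array>
}

// Minimal fetch signature, injectable so the sketch can run without a network.
type FetchLike = (url: string, init?: { method?: string }) => Promise<Response>

async function lazyCarPart(
  cid: string,
  fetchImpl: FetchLike = fetch,
  gateway = 'https://w3s.link'
): Promise<BlobLike> {
  const url = `${gateway}/ipfs/${cid}`

  // Learn the size from Content-Length without transferring the body.
  const head = await fetchImpl(url, { method: 'HEAD' })
  if (!head.ok) throw new Error(`HEAD ${url} failed with status ${head.status}`)
  const header = head.headers.get('content-length')
  const size = header === null ? NaN : Number(header)
  if (!Number.isInteger(size) || size < 0) {
    throw new Error(`missing or invalid Content-Length for ${url}`)
  }

  return {
    size,
    stream() {
      let reader: ReadableStreamDefaultReader<Uint8Array> | undefined
      return new ReadableStream<Uint8Array>(
        {
          async pull(controller) {
            if (!reader) {
              // First read: only now is the GET request sent.
              const res = await fetchImpl(url)
              if (!res.ok || !res.body) {
                throw new Error(`GET ${url} failed with status ${res.status}`)
              }
              reader = res.body.getReader()
            }
            const { done, value } = await reader.read()
            if (done) controller.close()
            else controller.enqueue(value)
          },
          async cancel(reason) {
            await reader?.cancel(reason)
          },
        },
        // No internal buffering, so pull() runs only when a read is pending.
        { highWaterMark: 0 }
      )
    },
  }
}
```

Note that this sketch does nothing to verify the bytes against the CID; per the scenario, that check is expected to happen at the write target.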
Right now `add` requires a `Blob`, which iiuc does not accommodate this scenario, because it requires reading all those bytes out of w3s.link to build a whole `Blob`. But if we relax the type there to be `BlobLike = { stream(): ReadableStream<Uint8Array>, size: Number }`, then I think this scenario can be accommodated.
I think the real problem here is that perhaps we do not expose a low-level `store/add` API? In other words, I do think migration needs to work with a lower-level API than what the client exposes. I'm pretty sure we also expose a low-level API that could be used, but perhaps as static functions instead. I can try to incorporate some of this into a new, better API.