
upload-client store add should support taking a link + BlobLike that only reads from stream if needed

Open gobengo opened this issue 2 years ago • 2 comments

Motivation:

  • when devving on https://github.com/web3-storage/migrate-to-w3up/pull/3 , I had to recreate a bit of upload-client's logic for adding individual car parts, because a goal of the migration tool is to migrate using only car part links if possible, and avoid copying car part bytes (out of w3s.link and then piped back into an s3 presigned url) unless absolutely necessary. In that case, I already have a car part link, and can get the size somewhat easily. i.e. here in add, I can pass in link and size so that they don't need to be derived from bytes, and I can pass in a (lazily sourced) ReadableStream in place of the full bytes.

Problem:

  • I can't find a way of using w3up packages to store/add a single CAR reference (not value) in a way that only reads CAR bytes when needed

Goal:

  • add should support being called with [link, BlobLike] where BlobLike has size and .stream(). .stream() should only be called if store/add result has status=upload
  • (also probably) have a version of add that returns the whole ucanto receipt (but still does retry etc), e.g. addReturningReceipt
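
A minimal sketch of the first goal, not the actual upload-client API. The names addByLink, invokeStoreAdd, putBytes and the StoreAddResult shape are hypothetical and only illustrate the ordering: the invocation is sent with an explicit link and size, and .stream() is called only when the result has status=upload. The link is typed as a plain string here for simplicity.

```typescript
// Hypothetical types for illustration; not taken from upload-client.
interface BlobLike {
  size: number
  stream(): ReadableStream<Uint8Array>
}

type StoreAddResult =
  | { status: 'done' }
  | { status: 'upload'; url: string; headers: Record<string, string> }

interface Deps {
  // Assumed helper: sends the store/add invocation with nb.link and nb.size.
  invokeStoreAdd(nb: { link: string; size: number }): Promise<StoreAddResult>
  // Assumed helper: writes bytes to the target returned by store/add.
  putBytes(
    url: string,
    headers: Record<string, string>,
    body: ReadableStream<Uint8Array>
  ): Promise<void>
}

async function addByLink(
  link: string,
  blob: BlobLike,
  deps: Deps
): Promise<StoreAddResult> {
  // size comes from the caller, so no bytes are read to compute it.
  const result = await deps.invokeStoreAdd({ link, size: blob.size })
  if (result.status === 'upload') {
    // Only now is the (possibly lazy) byte source touched.
    await deps.putBytes(result.url, result.headers, blob.stream())
  }
  return result
}
```

Returning the whole result rather than just the link is a stand-in for the addReturningReceipt idea; retry logic is left out of the sketch.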

Unblocks:

  • use this in https://github.com/web3-storage/migrate-to-w3up/pull/3 in order to remove code that does the same thing as add but worse, e.g. I'm hoping that by adding affordance for this into upload-client, I'll be able to use the retry functionality from upload-client.

gobengo avatar Feb 20 '24 23:02 gobengo

Scenario: Use Upload Client to Upload a CAR Part from old.web3.storage

  • The user of w3up has access to a JSON object like this describing an upload in old.web3.storage: https://github.com/web3-storage/migrate-to-w3up/blob/w32023-to-w3up/var/sharkdao-upload.json#L1

    • Note: parts is an array of CAR cids. For each of those, this scenario involves wanting to pass that exact cid as store/add nb.link
  • The user of w3up can get the value of store/add .nb.size by sending an http HEAD request to w3s.link/ipfs/{cid} for the part and using the value of the Content-Length header

  • The user of w3up can lazily get a stream of bytes corresponding to that car part by sending an http GET request to w3s.link/ipfs/{cid}

  • User wants to invoke store/add and get a receipt, and ensure that the stream of bytes:

    • is not read from until after the store/add invocation and iff the result has status=upload. 99% of time in the migration scenario, we expect status=done. So laziness here is the critical property of making sure migration runs don't use egress unnecessarily.
    • is verified against the car part cid: Caller should be able to expect the write target pointed to from store/add response to verify the bytes against the nb.link CID. (Regardless, 99% of time this scenario will not even send bytes to that target because we expect status=done not status=upload).
    • is not used to calculate .nb.size because that is passed in explicitly from prior knowledge

Right now add requires a Blob, which iiuc does not accommodate this scenario because it requires reading all those bytes out of w3s.link to build a whole Blob. But if we relax the type there to be BlobLike = { stream(): ReadableStream<Uint8Array>, size: number } then I think this scenario can be accommodated.
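
A rough sketch of how such a BlobLike could be built for one car part, assuming the gateway answers HEAD with a Content-Length header as described in the scenario. lazyCarPart and FetchLike are hypothetical names, and the fetch implementation is injectable so the sketch can be exercised without network access. The GET request is not sent until the returned stream is actually read; verifying the bytes against the CID is out of scope here.

```typescript
interface BlobLike {
  size: number
  stream(): ReadableStream<Uint8Array>
}

// Narrow fetch signature so a fake can be passed in (hypothetical type).
type FetchLike = (url: string, init?: { method?: string }) => Promise<Response>

async function lazyCarPart(
  cid: string,
  fetchImpl: FetchLike = fetch,
  gateway = 'https://w3s.link/ipfs/'
): Promise<BlobLike> {
  const url = `${gateway}${cid}`
  // HEAD only: learn the size without transferring the body.
  const head = await fetchImpl(url, { method: 'HEAD' })
  if (!head.ok) throw new Error(`HEAD ${url} failed: ${head.status}`)
  const length = head.headers.get('content-length')
  const size = Number(length)
  if (length === null || !Number.isInteger(size) || size < 0) {
    throw new Error(`missing or invalid Content-Length for ${url}`)
  }
  return {
    size,
    stream() {
      let reader: ReadableStreamDefaultReader<Uint8Array> | undefined
      return new ReadableStream<Uint8Array>(
        {
          async pull(controller) {
            if (!reader) {
              // First read: this is where the GET is actually sent.
              const res = await fetchImpl(url)
              if (!res.ok || !res.body) {
                throw new Error(`GET ${url} failed: ${res.status}`)
              }
              reader = res.body.getReader()
            }
            const { done, value } = await reader.read()
            if (done) controller.close()
            else controller.enqueue(value)
          },
          async cancel(reason) {
            await reader?.cancel(reason)
          },
        },
        // highWaterMark 0 means pull runs only when a consumer reads.
        { highWaterMark: 0 }
      )
    },
  }
}
```

With this shape, a status=done result never triggers the GET, so no egress is used for parts that are already stored.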

gobengo avatar Feb 21 '24 18:02 gobengo

I think the real problem here is that perhaps we do not expose a low-level store/add API? In other words, I do think migration needs to work with a lower-level API than what the client exposes. I'm pretty sure we do expose a low-level API that could be utilized, but perhaps as static functions instead. I can try to incorporate some of this into a new API.

Gozala avatar Feb 21 '24 21:02 Gozala