js-ipfs-http-client
[WIP] feat: add support for chunked uploads
This is still a work in progress to add support for chunked uploads (`ipfs.add`) and fix multiple issues related to adding big files.
Tests are filtered here https://github.com/ipfs/js-ipfs-api/blob/90c40363fbcd55d29307e51f4feabb8be867ded8/test/add-experimental.spec.js#L38-L46 to make review easier; just run the ipfs daemon with https://github.com/ipfs/js-ipfs/pull/1540.
Features/fixes in this PR, together with https://github.com/ipfs/js-ipfs/pull/1540:
- big data add, non-chunked (this will either exhaust browser memory or hit the maxBytes config in the daemon, see next)
- really big data add, chunked (theoretically the limit is daemon disk space, or maybe request timeouts)
- streaming progress reporting
- error handling and reporting
- add multiple files with wrapWithDirectory
- improved browser support, handles `File`s directly from the input
```js
const files = document.getElementById('file').files;

this.ipfsApi
  .add([...files], {
    wrapWithDirectory: true,
    experimental: true,
    progress: prog => console.log(`received back: ${prog}`),
    chunkSize: 10 * 1024 * 1024
  })
  .then(console.log)
  .catch(console.error);
```
- jsdoc for the top-level API and more
Notes:
- trailers https://stackoverflow.com/questions/13371367/do-any-browsers-support-trailers-sent-in-chunked-encoding-responses
Needs:
- https://github.com/ipfs/js-ipfs/pull/1540
Todo:
- [x] validate this example works after https://github.com/ipfs/js-ipfs-api/tree/master/examples/upload-file-via-browser
- [x] what to do with `progress`? add another handler?
- [x] adding multiple files returned only the first hash from the daemon
- [ ] ~~concurrent upload chunks~~ new PR for this
- [x] check uuid impl (maybe change to uuid v5 or nano-id)
- [x] avoid preflight as much as possible
- [x] callbackify top level
- [x] try handling non-chunked
- [x] fix multipart boundary handling for non-chunked
Related:
- https://github.com/ipfs/js-ipfs-api/issues/654
- https://github.com/ipfs/js-ipfs-api/issues/788
- https://github.com/ipfs/js-ipfs/issues/952
- https://github.com/ipfs/js-ipfs-api/issues/842
- https://github.com/ipfs/js-ipfs-api/issues/797 - make sure this PR fixes this /cc @lidel
- https://github.com/ipfs-shipyard/ipfs-companion/issues/480 - will this PR fix this? /cc @lidel
@lidel your understanding is correct :), I updated the PR with some of your feedback.
Regarding the uuid, I had looked into it; for now I want to keep the poor man's version, it should be safe enough since it goes over Math.random a couple of times. (I have a note to go back to this.)
Final integration will use the normal add API with only one change, a new option called `chunkSize`: if this option is set to a number we go through the chunked codepath.
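A minimal sketch of how that dispatch could look (the helper names below are illustrative placeholders, not the actual implementation):

```js
// Illustrative only: route between the existing codepath and the chunked one
// based on the presence of a numeric `chunkSize` option.
// `addNonChunked` and `addChunked` stand in for the real implementations.
const addNonChunked = (files, options) => Promise.resolve([]) // placeholder
const addChunked = (files, options) => Promise.resolve([])    // placeholder

function add (files, options = {}) {
  if (typeof options.chunkSize === 'number') {
    // new codepath: split the payload into `chunkSize`-byte parts and upload
    // them as a sequence of requests sharing one upload session identifier
    return addChunked(files, options)
  }
  // unchanged behaviour: a single multipart request to the add endpoint
  return addNonChunked(files, options)
}
```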
About progress, I'm still trying to add directly without files; if I succeed, this should work the same as right now. If not, one solution I thought of was adding a new handler, `uploadProgress`.
The current `progress` handler would still work as-is, but only in the last request, so it would report adding-to-ipfs progress only, while `uploadProgress` would report upload-only progress. With this we wouldn't actually break anything relying on the `progress` handler; the user would just see 0% for a long time (uploading), and on the last request it would update correctly as data goes into ipfs (adding). To improve on this, the developer would have the new `uploadProgress`. Does this make sense?
@hugomrdias thanks!
My view is that we should do our best to make it work without changing the current `progress` API.
Details of the chunked upload should be abstracted away in a best-effort fashion and hidden behind the existing progress reporter.
What if we detect the presence of the `chunkSize` parameter, and switch the logic used for progress reporting behind the scenes?
For an upload split into N chunks:
- uploading chunks 1 to (N-1) would show "upload only progress" (initially we could just return % based on the number of uploaded chunks, more resolution can be added later)
- uploading the last chunk N could show real "add progress" but only when it is bigger than "upload progress"
The end result would be best-effort progress reporting that works with the existing API, is not stuck at 0% until the last chunk, and behaves in the expected manner (% always grows).
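A minimal sketch of that heuristic, assuming the chunk count and total size are known up front and `progress` receives a byte count (all names here are illustrative):

```js
// Illustrative only: best-effort progress that never moves backwards.
// Chunks 1..N-1 report an estimate based on how many chunks were uploaded;
// the last chunk switches to the real "add" progress reported by the daemon.
function makeChunkedProgress (progress, totalSize, chunkCount) {
  let reported = 0
  const report = bytes => {
    if (bytes > reported) {
      reported = bytes
      progress(reported)
    }
  }
  return {
    // called after chunk `i` (1-based) out of `chunkCount` has been uploaded
    onChunkUploaded (i) {
      if (i < chunkCount) report(Math.floor((i / chunkCount) * totalSize))
    },
    // called with the daemon's real add progress while sending the last chunk
    onAddProgress (bytes) {
      report(bytes)
    }
  }
}
```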
@lidel the first two topics should be addressed in the last commit.
About the resumable stuff, it's mostly:
- having good errors for failed chunks; http-api should retry those
- an extra GET endpoint to return the uploaded chunks; with this response http-api should be able to figure out the missing chunks and only upload those (see the sketch after this list)
- one thing missing is how to identify an upload session to resume; the current uuid is not enough, need to do more research for this
So, let's leave the resume feature to a follow-up PR.
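For reference, a rough sketch of how such a resume could look; the endpoint path, response shape, and header name below are hypothetical, not part of this PR:

```js
// Hypothetical resume flow: ask the daemon which chunk indexes it already has
// for a given upload session, then re-send only the missing ones.
async function resumeUpload (apiUrl, sessionId, chunks) {
  // hypothetical endpoint returning e.g. { received: [0, 1, 3] }
  const res = await fetch(`${apiUrl}/api/v0/add-chunked/${sessionId}`)
  const { received } = await res.json()
  const have = new Set(received)

  for (let index = 0; index < chunks.length; index++) {
    if (have.has(index)) continue
    await fetch(`${apiUrl}/api/v0/add-chunked/${sessionId}`, {
      method: 'POST',
      headers: { 'X-Chunked-Index': String(index) }, // hypothetical header
      body: chunks[index]
    })
  }
}
```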
The jsdoc should create some nice docs with documentation.js:

```sh
npx documentation serve ./js-ipfs-api/src/add2/add2.js -w -f html
```

Run this command outside of the repo's folder to get the latest documentation.js; aegir still uses an old one.
It should also give code completion to anyone using editors with jsdoc support, and this can bubble up to the top-level public API with minimal changes to this file.
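For example, a JSDoc block along these lines (illustrative, not copied from the actual source) is enough for documentation.js output and editor completion:

```js
/**
 * Add content to IPFS, optionally splitting the upload into chunks.
 *
 * @param {Array<File>|Buffer} files - Content to add.
 * @param {Object} [options]
 * @param {boolean} [options.wrapWithDirectory] - Wrap the added files in a directory.
 * @param {number} [options.chunkSize] - Upload chunk size in bytes; enables the chunked codepath.
 * @param {function(number)} [options.progress] - Called with the number of bytes processed so far.
 * @returns {Promise<Array<{path: string, hash: string, size: number}>>}
 */
```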
@Stebalien could we get your thoughts on adding this to go-ipfs?
This PR is adding a feature to the HTTP `add` endpoint that will allow big files to be uploaded to IPFS by making multiple requests.
@lidel kindly put together a good summary of the proposed process:
- Upload payload is split into small parts (`chunkSize = 256000`)
- Each part is sent as a sequence of HTTP POST requests that have
  - a unique identifier for the entire upload session (uuid? – see below)
  - a sequential counter within the upload session (a chunk index)
- API backend needs to support additional HTTP headers to perform re-assembly of the entire payload from chunks and pass it to the regular `ipfs.files.add` call in a transparent manner (a rough sketch follows the summary)
- PR for js-ipfs: https://github.com/ipfs/js-ipfs/pull/1540
- PR for go-ipfs: (TODO)
Reasons for doing this:
- It's not possible to stream an HTTP upload request (in Firefox) without buffering the entire payload into memory first
- Has the potential to allow resuming failed upload requests
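As a rough illustration of the request sequence summarized above (the header names and session id below are placeholders; the real ones are defined in the linked PRs):

```js
// Illustrative only: send each part of the payload as its own POST, tagged
// with a shared session id and a chunk index so the backend can reassemble
// the payload before handing it to the regular add call.
async function uploadChunked (apiUrl, blob, chunkSize = 256000) {
  const sessionId = Math.random().toString(36).slice(2) // placeholder id
  const total = Math.ceil(blob.size / chunkSize)

  for (let index = 0; index < total; index++) {
    const chunk = blob.slice(index * chunkSize, (index + 1) * chunkSize)
    await fetch(`${apiUrl}/api/v0/add`, {
      method: 'POST',
      headers: {
        'X-Chunked-Input': sessionId,              // placeholder session header
        'X-Chunked-Index': `${index + 1}/${total}` // placeholder index header
      },
      body: chunk
    })
  }
}
```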
@hugomrdias @lidel I think that regardless of what happens with this PR we need to switch to using the streaming fetch API. Firefox is notably the only browser that hasn't shipped the streams API yet, but it sounds like this might happen soon. I think we can conditionally opt out of it for Firefox for the time being.
Switching to the streaming fetch API will solve the buffering issue without any changes to the HTTP API, and depending on priorities for go-ipfs we might be able to ship this before chunked uploads.
It's also worth noting that streaming fetch will be way more efficient than multiple HTTP requests for chunked uploading.
That's only for response bodies, not request bodies; this is the only way currently available to us. I didn't find any indication that request bodies will get streams soon in any browser.
You're absolutely right - my bad. Thanks for clarifying!
Sounds worth mentioning here, in case concepts from this PR are revisited in the future:
- a proposal of an open protocol for resumable file uploads: https://tus.io / https://tus.io/protocols/resumable-upload.html
Yep, I based the impl on tus.