
WIP: Streaming request bodies (for S3)

Open danopia opened this issue 3 years ago • 8 comments

Turns out this is pretty tough API-wise: S3 uploads must have a Content-Length header, so the only way to 'stream' an upload of truly unknown length is to initiate a multipart upload.

danopia avatar Jan 19 '22 18:01 danopia

+1 for this API. Right now I am using putObject, which accepts a Buffer or, more broadly speaking, a Uint8Array, but I have to read the whole file into the buffer before sending it.

With a streaming API, similar to the AWS SDK's upload, I could pipe a request body directly to S3, which would greatly improve performance.

TillaTheHun0 avatar Mar 31 '22 17:03 TillaTheHun0

> With a streaming api, similar to AWS SDK's upload, I could pipe a request body directly to s3, which would greatly increase performance.

Thank you for reporting your use-case!

So looking at upload(), that specific function is implemented by the AWS.S3.ManagedUpload class. It appears to chop your stream into individual 5MB segments and upload them with a Multipart Upload strategy. This is actually separate from streaming request bodies because each 'part' is buffered. So I will track managed/multipart uploads in a separate issue 😅
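For illustration, the part-splitting that such a managed upload performs can be sketched with web streams. This is a hedged sketch, not this library's (or the AWS SDK's) actual implementation, and the names `chunkStream` and `partSize` are made up for the example; note that real S3 multipart uploads require every part except the last to be at least 5 MB.

```typescript
// Sketch of the buffering a multipart upload manager performs:
// accumulate stream chunks until a full part is available, then emit it.
// Each emitted part could become one UploadPart request.
async function* chunkStream(
  stream: ReadableStream<Uint8Array>,
  partSize: number,
): AsyncGenerator<Uint8Array> {
  const reader = stream.getReader();
  let buffer = new Uint8Array(0);
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Append the incoming chunk to any carried-over bytes.
    const merged = new Uint8Array(buffer.length + value.length);
    merged.set(buffer);
    merged.set(value, buffer.length);
    buffer = merged;
    // Emit as many full-size parts as we have bytes for.
    while (buffer.length >= partSize) {
      yield buffer.slice(0, partSize);
      buffer = buffer.slice(partSize);
    }
  }
  if (buffer.length > 0) yield buffer; // final part may be smaller
}
```

Because each part is fully buffered before it is sent, its Content-Length is known, which is how multipart sidesteps the unknown-length problem.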

S3 only supports true streaming uploads if you know the length of the body upfront. That's what this PR ⬆️ implements. Maybe you know your object size ahead of time, in which case you don't actually need the multipart upload() (but the parallelization probably still helps speed with huge objects)
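For example, if your data originates as a Blob or File, the byte length is already known alongside the stream, which is exactly what a single streaming PUT needs. The `putObject` call in the trailing comment is a hypothetical sketch; the actual streaming-body parameters depend on that PR's API.

```typescript
// A Blob carries both a readable stream and a known byte length.
// describeBody is an illustrative helper, not part of this library.
function describeBody(blob: Blob): {
  stream: ReadableStream<Uint8Array>;
  contentLength: number;
} {
  return { stream: blob.stream(), contentLength: blob.size };
}

const { stream, contentLength } = describeBody(
  new Blob([new Uint8Array(1024)]),
);
// Hypothetical call; parameter handling here is an assumption, not this library's API:
// await s3.putObject({ Bucket: "b", Key: "k", Body: stream, ContentLength: contentLength });
```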

danopia avatar Mar 31 '22 18:03 danopia

@TillaTheHun0 I have a working pass on chunked/parallelized S3 uploading in #31; please feel free to vet the behavior in your own stuff before I get it clean enough to merge. You can `import { multiPartUpload } from "https://raw.githubusercontent.com/cloudydeno/deno-aws_api/89703336482f008f3f6ebd9f759370fd393ba362/lib/helpers/s3-upload.ts"` and then call it with an S3 client as shown in #31's text. If you have any feedback, please comment on that PR. Thanks again!

danopia avatar Mar 31 '22 23:03 danopia

@danopia oh sweet, I will check it out! 👍

TillaTheHun0 avatar Apr 01 '22 13:04 TillaTheHun0

> +1 for this api. Right now, I am using putObject which accepts a Buffer or more broadly speaking, a Uint8Array, but I have to read the whole file into the buffer and send it.
>
> With a streaming api, similar to AWS SDK's upload, I could pipe a request body directly to s3, which would greatly increase performance.

@TillaTheHun0 May I know how to supply a Buffer to putObject? Looking at s3.file, I could only supply one of the following values:

  • Uint8Array
  • string

yogesnsamy avatar Oct 14 '22 13:10 yogesnsamy

@yogesnsamy the code is here

Basically we're using `readAll`, which accepts a Deno.Reader and reads the contents into a Uint8Array that we pass to putObject
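For anyone working with web streams rather than a Deno.Reader, an equivalent buffering helper can be sketched like this. `readAllBytes` is an illustrative name, not part of this library or Deno's std.

```typescript
// Equivalent of std's readAll, but for a web ReadableStream<Uint8Array>:
// drain the stream and concatenate every chunk into one Uint8Array,
// which can then be passed to putObject.
async function readAllBytes(
  stream: ReadableStream<Uint8Array>,
): Promise<Uint8Array> {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
    total += value.length;
  }
  // Copy all chunks into one contiguous buffer.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}
```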

TillaTheHun0 avatar Oct 14 '22 13:10 TillaTheHun0

> @yogesnsamy the code is here
>
> Basically we're using `readAll`, which accepts a Deno.Reader and reads the contents into a Uint8Array that we pass to putObject

Many thanks @TillaTheHun0. It works.

yogesnsamy avatar Oct 14 '22 13:10 yogesnsamy

Question for y'all: when calling PutObject with a Reader, do you know the byte size of your Reader upfront? That would make streaming easier to implement.

danopia avatar Oct 14 '22 19:10 danopia

🚀 A managed-upload module (using S3 multipart) just shipped in v0.8.0. This is most useful for uploading large files (50 MB and up), as it will break up your `ReadableStream<Uint8Array>` into multiple S3 requests.

🗒️ Also, hot tip: in cases where you aren't worried about the file fitting into memory, you can use a `Response` to easily buffer a `ReadableStream<Uint8Array>`:

```ts
const bodyBuffer = new Uint8Array(await new Response(bodyStream).arrayBuffer());
```
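A runnable round-trip of that tip, for anyone who wants to try it; the payload string and variable names here are just for the example.

```typescript
// Build a small ReadableStream<Uint8Array>, then buffer it in one line
// via Response. The resulting Uint8Array can go anywhere a plain byte
// array is accepted, e.g. putObject's Body.
const bodyStream = new Blob([new TextEncoder().encode("hello s3")]).stream();
const bodyBuffer = new Uint8Array(await new Response(bodyStream).arrayBuffer());
```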

🗑️ This pull request is still in draft, and I don't think it's going to land this time because it's not very useful. True request streaming to S3 is only possible if the body's length is known upfront. That limitation means this library can't just accept a `ReadableStream<Uint8Array>` on its own. The library could buffer the stream up for you, but that would really be a lie, so I'm not immediately in favor of it 🤔

danopia avatar Feb 26 '23 01:02 danopia