S3.putObject only accepts streams that it can determine the length of

Open seebees opened this issue 5 years ago • 19 comments

Is your feature request related to a problem? Please describe.

According to https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#putObject-property, the Body element can be a ReadableStream; in practice, however, the call only succeeds if the SDK can determine the stream's length (see #2661 or https://github.com/aws/aws-sdk-js/blob/master/lib/event_listeners.js#L167). Looking at https://github.com/aws/aws-sdk-js/blob/master/lib/util.js#L198, a stream only works if it has a path. This means that only streams like the one returned by fs.createReadStream will work; if the stream is transformed in any way, it will no longer work.

e.g.

Body = fs.createReadStream('./someFile').pipe(someTransform)
s3.putObject({ Bucket, Key, Body }).promise().then(console.log)

Error: Cannot determine length of [object Object]
  at Object.byteLength (aws-sdk/lib/util.js:200:26)
  at Request.SET_CONTENT_LENGTH (aws-sdk/lib/event_listeners.js:163:40)
  at Request.callListeners (aws-sdk/lib/sequential_executor.js:106:20)
  at Request.emit (aws-sdk/lib/sequential_executor.js:78:10)
  at Request.emit (aws-sdk/lib/request.js:683:14)
  at Request.transition (aws-sdk/lib/request.js:22:10)
  at AcceptorStateMachine.runTo (aws-sdk/lib/state_machine.js:14:12)
  at aws-sdk/lib/state_machine.js:26:10
  at Request.<anonymous> (aws-sdk/lib/request.js:38:9)
  at Request.<anonymous> (aws-sdk/lib/request.js:685:12)

Describe the solution you'd like

Update the documentation to more clearly identify which streams will work, and point users to S3.upload

Describe alternatives you've considered

A caller could include the content length, but I think that S3.upload is just a better answer.
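For example, something like this (just a sketch; knownSize is assumed to be available to the caller, which for a transformed stream is exactly the hard part):

Body = fs.createReadStream('./someFile').pipe(someTransform)
s3.putObject({ Bucket, Key, Body, ContentLength: knownSize }).promise().then(console.log)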

seebees avatar Nov 15 '19 23:11 seebees

@seebees I reached out to the respective service teams, will update here once I hear back from them.

ajredniwja avatar Nov 22 '19 22:11 ajredniwja

Agreed. I have a case where I'm getting the stream from a Request body (from a GraphQL API). I have to read the stream into a Buffer first before I can invoke putObject.

This is quite disturbing, and since it is undocumented, it actually threw me off for a few hours before I understood what was wrong.

gbataille avatar Feb 14 '20 12:02 gbataille

By the way, I think it is the same as #2442

gbataille avatar Feb 14 '20 12:02 gbataille

This is still an issue...

amouly avatar Mar 23 '20 05:03 amouly

EDIT - I POSTED UPDATED CODE IN A MESSAGE BELOW

I was having this issue using node-fetch.

I got it to work by reading the stream into a Buffer, like @gbataille said.

const res = await fetch(url)
const buffer = await res.buffer()
await new Promise((resolve, reject) =>
  s3.putObject(
    { ACL: 'public-read', Body: buffer, Bucket: 'test', Key: 'fileName' },
    (err, data) => (err ? reject(err) : resolve(data))
  )
)

EDIT - I POSTED UPDATED CODE IN A MESSAGE BELOW

RusseII avatar Mar 29 '20 21:03 RusseII

@amouly @RusseII @gbataille https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#upload-property should do what you want.

Underneath it all, S3 must know the size of what it is sent, but upload will intelligently chunk the stream into parts for you.

seebees avatar Mar 30 '20 02:03 seebees

@RusseII I quickly hit memory issues with reading everything into a Buffer :D

But then indeed, as @seebees mentions, the upload method is higher level and it seems to work with any kind of stream.

I don't quite know why the two are different. I think putObject simply exposes the web API directly, while the S3 service class adds some helper methods on top...

gbataille avatar Mar 30 '20 03:03 gbataille

upload wraps the multipart upload API. putObject is just a single S3 PUT, so it requires knowing the exact size up front.

You can use upload to tune the partSize, per the documentation linked above.
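For example, picking up the snippet from the issue (a sketch; the partSize and queueSize values here are arbitrary):

const Body = fs.createReadStream('./someFile').pipe(someTransform)
s3.upload(
    { Bucket, Key, Body },
    { partSize: 10 * 1024 * 1024, queueSize: 4 } // 10 MB parts, up to 4 parts in flight
  ).promise().then(console.log)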

seebees avatar Mar 30 '20 16:03 seebees

Thanks for info @seebees & @gbataille!!!

I fixed it to use the s3 upload method

import fetch from 'node-fetch';

const res = await fetch(url)
const stream = res.body
await new Promise((resolve, reject) =>
  s3.upload(
    { ACL: 'public-read', Body: stream, Bucket: 'test', Key: 'fileName' },
    (err, data) => (err ? reject(err) : resolve(data))
  )
)

Is the new code :)

RusseII avatar Apr 11 '20 20:04 RusseII

I believe upload() only exists in SDK v2, which is being deprecated. Can anyone confirm whether this is the case, and whether there is a v3 equivalent?

bertrand-caron avatar May 25 '21 03:05 bertrand-caron

In the meantime, I used the following solution (inspired by this StackOverflow post) to turn a Readable stream into a Buffer:

import {Readable} from 'stream'

export async function streamToBuffer(stream: Readable): Promise<Buffer> {
    const chunks: Buffer[] = []
    return new Promise<Buffer>((resolve, reject) => {
        stream.on('data', (chunk) => chunks.push(Buffer.from(chunk)))
        stream.on('error', (err) => reject(err))
        stream.on('end', () => resolve(Buffer.concat(chunks)))
    })
}
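For example (assuming res.body is a Node Readable, s3 is an S3Client, and PutObjectCommand comes from @aws-sdk/client-s3; once the data is in a Buffer its length is known, so the call succeeds):

const body = await streamToBuffer(res.body)
await s3.send(new PutObjectCommand({ Bucket, Key, Body: body }))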

bertrand-caron avatar May 25 '21 05:05 bertrand-caron

https://aws.amazon.com/blogs/developer/modular-packages-in-aws-sdk-for-javascript/ This seems to be the answer.

sPaCeMoNk3yIam avatar Jun 01 '21 17:06 sPaCeMoNk3yIam

Thanks for the pointer @sPaCeMoNk3yIam. From the blog post, this is absolutely the way to go for uploading from a stream. Worked like a charm. Watch your MIME types, however. Also, you might have to wrap the stream if the underlying code doesn't recognize it. I'm spooling files out of a tar'd, gzip'd archive, and the S3 code didn't recognize the entry stream as something it could use, so I wrapped it in a PassThrough stream.

ETA (05 JUL 2022): In case it's not apparent from the code or my comments above, the code below streams during a multipart upload. The length of the file need not be known ahead of time, and the stream is never read in full up front: just plain old streaming.

import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";
import { PassThrough } from "stream";
import mime from "mime-types";

// ... entry is a stream

const input = {
  ACL: "public-read",  // ACL not needed if CloudFront can pull via OAI
  Bucket: bucketName,
  Key: outputPath + entry.path,
  Body: entry.pipe(new PassThrough()),
  ContentType: mime.lookup(entry.path),
};

try {
  const multipartUpload = new Upload({
    client: new S3Client({}),
    params: input,
  });

  console.log("Upload file to:", `s3://${input.Bucket}/${input.Key}`);
  await multipartUpload.done();
} catch (err) {
  console.error(err);
}

codeedog avatar Jun 25 '21 09:06 codeedog

I just wanted to say thank you for your comments @sPaCeMoNk3yIam and @codeedog! I really can't believe that something as explicitly required as this is missing from the guide and the SDK API reference docs... You can't assume that every stream will be a file stream or one with a known size... that's the point of streams! Anyway, thanks again!

mitsos1os avatar Oct 18 '21 16:10 mitsos1os

If you know the size, you can explicitly set the length attribute

stream['length'] = size

For larger files, this is better than converting to a buffer because it streams the data.

This simple hack worked for me.
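In context it looks roughly like this (a sketch against SDK v2; size is assumed to be known up front, e.g. from a Content-Length header or a stat of the source file):

const Body = fs.createReadStream('./someFile').pipe(someTransform)
Body['length'] = size // v2's byteLength() helper accepts a numeric length property
await s3.putObject({ Bucket, Key, Body }).promise()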

magland avatar Mar 30 '22 16:03 magland

Dear Amazonians, any ETA for this? Seriously...
Should I move to GCP? Or R2? :D

mgara avatar Jul 05 '22 15:07 mgara

stream['length'] = size

It seems this workaround doesn't work with the v3 SDK.

However, PutObjectCommand has a parameter called ContentLength, and setting it works fine: I can stream data from another HTTP API directly into S3 without having to load the whole file into memory or write it to disk.
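Roughly like this (a sketch; it assumes the upstream response exposes a Content-Length header, and the bucket/key are placeholders):

import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import fetch from "node-fetch";

const s3 = new S3Client({});
const res = await fetch(url);
await s3.send(new PutObjectCommand({
  Bucket: "test",
  Key: "fileName",
  Body: res.body, // Node Readable straight from the upstream response
  ContentLength: Number(res.headers.get("content-length")), // size known up front, so no buffering
}));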

timrobinson33 avatar Jun 07 '23 14:06 timrobinson33

Thanks for this tip. For those working with slices of a file that they are chunking themselves, the size of a Blob can be read with the .size property, and you can pass that directly into the ContentLength field to fix this error.
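For instance (a sketch; file, start, end and the surrounding chunking logic are your own):

const part = file.slice(start, end) // Blob slice
await s3.send(new PutObjectCommand({
  Bucket: 'test',
  Key: 'fileName',
  Body: part,
  ContentLength: part.size, // Blob.size is the byte length of the slice
}))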

dylantf avatar Jul 04 '23 01:07 dylantf

If you try to set a file's content to an empty string (i.e. ""), then S3.putObject or PutObjectCommand (v3) can't "determine length". Actually it can, but I think your if-comparison needs to check against undefined (or something similar) rather than doing a "falsy" comparison. Cheers
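For illustration, the difference I mean (a rough sketch, not the SDK's actual code):

const length = Buffer.byteLength('') // 0: a perfectly valid, known length
if (!length) { /* falsy check: 0 gets treated as "cannot determine length" */ }
if (length === undefined) { /* explicit check: only a truly unknown length fails */ }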

hbi99 avatar Nov 27 '23 13:11 hbi99