aws-sdk-js
S3.putObject only accepts streams that it can determine the length of
Is your feature request related to a problem? Please describe.
According to https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#putObject-property the Body element can be a ReadableStream; however, in practice it will only succeed if the SDK can determine the length (see #2661 or https://github.com/aws/aws-sdk-js/blob/master/lib/event_listeners.js#L167).
Looking at https://github.com/aws/aws-sdk-js/blob/master/lib/util.js#L198, a stream will only work if it has a path. This means that only things like fs.createReadStream will work. If the stream is transformed in any way, it will no longer work.
e.g.
Body = fs.createReadStream('./someFile').pipe(someTransform)
s3.putObject({ Bucket, Key, Body }).promise().then(console.log)
Error: Cannot determine length of [object Object]
at Object.byteLength (aws-sdk/lib/util.js:200:26)
at Request.SET_CONTENT_LENGTH (aws-sdk/lib/event_listeners.js:163:40)
at Request.callListeners (aws-sdk/lib/sequential_executor.js:106:20)
at Request.emit (aws-sdk/lib/sequential_executor.js:78:10)
at Request.emit (aws-sdk/lib/request.js:683:14)
at Request.transition (aws-sdk/lib/request.js:22:10)
at AcceptorStateMachine.runTo (aws-sdk/lib/state_machine.js:14:12)
at aws-sdk/lib/state_machine.js:26:10
at Request.<anonymous> (aws-sdk/lib/request.js:38:9)
at Request.<anonymous> (aws-sdk/lib/request.js:685:12)
Describe the solution you'd like
Update the documentation to more clearly identify which streams will work, and point users to S3.upload.
Describe alternatives you've considered
A caller could include the content length, but I think that S3.upload is just a better answer.
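To illustrate both options (a rough sketch, assuming the v2 SDK; someFile and someTransform are the placeholders from the example above, and passing a ContentLength is only valid if the transform does not change the byte count):

const fs = require('fs')

// Option 1: tell putObject the length up front
const { size } = fs.statSync('./someFile')
const body1 = fs.createReadStream('./someFile').pipe(someTransform)
await s3.putObject({ Bucket, Key, Body: body1, ContentLength: size }).promise()

// Option 2: let S3.upload stream the body as a multipart upload, no length needed
const body2 = fs.createReadStream('./someFile').pipe(someTransform)
await s3.upload({ Bucket, Key, Body: body2 }).promise()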
@seebees I reached out to the respective service teams, will update here once I hear back from them.
Agreed. I have the case where I'm getting the stream from a Request body (from a GraphQL API).
I have to first read the stream into a Buffer to then be able to invoke putObject.
This is quite disturbing, and as it is undocumented, it actually threw me off for a few hours before I understood what was wrong.
By the way, I think it is the same as #2442
This is still an issue...
EDIT - I POSTED UPDATED CODE IN A MESSAGE BELOW
I was having this issue using node-fetch. I got it to work by reading the stream into a Buffer, like what @gbataille said.
import fetch from 'node-fetch'

const res = await fetch(url)
const buffer = await res.buffer() // read the whole response body into memory
await new Promise((resolve, reject) =>
  s3.putObject(
    { ACL: 'public-read', Body: buffer, Bucket: 'test', Key: 'fileName' },
    (err, data) => (err ? reject(err) : resolve(data))
  )
);
@amouly @RusseII @gbataille https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html#upload-property should do what you want.
Under it all S3 must know the size of the object, but upload will intelligently chunk the message into S3 for you.
@RusseII I quickly hit memory issues with reading everything into a Buffer :D
But then indeed, as @seebees mentions, the upload method seems to be higher level and it works with any kind of stream (it seems).
I don't quite know why those are different. I think putObject simply exposes the web API while the S3 service adds some helper methods...
upload wraps the multipart upload. putObject is just an S3 put, so it requires knowing the exact size.
You can use upload to manipulate the partSize per the documentation linked above.
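For example (a sketch, assuming the v2 SDK; the part size and queue size values are arbitrary):

s3.upload(
  { Bucket, Key, Body: someReadableStream },
  { partSize: 10 * 1024 * 1024, queueSize: 4 }, // buffer up to 4 parts of 10 MB each
  (err, data) => (err ? console.error(err) : console.log('Uploaded to', data.Location))
)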
Thanks for the info @seebees & @gbataille!!! I fixed it to use the S3 upload method:
import fetch from 'node-fetch';

const res = await fetch(url)
const stream = res.body
await new Promise((resolve, reject) =>
  s3.upload(
    { ACL: 'public-read', Body: stream, Bucket: 'test', Key: 'fileName' },
    (err, data) => (err ? reject(err) : resolve(data))
  )
);
That's the new code :)
I believe upload() only exists in the SDK v2, which is being deprecated. Can anyone confirm if this is the case, and whether there is a v3 equivalent?
In the meantime, I used the following solution (inspired by this StackOverflow post) to turn a Readable stream into a Buffer:
import {Readable} from 'stream'

export async function streamToBuffer(stream: Readable): Promise<Buffer> {
  const chunks: Buffer[] = []
  return new Promise((resolve, reject) => {
    stream.on('data', (chunk) => chunks.push(Buffer.from(chunk)))
    stream.on('error', (err) => reject(err))
    stream.on('end', () => resolve(Buffer.concat(chunks)))
  })
}
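And a hypothetical usage with the v3 client (the bucket, key, and someReadableStream names are placeholders):

import {S3Client, PutObjectCommand} from '@aws-sdk/client-s3'

const s3 = new S3Client({})
const body = await streamToBuffer(someReadableStream)
await s3.send(new PutObjectCommand({Bucket: 'test', Key: 'fileName', Body: body}))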
https://aws.amazon.com/blogs/developer/modular-packages-in-aws-sdk-for-javascript/ This seems to be the answer.
Thanks for the pointer @sPaCeMoNk3yIam. From the blog post, this is the absolute way to go for uploading from a stream. Worked like a charm. Watch your mime types, however. Also, you might have to wrap the stream if the underlying code doesn't recognize it. I'm spooling files from a tar'd gzip'd file, and the S3 code didn't recognize the stream as something it could use. I wrapped it in a passthrough stream.
ETA (05 JUL 2022): In case it's not apparent from the code or my above comments, the code below uses a stream during multipart upload. The length of the file need not be known ahead of time. No complete reading of the stream - just plain old streaming.
import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";
import { PassThrough } from "stream";
import mime from "mime-types";

// ... entry is a stream
const input = {
  ACL: "public-read", // ACL not needed if CloudFront can pull via OAI
  Bucket: bucketName,
  Key: outputPath + entry.path,
  Body: entry.pipe(new PassThrough()),
  ContentType: mime.lookup(entry.path),
};

try {
  const multipartUpload = new Upload({
    client: new S3Client({}),
    params: input,
  });
  console.log("Upload file to:", `s3://${input.Bucket}/${input.Key}`);
  await multipartUpload.done();
} catch (err) {
  console.error(err);
}
I just wanted to say thank you for your comments @sPaCeMoNk3yIam and @codeedog! I really can't believe that something as explicitly required as this is missing from the guide and the SDK API reference docs... You can't assume that every stream will be a file stream or one with a known size. That's the point of streams! Anyway, thanks again!
If you know the size, you can explicitly set the length attribute
stream['length'] = size
For larger files, this is better than converting to a buffer because it streams the data.
This simple hack worked for me.
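For example (a sketch, assuming the v2 SDK and that size really is the number of bytes the stream will emit; getSomeReadableStream is a placeholder):

const stream = getSomeReadableStream() // hypothetical source with no path property
stream['length'] = size                // the SDK's byteLength check picks up a numeric length
await s3.putObject({ Bucket, Key, Body: stream }).promise()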
Dear Amazonians, any ETA for this? Seriously...
Should I move to GCP? Or R2? :D
stream['length'] = size
It seems this workaround doesn't work with the version 3 SDK.
However, the PutObjectCommand has a parameter called ContentLength, and setting this seems to work OK - I can stream data from another HTTP API directly into S3 without having to load the whole file into memory or write it to disk.
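Something like this (a sketch; node-fetch and the source url are placeholders, and it assumes the upstream API sends a Content-Length header):

import fetch from 'node-fetch'
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'

const res = await fetch(url)
const s3 = new S3Client({})
await s3.send(new PutObjectCommand({
  Bucket: 'test',
  Key: 'fileName',
  Body: res.body, // stream straight through, nothing buffered in memory
  ContentLength: Number(res.headers.get('content-length')),
}))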
Thanks for this tip. For those working with slices of a file that they are chunking themselves, the size of a Blob can be read with the .size property, and you can pass that directly into the ContentLength field to fix this error.
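For example (a sketch, assuming the v3 client; file, start, and end are placeholders for the slicing you already do):

const chunk = file.slice(start, end) // a Blob slice of the file being chunked
await s3.send(new PutObjectCommand({
  Bucket: 'test',
  Key: 'fileName',
  Body: chunk,
  ContentLength: chunk.size, // Blob.size is the byte length of this slice
}))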
If one tries to set a file's content to an empty string (i.e. ""), then S3.putObject or PutObjectCommand (v3) can't "determine length". Actually it can, but I think the if-comparison needs to check against undefined (or something similar) instead of doing a "falsy" comparison. Cheers
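To illustrate the point (a hypothetical check, not the SDK's actual code):

const length = ''.length // 0 for an empty body
if (!length) {
  // a falsy comparison wrongly treats a legitimate length of 0 as "unknown"
}
if (length === undefined) {
  // an explicit undefined check only fires when the length truly cannot be determined
}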