aws-sdk-js-v3
S3.GetObject no longer returns the result as a string
Describe the bug
I'm using the GetObjectCommand with an S3Client to pull a file down from S3. In v2 of the SDK I can write response.Body.toString('utf-8') to turn the response into a string. In v3 of the SDK, response.Body is a complex object that does not seem to expose the result of reading from the socket.
It's not clear if the SDK's current behaviour is intentional, but the change in behaviour since v2 is significant and undocumented.
SDK version number 3.1.0
Is the issue in the browser/Node.js/ReactNative? Node.js
Details of the browser/Node.js/ReactNative version
v12.18.0
To Reproduce (observed behavior)
import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';
export async function getFile() {
const client = new S3Client({ region: 'eu-west-1' });
const cmd = new GetObjectCommand({
Bucket: 'my-bucket',
Key: '/readme.txt',
});
const data = await client.send(cmd);
console.log(data.Body.toString('utf-8'));
}
Expected behavior It should print the text of the file.
Additional context
data.Body
is a complex object with circular references. Object.keys(data.Body)
returns the following:
[
"_readableState",
"readable",
"_events",
"_eventsCount",
"_maxListeners",
"socket",
"connection",
"httpVersionMajor",
"httpVersionMinor",
"httpVersion",
"complete",
"headers",
"rawHeaders",
"trailers",
"rawTrailers",
"aborted",
"upgrade",
"url",
"method",
"statusCode",
"statusMessage",
"client",
"_consuming",
"_dumped",
"req"
]
This happens as data.Body
is now of type Readable | ReadableStream | Blob
https://github.com/aws/aws-sdk-js-v3/blob/25cb359e69966c549eb505956c2aeee809819245/clients/client-s3/models/models_0.ts#L6560
For your specific example, you can write a streamToString function to convert ReadableStream to a string.
const { S3Client, GetObjectCommand } = require("@aws-sdk/client-s3");
(async () => {
const region = "us-west-2";
const client = new S3Client({ region });
const streamToString = (stream) =>
new Promise((resolve, reject) => {
const chunks = [];
stream.on("data", (chunk) => chunks.push(chunk));
stream.on("error", reject);
stream.on("end", () => resolve(Buffer.concat(chunks).toString("utf8")));
});
const command = new GetObjectCommand({
Bucket: "test-aws-sdk-js-1877",
Key: "readme.txt",
});
const { Body } = await client.send(command);
const bodyContents = await streamToString(Body);
console.log(bodyContents);
})();
@igilham Does this resolve your query?
Thanks, @trivikr. This works in my application but raises a few concerns about the library that are worth sharing:
- There is no documentation for clients and the GetObjectCommand is not documented in the user guide or sample code. The project Readme file implies I could expect the same behaviour as SDKv2.
- My IDE can't tell me what the type of response.Body is. It tells me that it's any. Perhaps the library configuration could be improved to export the correct type information.
- It's nice to have options for data processing, but I shouldn't be forced to write boilerplate I/O code for the most common use case.
- As noted below, I can't find an export of ReadableStream and Blob, so it appears to be impossible to make this code type-safe.

For reference, I've rewritten streamToString with the missing types added back in to comply with my team's linter settings.
import { Readable } from 'stream';
// Apparently the stream parameter should be of type Readable|ReadableStream|Blob
// The latter 2 don't seem to exist anywhere.
async function streamToString (stream: Readable): Promise<string> {
return await new Promise((resolve, reject) => {
const chunks: Uint8Array[] = [];
stream.on('data', (chunk) => chunks.push(chunk));
stream.on('error', reject);
stream.on('end', () => resolve(Buffer.concat(chunks).toString('utf-8')));
});
}
There is no documentation for clients and the GetObjectCommand is not documented in the user guide or sample code. The project Readme file implies I could expect the same behaviour as SDKv2.
Documentation for the getObject operation lists that GetObjectOutput.Body is Readable | ReadableStream | Blob.
API Reference: https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-s3/classes/s3.html#getobject
My IDE can't tell me what the type of response.Body is. It tells me that it's any. Perhaps the library configuration could be improved to export the correct type information.
I'm using Visual Studio Code, and it shows the type of response.Body as internal.Readable | ReadableStream<any> | Blob on hover.

Please create a new issue with details of your IDE and code if the problem persists.
As noted below, I can't find an export of ReadableStream and Blob, so it appears to be impossible to make this code type-safe. For reference, I've rewritten streamToString with the missing types added back in to comply with my team's linter settings.
As this code is run on Node.js, you can pass Body as Readable as follows:
const bodyContents = await streamToString(Body as Readable);
Thanks for following up.
I didn't realise the methods and types were documented. I took the description on the client landing page ("go to the README") to mean it was a dead end. Perhaps improving the wording should be a separate issue.
I can't explain the IDE issue. I'm also on VSCode and it says it's an any. I find the IDE quite unstable though, so maybe it's just me.
I've created documentation update request at https://github.com/aws/aws-sdk-js-v3/issues/1878
Thanks. I think that covers my remaining frustrations. I appreciate that it can take time for documentation elsewhere to catch up when a major version is released.
The code snippet works in the Node.js environment; in the browser, you would have a ReadableStream instead of Readable.
Here is my implementation of handling the ReadableStream:
const streamToString = (stream) => {
  return new Promise((resolve, reject) => {
    if (!(stream instanceof ReadableStream)) {
      reject(
        new Error(
          "Expected stream to be an instance of ReadableStream, but got " +
            typeof stream
        )
      );
      return;
    }
    let text = "";
    const decoder = new TextDecoder("utf-8");
    const reader = stream.getReader();
    const processRead = ({ done, value }) => {
      if (done) {
        // All chunks read; resolve with the accumulated text
        resolve(text);
        return;
      }
      text += decoder.decode(value, { stream: true });
      // Not done, keep reading
      reader.read().then(processRead, reject);
    };
    // Start the read loop
    reader.read().then(processRead, reject);
  });
};
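The browser-flavoured reader loop can also be tried outside the browser: Node 18+ exposes the same web ReadableStream, TextEncoder, and TextDecoder globals. A small standalone sketch of the identical getReader()/decode pattern (readAll is a hypothetical helper, and the demo stream stands in for a real response body):

```javascript
// Node 18+ ships the same web-stream globals the browser provides, so the
// getReader()/TextDecoder pattern can be exercised locally.
const readAll = async (stream) => {
  let text = "";
  const decoder = new TextDecoder("utf-8");
  const reader = stream.getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) return text;
    text += decoder.decode(value, { stream: true });
  }
};

// Build a small web ReadableStream to stand in for an S3 response body
const demo = new ReadableStream({
  start(controller) {
    controller.enqueue(new TextEncoder().encode("hello from a web stream"));
    controller.close();
  },
});

readAll(demo).then((t) => console.log(t)); // prints hello from a web stream
```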
@trivikr Thanks for that link to the docs! I didn't even know they existed till just now.
I also wasted lots of time on GetObject and the trifecta of its types. Also, the fact that ReadableStream | Blob is browser-only, and Readable Node-only, made it extremely annoying :)
The streamToString solution posted above works for Node. For the browser, I found that using the Response object from fetch seems a shorter solution:

new Response(response!.Body, {});
This will return a Response object, which will then allow us to use any of its helper methods to convert to a string, buffer, JSON, etc. See more at https://developer.mozilla.org/en-US/docs/Web/API/Response#methods.
Full example:
const s3 = new S3({
region: "us-east-1",
credentials: {
accessKeyId: "replace-it",
secretAccessKey: "replace-it",
},
});
const resp = await s3.getObject({
Bucket: "your-bucket",
Key: "your-object-key",
});
console.info(await new Response(resp.Body, {}).text())
It's quite unfortunate that everybody has to go through these hoops to get the content out of the response, though, especially considering that we have to do type checking with things like if (resp.Body instanceof Readable), or declare special interfaces to avoid differences between browser and Node.
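The Response trick above can be tried locally without any AWS call: Node 18+ provides the same global Response and ReadableStream the browser has (a sketch; the stream here simulates an S3 body containing JSON):

```javascript
// Wrap a web ReadableStream in a Response to get .text()/.json() helpers,
// exactly as one would with a browser-side GetObject Body.
const body = new ReadableStream({
  start(controller) {
    controller.enqueue(new TextEncoder().encode('{"greeting":"hello"}'));
    controller.close();
  },
});

(async () => {
  const parsed = await new Response(body).json();
  console.log(parsed.greeting); // prints hello
})();
```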
The use of Response looks like the neatest solution right now, for JSON and text payloads.
I've been running into these pain points as well, including Lambda invocation. The payload returned is now a Uint8Array, so it takes a few hoops to get it into a usable format:
const payload = JSON.parse(Buffer.from(data.Payload).toString());
Whereas in the previous JS SDK, it was simply:
const payload = JSON.parse(data.Payload);
I don't understand this new direction with the SDK. I can't say I'm a fan. Maybe @trivikr can weigh in.
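The Uint8Array round-trip described above can be simulated without calling Lambda at all (a sketch; data stands in for a real Invoke response):

```javascript
// v3 Lambda Invoke returns Payload as a Uint8Array; Buffer.from + toString
// turns it back into the JSON string that v2 handed over directly.
const data = { Payload: new TextEncoder().encode('{"ok":true}') };
const payload = JSON.parse(Buffer.from(data.Payload).toString());
console.log(payload); // prints { ok: true }
```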
Reopening as a lot of customers have raised questions. Tagging @AllanZhengYP for comment.
export interface GetObjectOutput {
/**
* <p>Object data.</p>
*/
Body?: Readable | ReadableStream | Blob;
// ... snip
}
This is throwing an error in a Node.js app, because the TS config does not load the DOM libs. This results in the Body being typed as any.
I'm also very confused about how to read S3 Body responses with SDK v3. The SDK documentation for GetObjectCommand does not describe how to do it, and the SDK examples are also missing it (https://github.com/awsdocs/aws-doc-sdk-examples/issues/1677).
I would ask the AWS SDK team to include in the SDK a simple way to read S3 Body responses. We don't want to re-implement complicated event handlers and helper functions for this simple purpose every time we use GetObject in a project.
In v2 we could just say something like JSON.parse(response.Body?.toString()). Please make it as simple in v3. Stream-based processing is also important, but it should only be an alternative for the simple case of parsing small JSON objects.
For reference, I was able to do this in Node.js by utilizing node-fetch. I would like something like this be included in AWS SDK.
npm install node-fetch
npm install --save-dev @types/node-fetch
import { Response } from 'node-fetch'
const response = new Response(s3Response.Body)
const data = await response.json()
A one-line alternative is to use get-stream package, as posted here: https://github.com/aws/aws-sdk-js-v3/issues/1096#issuecomment-616743375
I understand the reason for returning a ReadableStream, but a built-in helper method would be nice. Reading the whole body into string in memory is probably good for 99% of cases.
If some helper method were part of the SDK, we could just call it as readStream(response.Body) and everyone would be happy, not having to add another dependency or 10 lines of boilerplate code to every new project.
I've been running into these pain points as well, including Lambda invocation. The payload returned is now a Uint8Array, so it takes a few hoops to get it into a usable format:
const payload = JSON.parse(Buffer.from(data.Payload).toString());
Whereas in the previous JS SDK, it was simply:
const payload = JSON.parse(data.Payload);
I don't understand this new direction with the SDK. I can't say I'm a fan. Maybe @trivikr can weigh in.
This was by far the easiest solution with the least overhead. Worked for me at least! Much appreciated!
@sinsys where do you get data.Payload from? I do not see that in the GetObject model. I only see data.Body.
@porteron That was my code snippet - apologies for the confusion. In my comment, I was calling out Lambda, which uses data.Payload
to return a response.
@ffxsam I attempted your approach using Buffer.from, but for some reason data.Body is not being recognized as a Buffer. It's just an instance of IncomingMessage.
const payload = JSON.parse(Buffer.from(data.Body).toString());
Leads to the following error:
TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be of type string or an instance of Buffer, ArrayBuffer, or Array or an Array-like Object. Received an instance of IncomingMessage
Some others are having this same issue. e.g.) https://github.com/aws/aws-sdk-js/issues/3649
Going to try out this approach: https://github.com/aws/aws-sdk-js-v3/issues/1877#issuecomment-764404713
@porteron This might help too. This is what I use in my code:
const data = await S3Object.s3.getObject({
Bucket: this.bucket,
Key: this.key,
});
return assembleStream(data.Body);
import { Readable } from 'stream';

// Options bag for assembleStream; `string: true` resolves with a utf-8 string
interface AssembleStreamOptions {
  string?: boolean;
}

export async function assembleStream(
  stream: Readable,
  options: AssembleStreamOptions = {}
): Promise<string | Buffer> {
  return new Promise((resolve, reject) => {
    const chunks: Uint8Array[] = [];
    stream.on('data', chunk => chunks.push(chunk));
    stream.on('error', reject);
    stream.on('end', () => {
      const result = Buffer.concat(chunks);
      resolve(options.string ? result.toString('utf-8') : result);
    });
  });
}
If you find a method that works that's shorter than mine, let me know!
If you're ok with using a (very popular) 3rd party library, this is shorter - https://github.com/aws/aws-sdk-js-v3/issues/1877#issuecomment-799697205
@ohana54 Ahhh yes, thank you for that!
Hard to understand why getting a string from a simple storage service is unnecessarily complicated.
@HerrSchwarz I imagine it was for flexibility, so the dev can decide if they want to stream data into another process or just grab a string directly. This saves having to come up with multiple API functions (getObject, getStream).
But I agree, it is a bit of a pain. There might've been a simpler way to accomplish this and have better DX (developer experience).
I encountered this issue too, in a Node.js application. TypeScript won't compile, since types are not found (ReadableStream and Blob here, but also File from the DynamoDB native attribute types), complaining with error TS2304: Cannot find name 'ReadableStream'. What should be the workaround/definitive fix for a Node.js-only application?

I can solve the compile issue by adding "DOM" to my lib compiler options, but I would rather not if possible.
@kristiannotari rather than adding "DOM" to tsconfig, you should use one of the solutions suggested above - they require additional code, but work in Node+TS.
You can use assembleStream
function from @ffxsam comment or get-stream package I posted.
Thanks, but that's not the issue I'm facing. I correctly fixed my S3 output Body stream retrieval using get-stream to convert it to a Buffer. The problem I have now is that I have the TypeScript compiler set to check definition files for libraries. This creates the errors I cited above because, rightly so, it doesn't find the ReadableStream and Blob types, among others (see File from the DynamoDB native attribute types). I don't know how to manage a scenario like this one, where a library supports multiple environments but I only need type definitions for the Node.js environment. Obviously, this was not an issue with AWS SDK v2.
I just noticed that a lot of the comments here are also covered in the v3 documentation; see the section "Getting a file from an Amazon S3 bucket".
That is awesome, @berenddeboer; it has been added to the documentation recently. For me, the biggest problem is that I can't be sure my own ad-hoc stream implementations are 100% correct and error-resilient in every situation. When AWS provides the implementation, I will trust it.