
S3.GetObject no longer returns the result as a string

Open igilham opened this issue 3 years ago • 86 comments

Describe the bug I'm using the GetObjectCommand with an S3Client to pull a file down from S3. In v2 of the SDK I can write response.Body.toString('utf-8') to turn the response into a string. In v3 of the SDK response.Body is a complex object that does not seem to expose the result of reading from the socket.

It's not clear if the SDK's current behaviour is intentional, but the change in behaviour since v2 is significant and undocumented.

SDK version number 3.1.0

Is the issue in the browser/Node.js/ReactNative? Node.js

Details of the browser/Node.js/ReactNative version v12.18.0

To Reproduce (observed behavior)

import { GetObjectCommand, S3Client } from '@aws-sdk/client-s3';

export async function getFile() {
  const client = new S3Client({ region: 'eu-west-1' });
  const cmd = new GetObjectCommand({
    Bucket: 'my-bucket',
    Key: '/readme.txt',
  });
  const data = await client.send(cmd);

  console.log(data.Body.toString('utf-8'));
}

Expected behavior It should print the text of the file.

Additional context

data.Body is a complex object with circular references. Object.keys(data.Body) returns the following:

[
  "_readableState",
  "readable",
  "_events",
  "_eventsCount",
  "_maxListeners",
  "socket",
  "connection",
  "httpVersionMajor",
  "httpVersionMinor",
  "httpVersion",
  "complete",
  "headers",
  "rawHeaders",
  "trailers",
  "rawTrailers",
  "aborted",
  "upgrade",
  "url",
  "method",
  "statusCode",
  "statusMessage",
  "client",
  "_consuming",
  "_dumped",
  "req"
]

igilham avatar Jan 06 '21 09:01 igilham

This happens because data.Body is now of type Readable | ReadableStream | Blob: https://github.com/aws/aws-sdk-js-v3/blob/25cb359e69966c549eb505956c2aeee809819245/clients/client-s3/models/models_0.ts#L6560

For your specific example, you can write a streamToString function to convert the stream to a string.

const { S3Client, GetObjectCommand } = require("@aws-sdk/client-s3");

(async () => {
  const region = "us-west-2";
  const client = new S3Client({ region });

  const streamToString = (stream) =>
    new Promise((resolve, reject) => {
      const chunks = [];
      stream.on("data", (chunk) => chunks.push(chunk));
      stream.on("error", reject);
      stream.on("end", () => resolve(Buffer.concat(chunks).toString("utf8")));
    });

  const command = new GetObjectCommand({
    Bucket: "test-aws-sdk-js-1877",
    Key: "readme.txt",
  });

  const { Body } = await client.send(command);
  const bodyContents = await streamToString(Body);
  console.log(bodyContents);
})();

@igilham Does this resolve your query?

trivikr avatar Jan 06 '21 16:01 trivikr

Thanks, @trivikr. This works in my application but raises a few concerns about the library that are worth sharing:

  • There is no documentation for clients and the GetObjectCommand is not documented in the user guide or sample code. The project Readme file implies I could expect the same behaviour as SDKv2.
  • My IDE can't tell me what the type of response.Body is. It tells me that it's any. Perhaps the library configuration could be improved to export the correct type information.
  • It's nice to have options for data processing, but I shouldn't be forced to write boilerplate I/O code for the most common use case.
  • As noted below, I can't find an export of ReadableStream and Blob so it appears to be impossible to make this code type-safe.

For reference, I've rewritten the streamToString with the missing types added back in to comply with my team's linter settings.

import { Readable } from 'stream';

// Apparently the stream parameter should be of type Readable|ReadableStream|Blob
// The latter 2 don't seem to exist anywhere.
async function streamToString (stream: Readable): Promise<string> {
  return await new Promise((resolve, reject) => {
    const chunks: Uint8Array[] = [];
    stream.on('data', (chunk) => chunks.push(chunk));
    stream.on('error', reject);
    stream.on('end', () => resolve(Buffer.concat(chunks).toString('utf-8')));
  });
}

igilham avatar Jan 06 '21 17:01 igilham

There is no documentation for clients and the GetObjectCommand is not documented in the user guide or sample code. The project Readme file implies I could expect the same behaviour as SDKv2.

The documentation for the getObject operation lists GetObjectOutput.Body as Readable | ReadableStream | Blob. API Reference: https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-s3/classes/s3.html#getobject


My IDE can't tell me what the type of response.Body is. It tells me that it's any. Perhaps the library configuration could be improved to export the correct type information.

I'm using Visual Studio Code, and it shows type of response.Body as internal.Readable | ReadableStream<any> | Blob on hover.

Please create a new issue with details of your IDE and code if the problem persists.


trivikr avatar Jan 06 '21 17:01 trivikr

  • As noted below, I can't find an export of ReadableStream and Blob so it appears to be impossible to make this code type-safe.

For reference, I've rewritten the streamToString with the missing types added back in to comply with my team's linter settings. [...]

As this code runs on Node.js, you can cast Body to Readable as follows:

const bodyContents = await streamToString(Body as Readable);
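If you want a runtime check instead of a bare cast, here is a minimal sketch (my own naming, assuming Node.js where the body is always a Readable, and the streamToString from above):

import { Readable } from 'stream';

// Sketch of a type guard that narrows Body at runtime instead of casting.
function assertReadable(body: unknown): asserts body is Readable {
  if (!(body instanceof Readable)) {
    throw new Error('Expected a Node.js Readable stream');
  }
}

assertReadable(Body);
const bodyContents = await streamToString(Body); // Body is now typed as Readable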

trivikr avatar Jan 06 '21 17:01 trivikr

Thanks for following up.

I didn't realise the methods and types were documented. I took the description on the client landing page (go to the README) to mean it was a dead-end. Perhaps improving the wording should be a separate issue.

I can't explain the IDE issue. I'm also on VSCode and it says it's an any. I find the IDE quite unstable though, so maybe it's just me.


igilham avatar Jan 06 '21 17:01 igilham

I didn't realise the methods and types were documented. I took the description on the client landing page (go to the README) to mean it was a dead-end. Perhaps improving the wording should be a separate issue.

I've created documentation update request at https://github.com/aws/aws-sdk-js-v3/issues/1878

trivikr avatar Jan 06 '21 17:01 trivikr

Thanks. I think that covers my remaining frustrations. I appreciate that it can take time for documentation elsewhere to catch up when a major version is released.

igilham avatar Jan 07 '21 09:01 igilham

The code snippet works in a Node.js environment; in the browser, you would have a ReadableStream instead of a Readable.

Here is my implementation of handling the ReadableStream:

const streamToString = (stream) => {
  return new Promise((resolve, reject) => {
    if (!(stream instanceof ReadableStream)) {
      reject(
        new Error(
          "Expected stream to be instance of ReadableStream, but got " +
            typeof stream
        )
      );
      return; // bail out, otherwise getReader() below would throw a second error
    }
    let text = "";
    const decoder = new TextDecoder("utf-8");

    const reader = stream.getReader();
    const processRead = ({ done, value }) => {
      if (done) {
        // flush any buffered multi-byte sequence, then resolve
        text += decoder.decode();
        resolve(text);
        return;
      }

      // { stream: true } keeps multi-byte characters that are split
      // across chunk boundaries intact
      text += decoder.decode(value, { stream: true });

      // Not done, keep reading
      reader.read().then(processRead);
    };

    // start read
    reader.read().then(processRead);
  });
};
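For completeness, a usage sketch (assuming the client and command from the earlier Node.js example, but running in a browser where Body is a ReadableStream):

// Browser usage sketch for the helper above.
const { Body } = await client.send(command);
const text = await streamToString(Body);
console.log(text);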

leimd avatar Jan 21 '21 06:01 leimd

@trivikr Thanks for that link to the docs! I didn't even know they existed till just now.

ffxsam avatar Jan 28 '21 23:01 ffxsam

The code snippet works in a Node.js environment; in the browser, you would have a ReadableStream instead of a Readable. Here is my implementation of handling the ReadableStream: [...]

I also wasted lots of time on GetObject and the trifecta of its types. The fact that ReadableStream and Blob are browser-only, while Readable is Node-only, made it extremely annoying :)

The streamToString solution posted above works for Node. For the browser, I found that using the Response object from fetch seems a shorter solution:

new Response(response!.Body, {});

This will return a Response object which will then allow us to use any of the helper methods it has to convert to String, Buffer, Json, etc. See more at https://developer.mozilla.org/en-US/docs/Web/API/Response#methods.

Full example:

const s3 = new S3({
  region: "us-east-1",
  credentials: {
    accessKeyId: "replace-it",
    secretAccessKey: "replace-it",
  },
});
const resp = await s3.getObject({
  Bucket: "your-bucket",
  Key: "your-object-key",
});
console.info(await new Response(resp.Body, {}).text())

It's quite unfortunate that everybody has to go through these hoops to get the content out of the response though. Especially considering that we have to do type checking with things like if (resp.Body instanceof Readable), or declare special interfaces to avoid differences between browser/Node.
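For anyone who wants a single helper across environments, a rough sketch (my own naming, reusing the streamToString from earlier in the thread for the Node.js case) could branch on the three possible body types:

import { Readable } from 'stream';

// Environment-agnostic sketch: pick the right consumption strategy per type.
async function bodyToString(body: Readable | ReadableStream | Blob): Promise<string> {
  if (typeof Blob !== 'undefined' && body instanceof Blob) {
    return body.text(); // browser Blob
  }
  if (typeof ReadableStream !== 'undefined' && body instanceof ReadableStream) {
    return new Response(body).text(); // browser stream, via fetch's Response
  }
  return streamToString(body as Readable); // Node.js Readable
}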

lambrospetrou avatar Feb 09 '21 19:02 lambrospetrou

The streamToString solution posted above works for Node. For the browser, I found that using the Response object from fetch seems a shorter solution. [...]

The use of Response looks like the neatest solution right now for JSON and text payloads.

smiccoli avatar Feb 10 '21 12:02 smiccoli

I've been running into these pain points as well, including Lambda invocation. The payload returned is now a Uint8Array, so it takes a few hoops to get it into a usable format:

const payload = JSON.parse(Buffer.from(data.Payload).toString());

Whereas in the previous JS SDK, it was simply:

const payload = JSON.parse(data.Payload);

I don't understand this new direction with the SDK. I can't say I'm a fan. Maybe @trivikr can weigh in.

ffxsam avatar Feb 10 '21 15:02 ffxsam

Reopening as a lot of customers have raised questions. Tagging @AllanZhengYP for comment.

trivikr avatar Feb 10 '21 16:02 trivikr

export interface GetObjectOutput {
    /**
     * <p>Object data.</p>
     */
    Body?: Readable | ReadableStream | Blob;

  // ... snip
}



This throws an error in a Node.js app, because the TS config does not load the DOM libs.

This results in the Body being set to any.


moltar avatar Mar 06 '21 16:03 moltar

I'm also very confused about how to read S3 Body responses with SDK v3. The SDK documentation for GetObjectCommand does not describe how to do it, and the SDK examples are also missing it (https://github.com/awsdocs/aws-doc-sdk-examples/issues/1677).

I would ask the AWS SDK team to include in the SDK a simple way to read S3 Body responses. We don't want to re-implement complicated event handlers and helper functions for this simple purpose every time we use GetObject in a project.

In v2 we could just say something like JSON.parse(response.Body?.toString()). Please make it as simple in v3. Stream-based processing is also important, but it should be only an alternative for the simple case for parsing small JSON objects.

For reference, I was able to do this in Node.js by utilizing node-fetch. I would like something like this to be included in the AWS SDK.

npm install node-fetch
npm install --save-dev @types/node-fetch

import { Response } from 'node-fetch'

const response = new Response(s3Response.Body)
const data = await response.json()

kennu avatar Mar 08 '21 19:03 kennu

A one-line alternative is to use get-stream package, as posted here: https://github.com/aws/aws-sdk-js-v3/issues/1096#issuecomment-616743375

I understand the reason for returning a ReadableStream, but a built-in helper method would be nice. Reading the whole body into a string in memory is probably good for 99% of cases.

If such a helper method were part of the SDK, we could just call readStream(response.Body) and everyone would be happy, not having to add another dependency or 10 lines of boilerplate code to every new project.
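For reference, the get-stream version is a one-liner once the object is fetched. A sketch, assuming get-stream v6 and the client/command from earlier comments:

import getStream from 'get-stream';
import { Readable } from 'stream';

// get-stream buffers the whole stream and resolves to a utf-8 string.
const { Body } = await client.send(command);
const bodyContents = await getStream(Body as Readable);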

m-radzikowski avatar Mar 15 '21 19:03 m-radzikowski

I've been running into these pain points as well, including Lambda invocation. [...]

const payload = JSON.parse(Buffer.from(data.Payload).toString());

This was by far the easiest solution with the least overhead. Worked for me at least! Much appreciated!

sinsys avatar Apr 20 '21 02:04 sinsys

const payload = JSON.parse(Buffer.from(data.Payload).toString());

This was by far the easiest solution with the least overhead. Worked for me at least! Much appreciated!

@sinsys where do you get data.Payload from? I do not see that in the GetObject model.

I only see data.Body.

porteron avatar Apr 20 '21 16:04 porteron

@porteron That was my code snippet - apologies for the confusion. In my comment, I was calling out Lambda, which uses data.Payload to return a response.

ffxsam avatar Apr 20 '21 16:04 ffxsam

@porteron That was my code snippet - apologies for the confusion. In my comment, I was calling out Lambda, which uses data.Payload to return a response.

@ffxsam I attempted your approach using Buffer.from, but for some reason data.Body is not being recognized as a Buffer. It's just an instance of IncomingMessage.

const payload = JSON.parse(Buffer.from(data.Body).toString());

Leads to the following error:

TypeError [ERR_INVALID_ARG_TYPE]: The first argument must be of type string or an instance of Buffer, ArrayBuffer, or Array or an Array-like Object. Received an instance of IncomingMessage

Some others are having this same issue, e.g. https://github.com/aws/aws-sdk-js/issues/3649

Going to try out this approach: https://github.com/aws/aws-sdk-js-v3/issues/1877#issuecomment-764404713

porteron avatar Apr 20 '21 16:04 porteron

@porteron This might help too. This is what I use in my code:

import { Readable } from 'stream';

// Options for assembleStream; this definition is inferred from the usage
// below, as it wasn't included in the original snippet.
interface AssembleStreamOptions {
  string?: boolean;
}

export async function assembleStream(
  stream: Readable,
  options: AssembleStreamOptions = {}
): Promise<string | Buffer> {
  return new Promise((resolve, reject) => {
    const chunks: Uint8Array[] = [];
    stream.on('data', chunk => chunks.push(chunk));
    stream.on('error', reject);
    stream.on('end', () => {
      const result = Buffer.concat(chunks);

      resolve(options.string ? result.toString('utf-8') : result);
    });
  });
}

// Usage (from a method on my class):
const data = await S3Object.s3.getObject({
  Bucket: this.bucket,
  Key: this.key,
});

return assembleStream(data.Body);

If you find a method that works that's shorter than mine, let me know!

ffxsam avatar Apr 20 '21 16:04 ffxsam

If you're ok with using a (very popular) 3rd party library, this is shorter - https://github.com/aws/aws-sdk-js-v3/issues/1877#issuecomment-799697205

ohana54 avatar Apr 20 '21 16:04 ohana54

@ohana54 Ahhh yes, thank you for that!

ffxsam avatar Apr 20 '21 16:04 ffxsam

Hard to understand why getting a string from a simple storage service is unnecessarily complicated.

HerrSchwarz avatar May 03 '21 16:05 HerrSchwarz

@HerrSchwarz I imagine it was for flexibility, so the dev can decide if they want to stream data into another process or just grab a string directly. This saves having to come up with multiple API functions (getObject, getStream).

But I agree, it is a bit of a pain. There might've been a simpler way to accomplish this and have better DX (developer experience).

ffxsam avatar May 03 '21 16:05 ffxsam

I encountered this issue too, in a Node.js application. TypeScript won't compile because the types can't be found (ReadableStream and Blob here, but also File from the DynamoDB native attribute types), failing with error TS2304: Cannot find name 'ReadableStream'. What should be the workaround or definitive fix for a Node.js-only application?

I can solve the compiling issue by adding "DOM" to my lib compiler options, but I would rather not if possible.
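For concreteness, that compiler option is something like this in tsconfig.json (a sketch; the ES entry depends on your target):

{
  "compilerOptions": {
    "lib": ["ES2019", "DOM"]
  }
}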

kristiannotari avatar May 05 '21 13:05 kristiannotari

@kristiannotari rather than adding "DOM" to tsconfig, you should use one of the solutions suggested above - they require additional code, but work in Node+TS.

You can use assembleStream function from @ffxsam comment or get-stream package I posted.

m-radzikowski avatar May 05 '21 13:05 m-radzikowski

@kristiannotari rather than adding "DOM" to tsconfig, you should use one of the solutions suggested above - they require additional code, but work in Node+TS. [...]

Thanks, but that's not the issue I'm facing. I already fixed my S3 output Body stream retrieval using get-stream to convert it to a Buffer. The problem I have now is that I have the TypeScript compiler set to check definition files for libraries. This creates the errors I cited above because, rightly so, it doesn't find the ReadableStream and Blob types, among others (see File from the DynamoDB native attribute types). I don't know how to manage a scenario like this, where a library supports multiple environments but I only need type definitions for the Node.js env. Obviously, this was not an issue with AWS SDK v2.
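One blunt workaround, if you can live with it, is skipLibCheck, which stops TypeScript from type checking all declaration files (not just the SDK's):

{
  "compilerOptions": {
    "skipLibCheck": true
  }
}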

kristiannotari avatar May 05 '21 13:05 kristiannotari

I just noticed that a lot of the comments here are also covered in the v3 documentation, see section "Getting a file from an Amazon S3 bucket".

berenddeboer avatar May 10 '21 23:05 berenddeboer

That is awesome @berenddeboer, it has been added recently to the documentation. For me, the biggest problem is that I can't be sure my own ad-hoc stream implementations are 100% correct and error-resilient in every situation. When AWS provides the implementation, I will trust it.

kennu avatar May 11 '21 13:05 kennu