
The 'total' property is always undefined when uploading files larger than 5MB

Open ti2sugiyama opened this issue 2 years ago • 6 comments


Describe the bug

When uploading a large file as a stream via the Upload module in a web app, the httpUploadProgress event object's 'total' property is always undefined.

SDK version number

@aws-sdk/[email protected]

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

v18.13.0

Reproduction Steps

  1. Set up a Node.js project
  2. Install @aws-sdk/client-s3 and @aws-sdk/lib-storage
  3. Prepare a file to upload (larger than 5 MB)
  4. Run the following program
  5. Observe that ev.total is always undefined
const fs = require("fs");
const stream = require("stream");
const S3Client = require("@aws-sdk/client-s3").S3Client;
const Upload = require("@aws-sdk/lib-storage").Upload;

const s3 = new S3Client({
  // your s3 settings
});

async function main() {
  // Use a file larger than 5MB.
  const fileStream = fs.createReadStream("/path/to/your/over5mb/file.dat");
  const passThrough = new stream.PassThrough();
  // Pipe PassThrough to remove fileStream.path
  // to emulate browser file upload.
  fileStream.pipe(passThrough);

  const upload = new Upload({
    client: s3,
    params: {
      Bucket: "your-bucket", // replace with your bucket name
      Key: "your-key",       // replace with your object key
      Body: passThrough,
    },
  });

  upload.on("httpUploadProgress", function (ev) {
    // ev.total always undefined
    console.log(ev);
  });

  await upload.done();
}

main();

Debugging the code shows that this.totalBytes (the source of 'total') is never assigned anywhere except in the constructor and in the single-chunk upload path.

Observed Behavior

The httpUploadProgress event object's 'total' property is always undefined. (In the sample code, ev.total is undefined in every console output, through to the end.)

Expected Behavior

'total' should eventually be set to the correct total file size. (In the sample code, the final console output of ev.total would then show the correct total file size.)

Possible Solution

No response

Additional Information/Context

aws-sdk v2 recalculated this.totalBytes on each readable event: https://github.com/aws/aws-sdk-js/blob/b168eaab7aa04f8fc300b38bf67f9c26bf02c28a/lib/s3/managed_upload.js#L188
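The v2 behavior linked above can be sketched roughly like this (a paraphrase for illustration, not the actual managed_upload.js source): accumulate the byte count as chunks flow through, and treat the total as final only once the stream ends.

```javascript
// Hypothetical helper sketching the v2 approach: keep a running
// total of bytes seen on the stream, and mark it final on 'end'.
function trackStreamTotal(readable) {
  const state = { totalBytes: 0, done: false };
  readable.on("data", (chunk) => {
    state.totalBytes += chunk.length;
  });
  readable.on("end", () => {
    state.done = true; // the total is now exact
  });
  return state;
}
```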

ti2sugiyama avatar Feb 07 '23 06:02 ti2sugiyama

Hi @ti2sugiyama, thanks for opening this issue. I can confirm the reported behavior. I will investigate further to determine next steps.

Thanks!

yenfryherrerafeliz avatar Feb 10 '23 06:02 yenfryherrerafeliz

@ti2sugiyama what you say in the comment below is true:

Debugging the code shows that this.totalBytes (the source of 'total') is never assigned anywhere except in the constructor and in the single-chunk upload path.

I believe this issue only happens when the body passed as a parameter does not have any of the properties evaluated in the conditions here. For now I will mark this issue to be reviewed so it can be further addressed.

Steps to reproduce:

  • Create an empty node project: npm init -y
  • Add the dependencies:
    • npm i @aws-sdk/client-s3
    • npm i @aws-sdk/lib-storage
  • Then, create an empty file. Ex: src/index.js
  • Copy and paste the following code into the file created above:
import {S3Client} from "@aws-sdk/client-s3";
import {Upload} from "@aws-sdk/lib-storage";
import * as fs from "fs";
import * as stream from "stream";

const client = new S3Client({
    region: process.env.TEST_REGION,
})
const pathToFile = "./test-file.txt"
const data = "#".repeat(1024 * 1024 * 10);
fs.writeFileSync(pathToFile, data);
const fileStream = fs.createReadStream(pathToFile)
const passThrough = new stream.PassThrough();
fileStream.pipe(passThrough)
const upload = new Upload({
    client: client,
    params: {
        Bucket: process.env.TEST_BUCKET,
        Key: process.env.TEST_KEY,
        Body: passThrough,
    }
})
upload.on('httpUploadProgress', (progress) => {
    console.log('Progress: ', progress)
})
await upload.done();

Thanks!

yenfryherrerafeliz avatar Feb 18 '23 17:02 yenfryherrerafeliz

Same issue. It's been a year, hopefully fixed soon.

devnomic avatar Jan 27 '24 03:01 devnomic

Facing the same issue here. @yenfryherrerafeliz suggested this approach in https://github.com/aws/aws-sdk-js-v3/discussions/5165, maybe there should be a note that it doesn't work for files larger than 5MB?

Also, is using max of all ev.loaded instead of ev.total a reasonable workaround for now? Thanks!
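Since loaded appears to be cumulative across parts (parts can complete out of order; see the sample output later in this thread), one stopgap is to precompute the total yourself and track the running maximum of ev.loaded. The helper below is a hypothetical sketch, not part of the SDK:

```javascript
// Hypothetical workaround sketch: ev.total is undefined for stream
// bodies, so supply the total up front (e.g. fs.statSync(path).size
// for a local file) and track the running maximum of ev.loaded.
function makeProgressTracker(totalBytes) {
  let maxLoaded = 0;
  return (ev) => {
    maxLoaded = Math.max(maxLoaded, ev.loaded ?? 0);
    return {
      loaded: maxLoaded,
      total: totalBytes,
      percent: totalBytes ? (maxLoaded / totalBytes) * 100 : undefined,
    };
  };
}
```

Usage would look something like: `const track = makeProgressTracker(fs.statSync(path).size);` followed by `upload.on("httpUploadProgress", (ev) => console.log(track(ev)));` with the tracker created once, outside the handler.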

jirimoravcik avatar Apr 18 '24 15:04 jirimoravcik

Any update on this?

beala avatar May 01 '24 16:05 beala

Hello!

I checked every data type accepted by the Upload class: string | Uint8Array | Buffer | Readable | ReadableStream | Blob.

The findings:

  • The only Body types that report totalBytes correctly in a multipart upload are the non-stream ones: string | Uint8Array | Buffer | Blob.

  • When using streams, it works fine for files smaller than the partSize defined on initialization. If the file is larger than that, you get an undefined total. This is because the stream is split into multiple parts, as intended, and that is precisely where things go wrong.

    • This behavior is consistent with the __doConcurrentUpload execution code checking whether the upload has only one part to be uploaded.

    • Then, __uploadUsingPut executes only when the file size is smaller than the partSize into which the body would be split (a minimum of 5 MB if none was defined). When reporting progress, it does not use the private class member this.totalBytes in the __notifyProgress call. Instead, the size is calculated from the "dataPart" parameter, which was previously converted to a Buffer.

  • What @yenfryherrerafeliz suggested aligns with the idea that only those types are handled correctly during initialization:

I believe this issue only happens when the body passed as a parameter does not have any of the properties evaluated in the conditions here. For now I will mark this issue to be reviewed so it can be further addressed.

This is partially true, because the issue is strongly related to the Buffer conversion step, here and here. If the conversion to Buffer were performed before totalBytes is computed in the initialization step here, it would work.

On a multipart upload, __notifyProgress consumes the this.totalBytes value set during initialization (referenced here). And there you go: the value is undefined. The byteLength function is not intended to support streams; it works fine only with Buffer, Uint8Array, or string.

I believe a good solution starts with strengthening the types, replacing the "any" here, which is in fact the real cause of the bug: a weak type check allows multiple incompatible inputs into this function. Matching the type with this type definition should help. Secondly, additional code has to be added inside that function to handle streams during initialization. These strategies, 1 and 2, could be reused to that end.

I hope this helps towards a definitive solution 👍

aagenesds22 avatar Jun 08 '24 17:06 aagenesds22

Hey all,

I am able to replicate the issue with an 11 MB file. The relevant package versions are:

"@aws-sdk/client-s3": "^3.614.0"
"@aws-sdk/lib-storage": "^3.614.0" / @aws-sdk/[email protected]

The observed result during the file upload process is as follows:

Progress:  {
  loaded: 2154984,
  total: undefined,
  part: 3,
  Key: 'imag_older_version',
  Bucket: 'new-bucket-maggie-ma'
}
Progress:  {
  loaded: 7397864,
  total: undefined,
  part: 1,
  Key: 'imag_older_version',
  Bucket: 'new-bucket-maggie-ma'
}
Progress:  {
  loaded: 12640744,
  total: undefined,
  part: 2,
  Key: 'imag_older_version',
  Bucket: 'new-bucket-maggie-ma'
}

Please note that sensitive information, such as bucket names, has been replaced with placeholders for privacy and security purposes.

Steps to reproduce:

  1. Create an empty node project: npm init -y
  2. Add the dependencies: npm i @aws-sdk/client-s3 and npm i @aws-sdk/lib-storage
  3. Then, create an empty file. Ex: src/index.js
  4. Add a file that is over 5 MB and run the following code:

import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";
import * as fs from "fs";
import * as stream from "stream";

const client = new S3Client({
  // your region
});
const fileStream = fs.createReadStream("./your_file_name");
const passThrough = new stream.PassThrough();
fileStream.pipe(passThrough);
const upload = new Upload({
  client: client,
  params: {
    Bucket: "your_bucket_name",
    Key: "your_key",
    Body: passThrough,
  },
});
upload.on("httpUploadProgress", (progress) => {
  console.log("Progress: ", progress);
});

const result = await upload.done();
console.log("Upload done", result);


@aagenesds22 Regarding the strict typing suggestion - I understand your point. Since this is not a public API, strict typing may not be a necessity. Additionally, introducing dependencies on platform-specific types, especially ones from the global namespace, can potentially lead to issues and complications.

I appreciate you taking the time to clarify this point and provide additional context. I have brought this up to the AWS SDK JS team, it's queued now.

Thanks! Maggie

zshzbh avatar Jul 18 '24 19:07 zshzbh