
[BUG]: Using Upload (@aws-sdk/lib-storage) with an MD5 checksum fails with `The XML you provided was not well-formed or did not validate against our published schema`

Open skypesky opened this issue 2 years ago • 6 comments


Describe the bug

When I upload a file and ask S3 to verify its MD5 for me, I get an error: MalformedXML: The XML you provided was not well-formed or did not validate against our published schema

SDK version number

@aws-sdk/[email protected], @aws-sdk/[email protected],

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

node -v: v16.17.1

Reproduction Steps

const upload = new Upload({
  client,
  params: {
    Bucket: 'test',
    Key: 'demo.pdf',
    // note: data size > 30 MB
    Body: data,
    // MD5 of the data, base64-encoded
    ContentMD5: 'wSunmxovn3F4x1+NV+/d1A==',
    Metadata: {
      'x-hash': options.hash,
    },
  },
});

await upload.done();

Observed Behavior

The upload failed with the following error:

2023-01-03T02:14:46: MalformedXML: The XML you provided was not well-formed or did not validate against our published schema
2023-01-03T02:14:46:     at throwDefaultError (/Users/skypesky/workSpaces/javascript/arcblock/did-storage/node_modules/@aws-sdk/smithy-client/dist-cjs/default-error-handler.js:8:22)
2023-01-03T02:14:46:     at deserializeAws_restXmlCompleteMultipartUploadCommandError (/Users/skypesky/workSpaces/javascript/arcblock/did-storage/packages/s3-driver/node_modules/@aws-sdk/client-s3/dist-cjs/protocols/Aws_restXml.js:3086:43)
2023-01-03T02:14:46:     at processTicksAndRejections (node:internal/process/task_queues:96:5)
2023-01-03T02:14:46:     at async /Users/skypesky/workSpaces/javascript/arcblock/did-storage/node_modules/@aws-sdk/middleware-serde/dist-cjs/deserializerMiddleware.js:7:24
2023-01-03T02:14:46:     at async /Users/skypesky/workSpaces/javascript/arcblock/did-storage/packages/s3-driver/node_modules/@aws-sdk/middleware-signing/dist-cjs/middleware.js:14:20
2023-01-03T02:14:46:     at async /Users/skypesky/workSpaces/javascript/arcblock/did-storage/node_modules/@aws-sdk/middleware-retry/dist-cjs/retryMiddleware.js:27:46
2023-01-03T02:14:46:     at async /Users/skypesky/workSpaces/javascript/arcblock/did-storage/node_modules/@aws-sdk/middleware-logger/dist-cjs/loggerMiddleware.js:5:22
2023-01-03T02:14:46:     at async Upload.__doMultipartUpload (/Users/skypesky/workSpaces/javascript/arcblock/did-storage/packages/s3-driver/node_modules/@aws-sdk/lib-storage/dist-cjs/Upload.js:226:22)
2023-01-03T02:14:46:     at async Upload.done (/Users/skypesky/workSpaces/javascript/arcblock/did-storage/packages/s3-driver/node_modules/@aws-sdk/lib-storage/dist-cjs/Upload.js:39:16)

Expected Behavior

I expect the upload to succeed.

Possible Solution

No response

Additional Information/Context

S3_REGION=ap-northeast-1

related: https://github.com/aws/aws-sdk-js-v3/issues/2673

skypesky avatar Jan 03 '23 05:01 skypesky

Hi @skypesky, thanks for opening this issue. I can confirm this is a bug. The exception is caused by the ContentMD5 value being provided: it is sent along with each part of the multipart upload, but it was calculated over the whole file content, whereas it needs to be calculated only over the chunk of data sent in that specific part. I can also confirm that the workaround proposed here works fine: remove the MD5 parameter from your code. I will mark this issue for review so we can address it further.
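For illustration, here is a minimal sketch of that workaround: ContentMD5 is omitted so no whole-file checksum is attached to the individual parts, and the whole-file hash is kept only as user metadata for your own verification. The bucket, key, and generated body below are placeholders, not values from the report.

import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";
import * as crypto from "crypto";

const client = new S3Client({ region: "ap-northeast-1" });

// Placeholder body standing in for the > 30 MB file from the report.
const data = "#".repeat(1024 * 1024 * 31);
const fullFileMd5 = crypto.createHash("md5").update(data).digest("base64");

const upload = new Upload({
  client,
  params: {
    Bucket: "test",
    Key: "demo.pdf",
    Body: data,
    // ContentMD5 is intentionally omitted: during a multipart upload it is
    // sent with every part, but it was computed over the whole file.
    Metadata: {
      // Keep the whole-file hash for your own verification after download.
      "x-hash": fullFileMd5,
    },
  },
});

await upload.done();

With ContentMD5 removed, the CompleteMultipartUpload request no longer fails with MalformedXML, while the x-hash metadata still lets you compare the stored object against the original file.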

Repro steps for the original error: I installed the following packages:

yarn add @aws-sdk/client-s3
yarn add @aws-sdk/lib-storage

I used the following code:

import { S3Client } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";
import * as crypto from "crypto";

const client = new S3Client({
    region: 'us-east-2'
});
const body = '#'.repeat(1024 * 1024 * 31);
const md5 = crypto.createHash("MD5").update(body).digest("base64");
const upload = new Upload({
    client: client,
    params: {
        Bucket: process.env.TEST_BUCKET,
        Key: process.env.TEST_KEY,
        Body: body,
        ContentMD5: md5,
        Metadata: {
            'x-hash': md5,
        },
    },
});
const response = await upload.done();

console.log(response);

Thanks!

yenfryherrerafeliz avatar Jan 04 '23 23:01 yenfryherrerafeliz

@yenfryherrerafeliz

Thank you very much for your reply. I have a question: once this bug is fixed, will ContentMD5 ultimately take the MD5 of the entire file?

skypesky avatar Jan 15 '23 12:01 skypesky

@skypesky, I do not have a final picture of how it will work, but according to the documentation, each UploadPart command needs to send a checksum based on the data sent in that specific part.
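To make that concrete, here is a rough sketch using the low-level multipart commands directly (rather than lib-storage), with the MD5 computed per part. The part size, bucket, and key are placeholder values, not an official recommendation.

import {
    S3Client,
    CreateMultipartUploadCommand,
    UploadPartCommand,
    CompleteMultipartUploadCommand,
} from "@aws-sdk/client-s3";
import * as crypto from "crypto";

const client = new S3Client({ region: "us-east-2" });
const Bucket = process.env.TEST_BUCKET;
const Key = process.env.TEST_KEY;

const body = Buffer.from("#".repeat(1024 * 1024 * 31));
const partSize = 1024 * 1024 * 10; // placeholder; every part except the last must be at least 5 MB

const { UploadId } = await client.send(new CreateMultipartUploadCommand({ Bucket, Key }));

const parts = [];
for (let offset = 0, partNumber = 1; offset < body.length; offset += partSize, partNumber++) {
    const chunk = body.subarray(offset, offset + partSize);
    const { ETag } = await client.send(new UploadPartCommand({
        Bucket,
        Key,
        UploadId,
        PartNumber: partNumber,
        Body: chunk,
        // MD5 of this chunk only, not of the whole file.
        ContentMD5: crypto.createHash("md5").update(chunk).digest("base64"),
    }));
    parts.push({ ETag, PartNumber: partNumber });
}

await client.send(new CompleteMultipartUploadCommand({
    Bucket,
    Key,
    UploadId,
    MultipartUpload: { Parts: parts },
}));

This is why a single whole-file ContentMD5 cannot simply be forwarded to each UploadPart request.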

Thanks!

yenfryherrerafeliz avatar Jan 17 '23 22:01 yenfryherrerafeliz

I can confirm we are experiencing the same issue here. It works perfectly on smaller files (1–2 MB), but as soon as you send a larger file it throws the XML error. Watching for the final solution so we can update our code.

andyslack avatar Dec 07 '23 17:12 andyslack

@andyslack what are you doing in the meantime to circumvent this issue?

itzcull avatar Feb 08 '24 01:02 itzcull