aws-sdk-js-v3
Race conditions in S3 upload
Checkboxes for prior research
- [X] I've gone through the Developer Guide and API reference
- [X] I've checked AWS Forums and StackOverflow.
- [X] I've searched for previous similar issues and didn't find any solution.
Describe the bug
We're getting errors from the AWS SDK when using:
import { Upload } from '@aws-sdk/lib-storage';
We create an instance of S3 client via:
const s3Client = new S3({ apiVersion: '2016-04-01' });
and then run the following code (where params is of type PutObjectCommandInput):
const upload = new Upload({
  client: s3Client,
  params,
});
await upload.done();
These are most likely race conditions in the SDK, as we occasionally get strange errors when we perform many simultaneous uploads to S3. Errors we have observed:
{"name":"400","message":"UnknownError"}
{"name":"XAmzContentSHA256Mismatch","message":"The provided 'x-amz-content-sha256' header does not match what was computed."}
{"name":"Error","message":"char 'H' is not expected.:1:1\n Deserialization error: to see the raw response, inspect the hidden field {error}.$response on this object."}
SDK version number
"@aws-sdk/client-s3": "3.565.0", "@aws-sdk/lib-storage": "3.565.0"
Which JavaScript Runtime is this issue in?
Node.js
Details of the browser/Node.js/ReactNative version
20.12.2
Reproduction Steps
We upgraded Node.js from version 14 to 20, and these errors started appearing.
Observed Behavior
Errors we have observed, which seem to be race conditions:
{"name":"400","message":"UnknownError"}
{"name":"XAmzContentSHA256Mismatch","message":"The provided 'x-amz-content-sha256' header does not match what was computed."}
{"name":"Error","message":"char 'H' is not expected.:1:1\n Deserialization error: to see the raw response, inspect the hidden field {error}.$response on this object."}
Expected Behavior
No errors happening.
Possible Solution
No response
Additional Information/Context
This issue started to appear when we upgraded Node.js from version 14 to 20.
It appears someone else reported this but was ignored, see https://github.com/aws/aws-sdk-js-v3/issues/5455
Is this on Lambda?
many simultaneous uploads to S3
What does the concurrency code look like?
What are the JS class types and sizes of the uploaded bodies?
Please create a repository with an executable minimal reproduction of the issue.
It is a normal Node.js application running in Kubernetes (EKS).
The concurrency code has ~100 promises running in parallel, each executing code that uploads data to S3.
The data is a small JSON file (<1MB).
Specifically, in the Node.js code, we use Readable.from(data), where data is a plain string containing the JSON, which is then passed to Upload from @aws-sdk/lib-storage.
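As an aside, the string-to-stream step described here can be sketched as below (the payload is hypothetical, and the actual Upload call is omitted since it needs credentials; this only shows what ends up in Body):

```javascript
import { Readable } from 'stream';

// Hypothetical JSON payload, standing in for our real data
const data = JSON.stringify({ items: [1, 2, 3] });

// Wrap the already-buffered string in a Readable stream;
// this `body` is what gets passed as `Body` to Upload from @aws-sdk/lib-storage
const body = Readable.from(data);

// Node pushes the whole string as a single chunk rather than iterating it
// character by character, so reading the stream back yields the original string
let received = '';
for await (const chunk of body) received += chunk;
console.log(received === data); // true
```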
As for the minimal reproduction, the issue happens very rarely and we haven't been able to produce good reproduction code just yet - we're working on that.
Note, however, that the issue started happening when we upgraded Node.js from 14 to 20; we haven't touched our code or the AWS SDK versions.
We experienced the same issue on Lambda without any concurrency at all (except the concurrency introduced by Upload's queueSize). In some cases the Lambda even seems to exit early (without an error) while awaiting upload.done() in a loop. My current suspicion is that there is some kind of floating promise somewhere in client-s3/lib-storage, because that would also explain the weird Lambda behavior.
Note: for us this only seems to happen when the Lambda handles a few requests sequentially in the same execution context, though given its random nature it's hard to say definitively.
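One way to probe the floating-promise suspicion locally is Node's unhandledRejection hook (a diagnostic sketch, not a fix; the simulated rejection and messages are my own, not from lib-storage):

```javascript
// Diagnostic sketch: surface any promise that is rejected without a handler,
// which is what a "floating promise" inside lib-storage would look like.
const floating = [];
process.on('unhandledRejection', (reason) => {
  floating.push(reason);
  console.error('Floating promise rejection:', reason);
});

// Simulate a floating promise to show the hook firing
Promise.reject(new Error('simulated floating rejection'));

// Give the event loop a turn so the hook can run
await new Promise((resolve) => setTimeout(resolve, 0));
console.log(`Detected ${floating.length} floating rejection(s)`);
```

Registering this hook before running the upload loop would at least tell you whether rejections are escaping somewhere.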
Hey, just posting a small update:
- We've been able to reproduce this; a repository with the code is on its way
- It is environment-agnostic (it was reproduced on a local machine)
- It happens extremely rarely (roughly 1 in 1,000,000 uploads)
Readable.from(data), where data is a normal string with the JSON ... of small size <1MB
I have an idea that doesn't help identify the problem, but as an aside, if the data is originally a string, i.e. buffered into memory already, you don't need to convert it to a Readable stream object to pass it to the SDK.
new Upload({
  client: s3Client,
  params: {
    Bucket, Key, Body: "a string"
  }
}).done();
should work as well. In fact, if it's a small (<5 MB) string, you might as well call
const s3 = new S3({});
await s3.putObject({ Bucket, Key, Body: "a string" });
because Upload calls putObject for data smaller than the minimum multipart upload part size.
I've prepared a reproduction here: https://github.com/apify/s3-node-20-bug-repro
Please let me know if there are any issues running it.
Any updates or workarounds for this issue? I believe I'm hitting this but can't seem to find a workaround.
I'm experiencing this error as well with 3 out of 5 AWS accounts, but unfortunately cannot determine its origin.
Hi everyone - I wanted to update here that we haven't been able to replicate this error or identify any potential underlying cause on our SDK side. I have personally attempted to reproduce it with different Node versions, regions, and upload counts of up to 1,000,000. Here's my latest attempt with the repro shared by fnesved@ above: https://github.com/apify/s3-node-20-bug-repro
import { Readable } from 'stream';
import { S3 } from '@aws-sdk/client-s3';
import { Upload } from '@aws-sdk/lib-storage';
import async from 'async';
// This script uploads a lot of testing files to S3 with a high concurrency,
// to try to reproduce the issue with InternalServer errors from S3
// that happen with Node 20.
const BUCKET = 's3-upload-repro'; // FIXME: Change this to your testing bucket name
const TOTAL_UPLOADS = 1_000_000; // The issue happens approximately every 1_000_000 uploads, this should be enough to reproduce it
const CONCURRENCY = 100;
// Use `maxAttempts: 1` to reproduce the issue easier, but it happens even with the default 3 attempts
const client = new S3({ region: 'us-west-1', maxAttempts: 3 });
let lastSlowDownErrorReportedAt = 0;
const doUpload = async (i) => {
  // Sleep for 1 second if a SlowDown error was reported in the last second
  if (Date.now() - lastSlowDownErrorReportedAt < 1000) {
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }
  try {
    process.stdout.write(`Uploading ${i + 1}/${TOTAL_UPLOADS}...\r`);
    const upload = new Upload({
      client,
      params: {
        Bucket: BUCKET,
        Key: `file-${i}`,
        Body: Readable.from('abcd'),
      },
    });
    await upload.done();
  } catch (e) {
    // Ignore ECONNRESET/ENOTFOUND errors; they happen from time to time
    // and they're not important for this test
    if (e.code === 'ECONNRESET') return;
    if (e.code === 'ENOTFOUND') return;
    // Remember when a SlowDown error was reported, so that we can slow down
    // the next requests in case we're being rate limited
    if (e.Code === 'SlowDown') {
      lastSlowDownErrorReportedAt = Date.now();
      return;
    }
    console.error(`Upload ${i} failed at ${new Date().toISOString()}`, e);
  }
};

await async.mapLimit(
  Array.from({ length: TOTAL_UPLOADS }, (_, i) => i),
  CONCURRENCY,
  doUpload,
);

console.log('\nDone');
As you can see, I've attempted multiple upload runs ranging from 10,000 to 1,000,000 uploads, with Node v20.17.0 (and v22.8.0 as well) and maxAttempts set to 3.
If you're still experiencing this issue, please share your code and environment details along with request logs. You can retrieve raw request logs by adding a logging middleware to the client's middlewareStack in your code.
client.middlewareStack.add(
  (next, context) => async (args) => {
    console.log("AWS SDK context", context.clientName, context.commandName);
    console.log("AWS SDK request input", args.input);
    const result = await next(args);
    console.log("AWS SDK request output:", result.output);
    return result;
  },
  {
    name: "MyMiddleware",
    step: "build",
    override: true,
  }
);
@m-kemarskyi - can you see if there are any distinctions between your accounts?
This issue has not received a response in 1 week. If you still think there is a problem, please leave a comment to prevent the issue from automatically closing.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.
reopened in https://github.com/aws/aws-sdk-js-v3/issues/6940