aws-sdk-js icon indicating copy to clipboard operation
aws-sdk-js copied to clipboard

SDK throws errors/duplicates requests when uploading via redirects

Open michaelskahn opened this issue 4 years ago • 3 comments

Confirm by changing [ ] to [x] below to ensure that it's a bug:

Describe the bug

When the SDK sends a POST/PUT and receives a redirect response before the request finishes sending, the SDK has a race condition where it does not properly clear state/buffers before sending the retry. This results in undefined behavior, including:

  • Sending multiple/duplicate requests to the redirected server (e.g. two putObject calls for a single invocation)
  • Throwing XML parsing errors trying to parse the unprocessed body of the redirect response as a response to a subsequent retry
  • Throwing Node ERR_INTERNAL_ASSERTION errors

Is the issue in the browser/Node.js? Node.js

If on Node.js, are you running this on AWS Lambda? No

Details of the browser/Node.js version v12.22.0 v14.16.0

SDK version number v2.876.0

To Reproduce (observed behavior)

Start a web server that redirects all requests to S3, for example:

const express = require('express');
const app = express();

app.all('/*', (req, res) => {
  const s3backend = 'https://s3.us-east-1.amazonaws.com';
  const location = `${s3backend}${req.url}`;
  console.log('redirecting to: ' + location);
  res.redirect(301, location);
});

if (require.main === module) {
    app.listen(8888);
}

Send a PUT/POST request with a "large" body (I was able to consistently replicate with a 5MB file created with truncate -s 5M 5_mb.data). We've seen it on files as small as 100KB.

'use strict';

const AWS = require('aws-sdk');
const fs = require('fs');

AWS.config.logger = console;

const s3 = new AWS.S3({
    s3ForcePathStyle: true,
    endpoint: new AWS.Endpoint('http://localhost:8888'),
  });

const params = {
  Key: 'testkey',
  Bucket: <S3 BUCKET>,
  Body: fs.readFileSync('5_mb.data')
};

const req = s3.putObject(params);
req.on('error', (err) => {
  console.log('error');
  console.log(err);
});
req.on('success', () =>  {
  console.log('success');
});
req.on('httpHeaders', (status) => {
  console.log(status);
});
req.on('retry', (resp) => {
  console.log('retry');
});
req.send();

Observe that the SDK sends two POSTs to the server after the redirect.

301
retry
[AWS s3 200 0.191s 1 retries] headBucket({ Bucket: 'tmp' })
retry
200
success
[AWS s3 200 33.227s 2 retries] putObject({
  Key: 'testkey',
  Bucket: 'tmp',
  Body: <Buffer 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... 5242830 more bytes>
})
200
success
[AWS s3 200 36.58s 2 retries] putObject({
  Key: 'testkey',
  Bucket: 'tmp',
  Body: <Buffer 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... 5242830 more bytes>
})

Sometimes (more often when running on Node 12), additional errors appear including XML parsing errors that indicate the SDK is trying to parse the body of the 301 redirect response as an API response. We've also seen Node assertions fire (this one from Node 12.22.0:

retry
301
retry
internal/assert.js:14
    throw new ERR_INTERNAL_ASSERTION(message);
    ^

Error [ERR_INTERNAL_ASSERTION]: This is caused by either a bug in Node.js or incorrect usage of Node.js internals.
Please open an issue with this stack trace at https://github.com/nodejs/node/issues

    at assert (internal/assert.js:14:11)
    at Socket.socketOnData (_http_client.js:472:3)
    at Socket.emit (events.js:314:20)
    at Socket.EventEmitter.emit (domain.js:483:12)
    at addChunk (_stream_readable.js:297:12)
    at readableAddChunk (_stream_readable.js:272:9)
    at Socket.Readable.push (_stream_readable.js:213:10)
    at TCP.onStreamRead (internal/stream_base_commons.js:188:23) {
  code: 'ERR_INTERNAL_ASSERTION'
}

Expected behavior

When the SDK receives a redirect, it should send the same request to the redirected URL once. It should never throw a ERR_INTERNAL_ASSERTION error. The expected output from the test script is

301
[AWS s3 200 0.187s 1 retries] headBucket({ Bucket: 'tmp' })
retry
200
success
[AWS s3 200 18.81s 1 retries] putObject({
  Key: 'testkey',
  Bucket: 'tmp',
  Body: <Buffer 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... 5242830 more bytes>
})

Screenshots

Here's a screenshot of part of a packet capture for a failing request. The server sends the the 301 almost immediately, while the client is still sending the upload data. The HTTP 100 isn't part of the issue, as we've seen this happen with files under 1MB as well.

Screen Shot 2021-03-31 at 12 20 08 PM

Additional context

We've identified a partial workaround for this issue by forcing the server to wait for the entire request body before sending a response. Specifically, adding app.use(express.raw({limit: '50mb'})); to the express server results in consistent expected behavior, at the cost of performance.

michaelskahn avatar Mar 31 '21 16:03 michaelskahn

Hi @michaelskahn thanks for reaching out. Is the problem still persisting with the latest version of SDK?

vudh1 avatar Dec 10 '21 20:12 vudh1

This issue has not received a response in 1 week. If you still think there is a problem, please leave a comment to avoid the issue from automatically closing.

github-actions[bot] avatar Dec 16 '21 00:12 github-actions[bot]

Hi @vudh1. I am still able to reproduce the problem with the latest SDK (2.1046.0) and the steps from before.

michaelskahn avatar Dec 16 '21 00:12 michaelskahn

Greetings! We’re closing this issue because it has been open a long time and hasn’t been updated in a while and may not be getting the attention it deserves. We encourage you to check if this is still an issue in the latest release and if you find that this is still a problem, please feel free to comment or open a new issue.

github-actions[bot] avatar Feb 23 '23 00:02 github-actions[bot]