aws-sdk-js-v3

S3 commands hang after uploading an empty stream using `@aws-sdk/lib-storage`

Open martinslota opened this issue 1 year ago • 10 comments

Checkboxes for prior research

Describe the bug

Immediately ending the stream feeding the Upload utility from @aws-sdk/lib-storage sometimes leads to the S3 client getting "locked up": a subsequent command sent through it never finishes.

SDK version number

@aws-sdk/client-s3, @aws-sdk/lib-storage

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

v20.15.0

Reproduction Steps

Detailed steps to reproduce are listed in https://github.com/martinslota/aws-sdk-lib-storage-race. Those steps run against LocalStack, but the same code can also be run against actual AWS S3.

Observed Behavior

After uploading an empty stream, commands sent through the S3 client never finish.

Expected Behavior

After uploading an empty stream, commands sent through the S3 client should continue to work as usual.

Possible Solution

No response

Additional Information/Context

No response

martinslota avatar Aug 31 '24 13:08 martinslota

Hi @martinslota - thanks for reaching out.

Before proceeding, I'd like to request your assistance in confirming that the issue persists when running the code against the actual AWS service, as the example you shared appears to use a LocalStack endpoint (https://s3.localhost.localstack.cloud:4566). While LocalStack is a great tool for local development and testing, it's always good practice to validate the behavior against the real AWS service before raising a formal issue.

Additionally, to better understand the underlying interactions between your code and the AWS SDK, please add a middleware to the S3 client's middlewareStack in your code to get request logs. This middleware will log the context, request input, and response output for each AWS SDK operation, providing valuable insight into the data being sent and received.

client.middlewareStack.add(
  (next, context) => async (args) => {
    console.log("AWS SDK context", context.clientName, context.commandName);
    console.log("AWS SDK request input", args.input);
    const result = await next(args);
    console.log("AWS SDK request output:", result.output);
    return result;
  },
  {
    name: "MyMiddleware",
    step: "build",
    override: true,
  }
);

Looking forward to your response!

aBurmeseDev avatar Sep 03 '24 07:09 aBurmeseDev

Hi @martinslota - thanks for reaching out.

👋

Before proceeding, I'd like to request your assistance in confirming that the issue persists when running the code against the actual AWS service, as the example you shared appears to use a LocalStack endpoint (https://s3.localhost.localstack.cloud:4566). While LocalStack is a great tool for local development and testing, it's always good practice to validate the behavior against the real AWS service before raising a formal issue.

Yes, the same happens against the actual AWS S3 service. I also adapted https://github.com/martinslota/aws-sdk-lib-storage-race so it runs against S3 by default.

Additionally, to better understand the underlying interactions between your code and the AWS SDK, please add a middleware to the S3 client's middlewareStack in your code to get request logs. This middleware will log the context, request input, and response output for each AWS SDK operation, providing valuable insight into the data being sent and received.

client.middlewareStack.add(
  (next, context) => async (args) => {
    console.log("AWS SDK context", context.clientName, context.commandName);
    console.log("AWS SDK request input", args.input);
    const result = await next(args);
    console.log("AWS SDK request output:", result.output);
    return result;
  },
  {
    name: "MyMiddleware",
    step: "build",
    override: true,
  }
);

Looking forward to your response!

I added the code above but additionally redacted the Body field from response output. The output looks like this.

Previously I also tried with an hour-long timeout instead of 5 seconds and I still saw the script fail.

martinslota avatar Sep 03 '24 12:09 martinslota

I'm experiencing this as well

fenos avatar Sep 23 '24 10:09 fenos

Hey @aBurmeseDev, sorry for the direct ping - do you know where the problem might be? I can have a look; currently this issue makes around 1% of requests hang randomly under high traffic.

I have run the reproduction repo that @martinslota kindly provided, and indeed it also happens locally after around 5k uploads. In my case I tested with MinIO, so it's definitely a client-side issue.

This happens also with the AWS S3 Service

fenos avatar Sep 25 '24 06:09 fenos

Hi @fenos - this is interesting! I haven't been able to reproduce it on my end with S3 itself, but I haven't tested with MinIO. I'll have to dig deeper into it. If there's any other specific case or pattern that I need to be aware of, please do share.

aBurmeseDev avatar Sep 26 '24 23:09 aBurmeseDev

Yeah, it is really random; I don't have a reliable way of reproducing it yet. Sometimes the `done` promise is never resolved from the Upload class, and no requests are sent by the SDK for uploading.

I have OTel telemetry on the AWS SDK and added a custom span in the uploader, and it simply hangs on `done()` without even sending any request. I have an idle timeout of 60 seconds that kicks in, which then closes the request socket.

I can confirm that it is not related to the default agent option maxSockets of 50, since I've increased that number to 700 and instrumented the Agent. Free sockets stay at 60-75, whereas busy sockets are at 20-30 at any given time.

I have tried with a couple of different setups for the Agent, using native and others. Currently, I'm using agentkeepalive which simply extends the native Agent with some better connection handling.

In my app I pretty much pass the request body as a stream to the Uploader:


const s3Client = new S3Client({
  ...credentials,
  requestHandler: new NodeHttpHandler({
    httpsAgent: new Agent({
      maxSockets: 700,
      freeSocketTimeout: 15000,
    }),
  }),
})

fastify.post('/upload', async (req, reply) => {
  const upload = new Upload({
    client: s3Client,
    params: {
      ...bucketAndKey,
      // body as a Readable stream (http.IncomingMessage)
      Body: req.raw,
    },
  })

  await upload.done()

  return reply.send('OK')
})

Files are fairly small (~1 MB). As soon as I get more findings I'll share; currently I'm adding as much telemetry as possible.

This issue only occurs with the Uploader

fenos avatar Sep 27 '24 10:09 fenos

Appreciate the details. Please do share once you have more findings while I further look into it on my end.

aBurmeseDev avatar Oct 01 '24 21:10 aBurmeseDev

Hello @aBurmeseDev I've added more debugging spans and here is what's happening:


The metadata I gathered is the following:

Request Content-Length: 320,817 bytes

{
   # Bytes uploaded from the Upload class
   "bytesUploaded":0,

   # Passthrough proxy Stream information 
   "dataStream":{
      "closed":false,
      "paused":false,
      "errored":false,
      "writable":true,
      "byteRead":130853
   },

   # custom HttpAgent pool count
   "httpAgentStats":{
      "busySocketCount":0,
      "freeSocketCount":1,
      "pendingRequestCount":0,
      "errorSocketCount":0,
      "timeoutSocketCount":0,
      "createSocketErrorCount":0
   },

   # Parts uploaded
   "progress": []
}

Telemetry

  • Use a passthrough stream to read from the request stream and count the bytes read (just to make sure the client is actually sending the body)

  • Listening for the httpUploadProgress event from the SDK to determine how many bytes were uploaded

  • Printing the httpPool sockets information from the HTTP agent passed in the NodeHandler

The code looks like this:

async function uploadObject(
    bucketName: string,
    key: string,
    body: Readable,
    signal?: AbortSignal
  ): Promise<void> {
    if (signal?.aborted) {
      throw ERRORS.Aborted('Upload was aborted')
    }

    // Create a Proxy Stream
    const passThrough = new PassThrough()

    if (signal) {
      addAbortSignal(signal, passThrough)
    }

    passThrough.on('error', () => {
      body.unpipe(passThrough)
    })

    body.on('error', (err) => {
      if (!passThrough.closed) {
        passThrough.destroy(err)
      }
    })

    const byteReader = createByteCounterStream()
    const bodyStream = body.pipe(passThrough)


    // Watch upload stream
    let upload: Upload | undefined = undefined

    // Upload stats
    const uploadProgress: Progress[] = []

    // Get current stream status
    const getStreamStatus = (): StreamStatus => ({
      time: new Date(),
      bytesUploaded: uploadProgress[uploadProgress.length - 1]?.loaded || 0,
      dataStream: {
        closed: bodyStream.closed,
        paused: bodyStream.isPaused(),
        errored: Boolean(bodyStream.errored),
        writable: bodyStream.writable,
        byteRead: byteReader.bytes,
      },
      httpAgentStats: gatherHttpAgentStats(this.agent.httpsAgent.getCurrentStatus()),
      progress: uploadProgress,
    })

    let streamStatus = getStreamStatus()

    const streamWatcher = setInterval(() => {
      streamStatus = getStreamStatus()
    }, 1000)

    // Upload the stream
    try {
      const data = await stream.pipeline(
        bodyStream,
        byteReader.transformStream,
        async (bodyStream) => {
          if (signal?.aborted) {
            throw ERRORS.Aborted('Upload was aborted')
          }

          upload = new Upload({
            client: this.client,
            params: {
              Bucket: bucketName,
              Key: key,
              Body: bodyStream as Readable,
            },
          })

          upload.on('httpUploadProgress', (progress) => {
            uploadProgress.push({
              total: progress.total,
              part: progress.part,
              loaded: progress.loaded,
            })
            if (uploadProgress.length > 100) {
              uploadProgress.shift()
            }
          })

          signal?.addEventListener(
            'abort',
            () => {
              upload?.abort()
            },
            { once: true }
          )

          return await upload.done()
        },
        { signal }
      )
    } catch (err: any) {
      if (err instanceof Error && err.name === 'AbortError') {
        const span = trace.getActiveSpan()
        if (span) {
          // Print how far we got uploading the file
          const lastStreamStatus = getStreamStatus()
          const { progress, ...lastSeenStatus } = streamStatus
          span.setAttributes({
            lastStreamStatus: JSON.stringify(lastStreamStatus),
            lastSeenStatus: JSON.stringify(lastSeenStatus),
          })
        }
      }

      throw err
    } finally {
      clearInterval(streamWatcher)
    }
  }

As you can see, the request sometimes gets stuck in the __doConcurrentUpload method even though there is data in the stream. I have a timeout of 60s which aborts the signal if the socket is idling.

The server is running on ECS Fargate.

I'm not sure where else I can trace this down further; I hope this helps to track down the issue. If not, please let me know where else I should be adding telemetry.

fenos avatar Oct 04 '24 08:10 fenos

Commenting to keep this thread alive as I am also experiencing this issue.

astev89 avatar Dec 20 '24 17:12 astev89

+1

H6LS1S avatar Apr 09 '25 08:04 H6LS1S

@aBurmeseDev just bumping this up, as it is still happening with the latest version of the SDK. It happens consistently for me: for example, I'm uploading 100 empty files concurrently using the Upload class (also tried with PutObject; same issue), and after the upload we send a HeadObject request for each. Around 36 succeed (upload + head), and the remaining hang indefinitely on the HeadObject request.

The maxSockets on my S3Client is set to 700, so something must be blocking elsewhere in the SDK and not in the HTTP agent.

I'll debug this further; hopefully I can get to the bottom of the problem.

fenos avatar Aug 01 '25 09:08 fenos

also related https://github.com/aws/aws-sdk-js-v3/issues/7165

fenos avatar Aug 01 '25 10:08 fenos

I've added the following middleware to determine whether the request hangs, and indeed it does:

 this.client.middlewareStack.add(
      (next, context) => async (args) => {
        const r = setTimeout(() => {
          console.log('COMMAND HANGS', {
            clientName: context.clientName,
            commandName: context.commandName,
            input: args.input,
            stats: this.agent.httpsAgent.getCurrentStatus(),
          })
        }, 10000)
        const result = await next(args)
        clearTimeout(r)
        return result
      },
      {
        name: 'MyMiddleware',
        step: 'build',
        override: true,
      }
    )

receiving the output:

COMMAND HANGS {
  clientName: 'S3Client',
  commandName: 'HeadObjectCommand',
  input: {
    Bucket: '[Redacted]',
    Key: 'bjhaohmqunupljrqypxz/empty-bucket-2/aa0d/aa01/aa00/aa00/aaaaa-test-4/a-file-2025-08-01T13-36-21-323Z-1754055390089-0.8331603008948798/.somethingElse/7f087fe7-2f3f-4a7f-9a71-e3443532a059'
  },
  stats: {
    createSocketCount: 101,
    createSocketErrorCount: 0,
    closeSocketCount: 101,
    errorSocketCount: 0,
    timeoutSocketCount: 0,
    requestCount: 606,
    freeSockets: {},
    sockets: {},
    requests: {}
  }
}

fenos avatar Aug 01 '25 13:08 fenos

Hi folks, this is Trivikram - one of the maintainers of AWS SDK for JavaScript (v3).

I looked into the discussions since the first report, and here is a summary from my understanding:

  • It's the SDK operation call which hangs, likely after the build step as per https://github.com/aws/aws-sdk-js-v3/issues/6426#issuecomment-3144633409
  • It's related to the content of the stream. In one case, empty files were uploaded. In another case, an ongoing stream for an upload was ended.

I was not able to reproduce https://github.com/martinslota/aws-sdk-lib-storage-race even after uploading 1000+ objects:

Export finished
AWS SDK context S3Client PutObjectCommand
AWS SDK request input {
  Bucket: 'test-js-6426',
  Key: 'object_1110',
  Body: <Buffer 73 6f 6d 65 20 64 61 74 61>
}

I'll continue looking into it to check whether a consistent minimal repro can be created. If you come across a minimal repro, please share it. Or if the existing minimal repro works for you, do share any specific steps that need to be followed.

From the provided data, it appears that the issue is not specific to the Upload utility in lib-storage. It's unique to a HeadObject request after a PutObject operation has completed. In the repro folks are able to run, the PutObject call was made for an empty file.

We don't know where the issue is. The question which needs an answer is whether the SDK makes the HTTP request for the HeadObject call:

  • For folks who used LocalStack in their repro, was the request registered in your local backend?
  • For folks able to reproduce with S3, we might want to override NodeHttpHandler itself to check the status of .handle call.

trivikr avatar Aug 13 '25 16:08 trivikr

@trivikr I just revived https://github.com/martinslota/aws-sdk-lib-storage-race a bit and improved the instructions, both for running against LocalStack and against AWS. In case they do not work for you, please do ping me - I'm happy to improve them further.

For me the repro still works.

For folks who used LocalStack in their repro, was the request registered in your local backend?

The last request log I see in LocalStack's logs is a successful response to the GetObject command, after which the script hung. That is to say, when the tail of the script output looks like this:

Downloading batch with key object_282...
AWS SDK context S3Client GetObjectCommand
AWS SDK request input { Bucket: 'martin-slota-test', Key: 'object_282' }
file:///Users/martin.slota/Code/aws-sdk-lib-storage-race/node_modules/p-timeout/index.js:70
		const timeoutError = new TimeoutError();
		                     ^

TimeoutError: Promise timed out after 5000 milliseconds
    at file:///Users/martin.slota/Code/aws-sdk-lib-storage-race/node_modules/p-timeout/index.js:70:24
    at new Promise (<anonymous>)
    at pTimeout (file:///Users/martin.slota/Code/aws-sdk-lib-storage-race/node_modules/p-timeout/index.js:48:25)
    at file:///Users/martin.slota/Code/aws-sdk-lib-storage-race/index.js:123:9
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

Node.js v20.15.0

the last request log from Localstack looks like this:

2025-08-13 19:50:52 2025-08-13T17:50:52.210  INFO --- [et.reactor-0] localstack.request.aws     : AWS s3.GetObject => 200; 000000000000/eu-west-1; GetObjectRequest({'Bucket': 'martin-slota-test', 'IfMatch': None, 'IfModifiedSince': None, 'IfNoneMatch': None, 'IfUnmodifiedSince': None, 'Key': 'object_282', 'Range': None, 'ResponseCacheControl': None, 'ResponseContentDisposition': None, 'ResponseContentEncoding': None, 'ResponseContentLanguage': None, 'ResponseContentType': None, 'ResponseExpires': None, 'VersionId': None, 'SSECustomerAlgorithm': None, 'SSECustomerKey': None, 'SSECustomerKeyMD5': None, 'RequestPayer': None, 'PartNumber': None, 'ExpectedBucketOwner': None, 'ChecksumMode': None}, headers={'x-amz-user-agent': 'aws-sdk-js/3.637.0', 'user-agent': 'aws-sdk-js/3.637.0 ua/2.0 os/darwin#24.6.0 lang/js md/nodejs#20.15.0 api/s3#3.637.0', 'Host': 'martin-slota-test.s3.localhost.localstack.cloud:4566', 'amz-sdk-invocation-id': '131436c8-7fa8-42b1-b7d2-1b30bb856b35', 'amz-sdk-request': 'attempt=1; max=3', 'x-amz-date': '20250813T175052Z', 'x-amz-security-token': 'IQoJb3JpZ2luX2VjEOr//////////wEaCWV1LXdlc3QtMSJGMEQCIGIP2pLZzkfGBQDdwiyBT0SAvH4dUc0y6AS5lknoEq+hAiB3d0rdOoNkbKlarhzQqe8GEk8klsDf//9LVxdSkhiOWSrLAggyEAIaDDUzMTc3NjM3MjQ5MyIMrR3p9nUnH3ACLRrcKqgCcOgHUgv/avK8FYUNTqLvVYKCGy0ljXNzxyiwVDrcKoNPav//dwNmRy5abZnSd0tJ/rO80GVFXRN0EnLQSkXw3FHDnfAnY/r9FFDrq0LjK1n7FyPihVvh3TDXSdYsct/8bftmdU4H8fQrlofSap71IY/1HTG8EbwX0vejl4dZ3R6I097WyDiWtcz1EPrJmjer4IKiTW5xCrBb1OFZTgj5uTjcqkMhSWtpBYWN9fXKTZkdl+32ansLMyCgCFSNZfq/dsGK1n/0JcDzzeGmkjRbAuD9povVeJekbHd1OLL0vtvZl2ensvDHtsu1JBgJlvOC+nVtCcflmBsvtdjFls8X61Org79Q4fC6dd3BYHaeHsW4tFhZDbSHar5oH6cc1cLV4iICYiVrfGkwzI7zxAY6ngHWQQzBAHqKxggtbnboxxiWRa4JiWz8FABN5UrL3rYlX7RelteWByL1ekSE5iP9jSVeanOpzhgVOTMdxDNqgn8Fp954EgS4RL5LXfwOFFYd8E+hRGluSfiIFKK2C9dmtRTkf2Xw2oKNXy46KfHrEPGFFzwnupL0RZs2OC15J2HlQKsstZPeMmC551QT86TKg2XOpOqmfN3gyPO27Ei0wg==', 'x-amz-content-sha256': 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855', 
'authorization': 'AWS4-HMAC-SHA256 Credential=ASIAXXUCXP4GUWFQQ3NQ/20250813/eu-west-1/s3/aws4_request, SignedHeaders=amz-sdk-invocation-id;amz-sdk-request;host;x-amz-content-sha256;x-amz-date;x-amz-security-token;x-amz-user-agent, Signature=22b49cddfd79d02194d51a69da6a461f32e9317322e8371e1bc089ddfb80dc58', 'Connection': 'keep-alive', 'x-moto-account-id': '000000000000'}); GetObjectOutput({'AcceptRanges': 'bytes', 'LastModified': 'Wed, 13 Aug 2025 17:50:52 GMT', 'ContentLength': '9', 'ETag': '"1e50210a0202497fb79bc38b6ade6c34"', 'ContentType': 'application/octet-stream', 'Body': <localstack.services.s3.storage.ephemeral.EphemeralS3StoredObject object at 0xffff7f7b3bd0>, 'ServerSideEncryption': <ServerSideEncryption.AES256: 'AES256'>}, headers={'Content-Type': 'application/octet-stream', 'accept-ranges': 'bytes', 'Last-Modified': 'Wed, 13 Aug 2025 17:50:52 GMT', 'Content-Length': '9', 'ETag': '"1e50210a0202497fb79bc38b6ade6c34"', 'x-amz-server-side-encryption': <ServerSideEncryption.AES256: 'AES256'>, 'x-amz-request-id': 'b5b872a3-51f9-4ba5-922b-0ee6f1d5f0f2', 'x-amz-id-2': 's9lzHYrFp76ZVxRcpX9+5cjAnEH2ROuNkd2BHfIa6UkFVdtjf5mKR3/eTPFvsiP/XV/VLi31234='})

For folks able to reproduce with S3, we might want to override NodeHttpHandler itself to check the status of .handle call.

I also added a custom request handler that logs response status and headers. The tail of the script's output against AWS looks like this for me:

AWS SDK context S3Client PutObjectCommand
AWS SDK request input { Bucket: 'martin-slota-test', Key: 'object_91', Body: <Buffer > }
AWS SDK HTTP Response: {
  statusCode: 200,
  headers: {
    'x-amz-id-2': 'aJ3FyEeXR/PTREvV3H5WswNO1m1/t1YnBuVxbYT4NBhLX2IlEbbRB2Zdv1S/9s3djWOk/kF9aw8=',
    'x-amz-request-id': '2P86T7T6FP525N8V',
    date: 'Wed, 13 Aug 2025 18:02:27 GMT',
    'x-amz-server-side-encryption': 'AES256',
    etag: '"d41d8cd98f00b204e9800998ecf8427e"',
    'x-amz-checksum-crc64nvme': 'AAAAAAAAAAA=',
    'x-amz-checksum-type': 'FULL_OBJECT',
    'content-length': '0',
    server: 'AmazonS3'
  }
}
AWS SDK request output: {
  '$metadata': {
    httpStatusCode: 200,
    requestId: '2P86T7T6FP525N8V',
    extendedRequestId: 'aJ3FyEeXR/PTREvV3H5WswNO1m1/t1YnBuVxbYT4NBhLX2IlEbbRB2Zdv1S/9s3djWOk/kF9aw8=',
    cfId: undefined,
    attempts: 1,
    totalRetryDelay: 0
  },
  ETag: '"d41d8cd98f00b204e9800998ecf8427e"',
  ServerSideEncryption: 'AES256',
  Body: 'REDACTED'
}
Downloading batch with key object_90...
AWS SDK context S3Client GetObjectCommand
AWS SDK request input { Bucket: 'martin-slota-test', Key: 'object_90' }
file:///Users/martin.slota/Code/aws-sdk-lib-storage-race/node_modules/p-timeout/index.js:70
                const timeoutError = new TimeoutError();
                                     ^

TimeoutError: Promise timed out after 5000 milliseconds
    at file:///Users/martin.slota/Code/aws-sdk-lib-storage-race/node_modules/p-timeout/index.js:70:24
    at new Promise (<anonymous>)
    at pTimeout (file:///Users/martin.slota/Code/aws-sdk-lib-storage-race/node_modules/p-timeout/index.js:48:25)
    at file:///Users/martin.slota/Code/aws-sdk-lib-storage-race/index.js:142:9
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

Node.js v20.15.0

martinslota avatar Aug 13 '25 18:08 martinslota

Hi @martinslota , thank you for reviving https://github.com/martinslota/aws-sdk-lib-storage-race

I was able to reproduce the SDK hanging after object_36.

My logs indicate that (similar to yours but for object_91):

  • SDK sends GetObjectRequest for object_36
  • The server (LocalStack) returns 200 response for object_36
  • The test code times out in 5s before the response is processed.

To ensure that the timeout doesn't play any role in this repro, I increased it to 50s. Even though the server immediately sent a response, the repro waited for 50s before timing out.

This means there's no issue on the server side. The issue is likely in the Node.js runtime's http module, which is not returning the response, or in the SDK's NodeHttpHandler, which is not processing the response from the server. If it's the former, we need to check if any http configuration can be changed to avoid it, or report it upstream. If it's the latter, then it can be fixed in the SDK itself.

I'll debug it a little more. Here are the contents of the handle function if you want to deep dive in the meantime.

trivikr avatar Aug 14 '25 16:08 trivikr

While diving deep, I removed all logging statements and introduced just two, which log the request method and path before sending the request and after receiving the response, as follows:

    console.log(`sending request: ${request.method} for ${request.path}`);
    const response = await this.innerHandler.handle(request, options);
    console.log(`received response: ${request.method} for ${request.path}`);

I ran the minimal repro after setting NODE_DEBUG=http.

Here are the logs for a successful GET:

HTTP 44534: have free socket
HTTP 44534: outgoing message end.
HTTP 44534: AGENT incoming response!
received response: GET for /object_273
HTTP 44534: AGENT socket keep-alive
HTTP 44534: CLIENT socket onFree
HTTP 44534: agent.on(free) test-js-6426.s3.localhost.localstack.cloud:4566:::::::::::::::::::::
HTTP 44534: removeSocket test-js-6426.s3.localhost.localstack.cloud:4566::::::::::::::::::::: writable: true
sending request: GET for /object_274
HTTP 44534: have free socket
HTTP 44534: outgoing message end.
HTTP 44534: AGENT incoming response!
received response: GET for /object_274
HTTP 44534: AGENT socket keep-alive
HTTP 44534: CLIENT socket onFree
HTTP 44534: agent.on(free) test-js-6426.s3.localhost.localstack.cloud:4566:::::::::::::::::::::
HTTP 44534: removeSocket test-js-6426.s3.localhost.localstack.cloud:4566::::::::::::::::::::: writable: true

Here are the logs for the GET which hangs:

sending request: PUT for /object_277
HTTP 44534: have free socket
HTTP 44534: AGENT incoming response!
HTTP 44534: AGENT incoming response!
HTTP 44534: write ret = true
HTTP 44534: outgoing message end.
received response: PUT for /object_277
HTTP 44534: AGENT socket keep-alive
HTTP 44534: CLIENT socket onFree
HTTP 44534: agent.on(free) test-js-6426.s3.localhost.localstack.cloud:4566:::::::::::::::::::::
HTTP 44534: removeSocket test-js-6426.s3.localhost.localstack.cloud:4566::::::::::::::::::::: writable: true
sending request: GET for /object_276
HTTP 44534: have free socket
HTTP 44534: outgoing message end.
HTTP 44534: AGENT socket keep-alive
HTTP 44534: CLIENT socket onFree
HTTP 44534: agent.on(free) test-js-6426.s3.localhost.localstack.cloud:4566:::::::::::::::::::::
HTTP 44534: removeSocket test-js-6426.s3.localhost.localstack.cloud:4566::::::::::::::::::::: writable: true

I don't see the following log

HTTP 44534: AGENT incoming response!

This indicates that the JS SDK is not receiving the response from the server, although it is sent by the backend (verified from LocalStack logs).

trivikr avatar Aug 14 '25 18:08 trivikr

I played around with the repro further in my fork https://github.com/trivikr/aws-sdk-lib-storage-race

Noticed two things:

  • The issue is not reproducible with PutObject, and is likely unique to Upload in lib-storage. Tested in commit https://github.com/trivikr/aws-sdk-lib-storage-race/commit/8c16e698eaf7864ddf3488e636cf7a0c3596e215
  • The issue is reproducible if I just upload an empty stream and download it in succession. Tested in commit https://github.com/trivikr/aws-sdk-lib-storage-race/commit/c7801e030ed2e0fecbbbd955cc049f4ee2853c64

trivikr avatar Aug 14 '25 21:08 trivikr

To find out if it's an issue with lib-storage, I logged the part that was yielded.

File: node_modules/@aws-sdk/lib-storage/dist-cjs/index.js

// src/chunks/getChunkStream.ts

async function* getChunkStream(data, partSize, getNextData) {
  // ...
  const toYield = {
    partNumber,
    data: currentBuffer.chunks.length !== 1 ? import_buffer.Buffer.concat(currentBuffer.chunks) : currentBuffer.chunks[0],
    lastPart: true
  };
  console.log({ yield: toYield, chuckLength: toYield.data.byteLength });
  yield toYield;
}

No difference between the upload preceding a GetObject call which hung vs. one which was successful:

{
  yield: { partNumber: 1, data: <Buffer >, lastPart: true },
  chuckLength: 0
}
sending request: PUT for /object_26
received response: PUT for /object_26
sending request: GET for /object_26
received response: GET for /object_26
{
  yield: { partNumber: 1, data: <Buffer >, lastPart: true },
  chuckLength: 0
}
sending request: PUT for /object_27
received response: PUT for /object_27
sending request: GET for /object_27

trivikr avatar Aug 14 '25 21:08 trivikr

Another deep dive into lib-storage: log when the PutObject call is successful.

File: node_modules/@aws-sdk/lib-storage/dist-cjs/index.js

  async __uploadUsingPut(dataPart) {
    var _a;
// ...
    const totalSize = byteLength(dataPart.data);
    console.log({ 
      singleUploadResult: this.singleUploadResult,
      totalSize,
    });
    this.__notifyProgress({
// ...
  }

No difference between the upload preceding a GetObject call which hung vs. one which was successful:

LocalStack
sending request: GET for /object_14
received response: GET for /object_14
sending request: PUT for /object_15
received response: PUT for /object_15
{
  singleUploadResult: {
    '$metadata': {
      httpStatusCode: 200,
      requestId: '1b850ab9-ac73-4591-af54-591033510c14',
      extendedRequestId: 's9lzHYrFp76ZVxRcpX9+5cjAnEH2ROuNkd2BHfIa6UkFVdtjf5mKR3/eTPFvsiP/XV/VLi31234=',
      cfId: undefined,
      attempts: 1,
      totalRetryDelay: 0
    },
    ETag: '"d41d8cd98f00b204e9800998ecf8427e"',
    ServerSideEncryption: 'AES256',
    Bucket: 'test-js-6426',
    Key: 'object_15',
    Location: 'https://test-js-6426.s3.localhost.localstack.cloud:4566/object_15'
  },
  totalSize: 0
}
sending request: GET for /object_15
received response: GET for /object_15
sending request: PUT for /object_16
received response: PUT for /object_16
{
  singleUploadResult: {
    '$metadata': {
      httpStatusCode: 200,
      requestId: '7c7452cc-18a6-4ee7-a17f-ba530e862359',
      extendedRequestId: 's9lzHYrFp76ZVxRcpX9+5cjAnEH2ROuNkd2BHfIa6UkFVdtjf5mKR3/eTPFvsiP/XV/VLi31234=',
      cfId: undefined,
      attempts: 1,
      totalRetryDelay: 0
    },
    ETag: '"d41d8cd98f00b204e9800998ecf8427e"',
    ServerSideEncryption: 'AES256',
    Bucket: 'test-js-6426',
    Key: 'object_16',
    Location: 'https://test-js-6426.s3.localhost.localstack.cloud:4566/object_16'
  },
  totalSize: 0
}
sending request: GET for /object_16
S3
sending request: PUT for /object_680
received response: PUT for /object_680
{
  singleUploadResult: {
    '$metadata': {
      httpStatusCode: 200,
      requestId: '6V429CE0N165S0P4',
      extendedRequestId: 'LfqfFnulFE0fmGQPz6j686QG7U9iOLdC3oRL5p1iYzhU4UY4T5ErDTck+iGUJ0tRB+/lLF/YXiI=',
      cfId: undefined,
      attempts: 1,
      totalRetryDelay: 0
    },
    ETag: '"d41d8cd98f00b204e9800998ecf8427e"',
    ServerSideEncryption: 'AES256',
    Bucket: 'test-js-6426',
    Key: 'object_680',
    Location: 'https://test-js-6426.s3.us-west-2.amazonaws.com/object_680'
  },
  totalSize: 0
}
sending request: GET for /object_680
received response: GET for /object_680
sending request: PUT for /object_681
received response: PUT for /object_681
{
  singleUploadResult: {
    '$metadata': {
      httpStatusCode: 200,
      requestId: '6V4BD41ZD8HAYBBS',
      extendedRequestId: 'pFk7+O+liFmPtXbfA/q4LoKKKp0y3XnCM139DnBXvsKJtFHYrehFn6EUnAK/EN3WNCi4JkuEBHM=',
      cfId: undefined,
      attempts: 1,
      totalRetryDelay: 0
    },
    ETag: '"d41d8cd98f00b204e9800998ecf8427e"',
    ServerSideEncryption: 'AES256',
    Bucket: 'test-js-6426',
    Key: 'object_681',
    Location: 'https://test-js-6426.s3.us-west-2.amazonaws.com/object_681'
  },
  totalSize: 0
}
sending request: GET for /object_681

trivikr avatar Aug 14 '25 22:08 trivikr

To check whether NodeHttpHandler receives the response, I made the following changes.

File edited: node_modules/@smithy/node-http-handler/dist-cjs/index.js

      const requestFunc = isSSL ? import_https.request : import_http.request;
      console.log(`[internal] sending request: ${request.method} for ${request.path}`);
      const req = requestFunc(nodeHttpsOptions, (res) => {
        console.log(`[internal] received response: ${request.method} for ${request.path}`);

Logs show that NodeHttpHandler is not receiving the response, i.e. the callback for https.request is not called:

[internal] sending request: GET for /object_34
[internal] received response: GET for /object_34
[internal] sending request: PUT for /object_35
[internal] received response: PUT for /object_35
[internal] sending request: GET for /object_35
file:///Users/trivikr/workspace/aws-sdk-lib-storage-race/node_modules/p-timeout/index.js:70
                const timeoutError = new TimeoutError();
                                     ^

TimeoutError: Promise timed out after 5000 milliseconds
    at file:///Users/trivikr/workspace/aws-sdk-lib-storage-race/node_modules/p-timeout/index.js:70:24
    at new Promise (<anonymous>)
    at pTimeout (file:///Users/trivikr/workspace/aws-sdk-lib-storage-race/node_modules/p-timeout/index.js:48:25)
    at file:///Users/trivikr/workspace/aws-sdk-lib-storage-race/index.js:84:9
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)

Node.js v22.18.0

trivikr avatar Aug 14 '25 22:08 trivikr

While reading the code, I came across this comment:

https://github.com/aws/aws-sdk-js-v3/blob/756643ecd734b143ded1f46997ca7c3488cda313/lib/lib-storage/src/Upload.ts#L348-L356

This could be a hint as to what is wrong. The comment is basically inaccurate: because of the Promise.all right above it, we can actually end up calling AbortMultipartUpload while some promises are still pending.
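The race being described can be shown in isolation. This is a generic illustration, not lib-storage code: Promise.all rejects as soon as one input rejects while its siblings may still be pending, so cleanup in the catch block can overlap with in-flight work (here, a timer stands in for a pending part upload):

```javascript
// Generic sketch (not SDK code): after `await Promise.all` throws, the slow
// sibling promise is still pending - cleanup such as AbortMultipartUpload
// would run while that work is still in flight.
async function demo() {
  let slowSettled = false;
  const slow = new Promise((resolve) =>
    setTimeout(() => { slowSettled = true; resolve(); }, 50)
  );
  const fast = Promise.reject(new Error("one part failed"));
  try {
    await Promise.all([slow, fast]);
  } catch {
    return slowSettled; // false: `slow` has not settled yet
  }
}

demo().then((settled) => console.log(settled)); // prints false
```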

matthieusieben avatar Sep 08 '25 08:09 matthieusieben

Here is a PR that might help with this issue, though I'm not sure, as I'm not able to reproduce the error locally.

matthieusieben avatar Sep 08 '25 08:09 matthieusieben

@matthieusieben That seems very promising! I tried to apply your patch, but even with it in place I can still reproduce the issue against both AWS S3 and LocalStack.

martinslota avatar Sep 08 '25 11:09 martinslota

I've been digging around a bit more, and overall I think I've managed to conclude the following:

  1. The issue is not specific to @aws-sdk/lib-storage and can easily be reproduced by just using the regular S3 client with PutObject commands.
  2. One potential workaround for the issue is to avoid invoking httpRequest.end() with an empty buffer, and instead invoking httpRequest.end() without arguments.
  3. The underlying issue is likely a race within Node.js itself, specifically between the 'free' event handler on the HTTP Agent and the 'finish' event handler on the client request. What I'm seeing is a socket getting freed up and reused for a new request, followed by the finish handler of the old request wiping the socket clean, thus blocking the response to the new request from being received. My guess is that the code wrongly assumes the socket is still associated with the old request at that point.
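The workaround in (2) can be sketched as follows. `writeBody` is a hypothetical helper name, and a stand-in object replaces the real http.ClientRequest:

```javascript
// Hedged sketch of workaround (2): never pass a zero-length buffer to
// ClientRequest#end(); call end() with no arguments instead.
// (`writeBody` is a made-up helper, not SDK code.)
function writeBody(req, body) {
  if (body && body.byteLength > 0) {
    req.end(body); // normal path: payload has data
  } else {
    req.end();     // empty payload: avoid end(emptyBuffer)
  }
}

// Stand-in for http.ClientRequest, recording how end() was called:
const calls = [];
const fakeReq = { end: (...args) => calls.push(args) };
writeBody(fakeReq, Buffer.alloc(0));
console.log(calls[0].length); // prints 0: end() was called with no arguments
```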

I'll try to follow up with updates to the minimal reproduce repository and a bug report in the Node.js repo.

martinslota avatar Sep 24 '25 19:09 martinslota

I simplified the repro so it directly uses PutObject commands.

I also reported a new issue here: https://github.com/nodejs/node/issues/60001.

I suppose this issue could potentially be closed - I don't think the root cause is inside @aws-sdk/lib-storage.

martinslota avatar Sep 24 '25 22:09 martinslota