aws-sdk-js-v3
Need better docs and examples encouraging the use of socketTimeout in AWS clients for V3. SDK should warn or error when invalid V2 style params are used.
Describe the issue
Recently we spent several days trying to debug mysterious RequestTimeout errors that started showing up in one of our AWS environments. This was occurring with the S3 client, but after understanding the issue, we realized it could easily occur with any SDK client.
We had converted from V2 to V3 months ago and did not notice any problems until recently.
After close inspection, we had missed the conversion of the V2 httpOptions.timeout to the new requestTimeout.socketTimeout in V3. Although it is not specified in the docs, the default socketTimeout must be shorter than 72s since we were hitting it often at that interval.
So it should be made much more obvious that this is an important value to increase since the default is no longer sufficient and can just start failing.
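To make the migration gap concrete, here is a rough sketch of the mapping we missed (the values are illustrative examples, not recommendations, and the V3 shape shown is the plain-object request handler config):

```js
// V2 (silently ignored by V3 clients):
//   new AWS.S3({ httpOptions: { timeout: 900_000 } });

// V3: the timeout moves into the request handler configuration.
// This plain object is what gets passed to the S3Client constructor:
const v3ClientConfig = {
  requestHandler: {
    requestTimeout: 900_000, // 15 minutes; supersedes the older socketTimeout
  },
};
```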
Another PR was just created with the same problem, so I will also link it here: #6762 (I commented there on the solution to help the user). I guess this is evidence that more docs are needed.
I would expect that the TypeScript docs would explain the importance of the socketTimeout and what the default value is when unset. They might also describe the type of error one can expect if this is too low (RequestTimeout). It should be made clear that this value should be set, since the current default is not sufficient (at least for S3). Currently the description for this item is completely blank.
I see that you recently updated the migrating.md notable-changes section which helps if people were coming from V2 to V3. However I think it may still need some improvement to explain the importance of the socketTimeout value and what may happen if not set.
I believe that the examples provided, and especially the best-practices docs or examples, should include this value by default to encourage its use, since you really should not run in production with the default.
It also would have been really helpful to have warnings or error messages when we create an AWS client using old V2 parameters like httpOptions.timeout. This would have prevented us from introducing this regression bug in our code. Currently there does not appear to be any indication that you are using the old values and that they no longer do anything. I know it's difficult to check all settings, but the ones that have clearly moved should produce an error, or at least a warning, so the code can be corrected.
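A rough sketch of the kind of check I mean (this helper is hypothetical, not part of the SDK, and the key list is just a sample of options that moved between V2 and V3):

```js
// Hypothetical helper (not part of the SDK): warn when a client config
// object still carries V2-style options that V3 silently ignores.
const V2_ONLY_KEYS = ["httpOptions", "maxRetries", "sslEnabled"];

function warnOnV2Options(config) {
  const stale = V2_ONLY_KEYS.filter((key) => key in config);
  for (const key of stale) {
    console.warn(`AWS SDK V3 ignores the V2-style option "${key}"`);
  }
  return stale;
}
```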
I don't know if the other APIs are as sensitive to timeouts as S3 is, but I would suspect it is good practice to increase the socketTimeout for all of them if the default remains so low. The main one that we hit was when using lib-storage to do a multi-part upload.
The code examples for the various commands also do not show any best practices for setting the options. Many of the examples don't even show the options object at all. So while maybe it is overkill to have these details in every example, maybe you should have a comment by the options object linking to a nice doc that would spell out all the important client settings that one should typically use.
I think it would be smart for best practices to show all the details one needs to make your code production ready.
Another thing that would have been helpful for us: when using @aws-sdk/lib-storage to do a multi-part upload, the error generated by the Upload object is just a basic AbortError with no indication of why it was aborted. It wasn't until we turned on AWS SDK client logging that we could find the real error, the RequestTimeout. So you might mention in the docs for the Upload object that an AbortError could be caused by a RequestTimeout, and that enabling logging for errors is a smart thing. I think this tip alone deserves its own issue, so I will add one for SDK logging too.
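What finally surfaced the root cause for us was attaching a logger to the client. A minimal sketch, assuming the standard trace/debug/info/warn/error logger shape accepted by the client's `logger` option:

```js
// Logger that stays quiet except for warnings and errors, so the
// underlying RequestTimeout shows up instead of just a bare AbortError:
const quietLogger = {
  trace: () => {},
  debug: () => {},
  info: () => {},
  warn: (...args) => console.warn("[aws-sdk]", ...args),
  error: (...args) => console.error("[aws-sdk]", ...args),
};

// Assumed usage: new S3Client({ logger: quietLogger }), then pass that
// client to the lib-storage Upload.
```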
Links
- Typescript docs for NodeHttpHandlerOptions with empty socketTimeout description https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-smithy-node-http-handler/Interface/NodeHttpHandlerOptions/
- https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/migrating/notable-changes/
- https://docs.aws.amazon.com/sdk-for-javascript/v3/developer-guide/configuring-the-jssdk.html (could include a nice page about the typical settings you will likely want to enable and why)
- Example docs which do not include socketTimeout (just a sampling since most examples exclude it)
- https://docs.aws.amazon.com/sdk-for-javascript/v3/developer-guide/node-configuring-maxsockets.html
- https://github.com/aws/aws-sdk-js-v3/blob/main/supplemental-docs/CLIENTS.md#new-in-v35210
- https://docs.aws.amazon.com/sdk-for-javascript/v3/developer-guide/javascript_s3_code_examples.html
- https://github.com/awsdocs/aws-doc-sdk-examples/blob/main/javascriptv3/example_code/s3/client.js
- https://github.com/awsdocs/aws-doc-sdk-examples/blob/main/javascriptv3/example_code/s3/actions/put-object.js
- https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-aws-sdk-lib-storage/
- `socketTimeout` is deprecated, as noted in the source code documentation https://github.com/smithy-lang/smithy-typescript/blob/main/packages/types/src/http/httpHandlerInitialization.ts#L25; that is why it does not appear much in documentation. It was replaced by `requestTimeout`, which does the same thing.
- The default timeout, regardless, is 0, which means no timeout. If you'd like to report the reproduction conditions of your issue, we may be able to identify something else as the problem.
- The deprecation note was not transferred to the API website because the extractor tool did not understand that the comment written after the `@deprecated` tsdoc tag was the description, and the page itself doesn't highlight the `@deprecated` annotation. I've re-ordered the annotation and description so that the description appears in the API docs site, but the other thing is something that needs to be fixed.
- We also need to update https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/migrating/notable-changes/ and https://github.com/aws/aws-sdk-js-v3/blob/main/UPGRADING.md to update references to socketTimeout.
@kuhe Wow, thanks for mentioning this, we will update to requestTimeout. I was wondering why there were so many similar looking settings.
Regarding the issue we encountered: on the afternoon of 12/16/2024, our Lambda code, which performs a streaming read from S3, transforms the records, and does a streaming write back to S3 using the @aws-sdk/lib-storage Upload for a multi-part upload, started encountering RequestTimeouts in 30-40% of our executions. Looking at the logs, the stream would often make progress for several minutes and then just time out. The time between the last successful S3 call and the one that failed was about 72s in many cases. In the previous several months, the logs showed it happening maybe 3 times total, but now it was happening regularly, thousands of times every hour. This code has been in place for years and worked well even after upgrading to AWS SDK V3 earlier in the year. The problem got persistently worse, failing up to 50% of the time on 12/17/2024.
This problem persisted and we involved AWS support team (both lambda and S3 teams). After exhausting all ideas and suggestions, we enabled more logging and were able to zero in on the fact that it was a RequestTimeout that was the root cause.
After realizing the timeout property had changed to socketTimeout (from V2 to V3), we set it to 15m and everything started succeeding properly. So it appears that, for whatever reason, there is a default S3 client socketTimeout/requestTimeout that is less than 72s. If this were due to something else, like a Node.js timeout, then I don't believe setting socketTimeout would have resolved it. I will be happy to discuss further.
Also note that another user reported a RequestTimeout issue after they migrated to V3 (and did not have socketTimeout or requestTimeout set), so there appears to be some default timeout when it is not set. See #6762. Adding a value resolved their issue too.
We hit the RequestTimeout while using: @aws-sdk/[email protected] @aws-sdk/[email protected] @smithy/[email protected] in case the code has changed since then
To add context from us:
We ran @aws-sdk/[email protected] => everything was good
After updating to @aws-sdk/[email protected] => the issue started to appear.
Downgrading solves it again.
My bet is the dependency updates in 3.709, but I have not dived deeper yet.
It appears only on putObject calls; getObject calls still seem to work.
We don't use explicit streaming; we putObject strings directly.
Our application code (besides dependencies) has not changed in that place for over 6 months (the last change there was the migration to V3).
What we also see with that version update are the following warnings:
@smithy/node-http-handler:WARN - socket usage at capacity=50 and 3085 additional requests are enqueued. See https://docs.aws.amazon.com/sdk-for-javascript/v3/developer-guide/node-configuring-maxsockets.html or increase socketAcquisitionWarningTimeout=(millis) in the NodeHttpHandler config.
These rarely appeared before the update but now appear on most uploads, every 20 seconds. The notable thing here is that the number of enqueued requests stays the same until the Lambda function times out after 15 minutes.
We will also now try updating to requestTimeout and see if that solves the problem (we also still have httpOptions configured).
@chbiel Another thing I learned is that AWS SDK V3 sets a default maxSockets of 50 if you don't set it, so you can increase it to let more requests run in parallel.
```js
const awsOptions = {
  // ...other client options
  requestHandler: {
    httpsAgent: { keepAlive: true, maxSockets: 5000 },
    requestTimeout: 300 * 1000, // 5 minutes
  },
};
```
I also set keepAlive to true just to ensure it was on; the default might be fine.
Here's the article which mentions the default is 50 max sockets and how to increase it. https://docs.aws.amazon.com/sdk-for-javascript/v3/developer-guide/node-configuring-maxsockets.html
@jeffbski-rga thanks for the hint, added it :)
All in all, using the requestHandler configuration solved our problems.
Things I would like to see improved:
- The previously mentioned error message is very confusing in this context. There should be a more explicit error message instead of only these two kinds of messages:
  - "Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed." and
  - "@smithy/node-http-handler:WARN - socket usage at capacity=50 and 3085 additional requests are enqueued. See https://docs.aws.amazon.com/sdk-for-javascript/v3/developer-guide/node-configuring-maxsockets.html or increase socketAcquisitionWarningTimeout=(millis) in the NodeHttpHandler config."

  But I also cannot tell what error message would be helpful here besides "You are using a wrong configuration, please use the requestHandler config"...
- The `requestTimeout` configuration in combination with `keepAlive` seems to behave differently than the previous `httpOptions.timeout` configuration. Setting it to the same value as before still leads to errors for bigger payloads put into S3. The only explanation I can imagine right now is that the new `requestTimeout` sets the timeout for the overall connection that is reused to send several payloads, whereas `httpOptions.timeout` seems to apply to every single request.
Thank you both for your suggestions. We will review the options with the team and consider improving this part of our docs. I'll keep this issue open and mark it to be reviewed in the near future. Again, thank you for the feedback!
Regarding the value of maxSockets:
In this documentation page, we recommend setting maxSockets to a factor of the workload's batch size: https://github.com/aws/aws-sdk-js-v3/blob/main/supplemental-docs/performance/parallel-workloads-node-js.md#example-scenario.
This is something that I believe isn't specific to the AWS SDK, but node:https in general. In my testing there is usually a maxSockets value that is optimal for a given workload size and hardware configuration.
- Setting too many sockets increases latency, since at some point opening a new socket appears to take more time than waiting for one of the existing sockets to free up.
- For example, the optimal socket count for a 7000-item workload seems to be either 350 or 700, rather than 7000 sockets.
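To make the trade-off concrete, here is a minimal concurrency limiter (illustrative only, not SDK code) showing how a cap like maxSockets bounds in-flight work instead of launching everything at once:

```js
// Minimal concurrency limiter: run async tasks with at most `limit`
// in flight, the same way maxSockets caps concurrent HTTP requests.
async function runLimited(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker pulls the next task index until the queue is drained.
  async function worker() {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  const workers = Array.from(
    { length: Math.min(limit, tasks.length) },
    () => worker()
  );
  await Promise.all(workers);
  return results;
}
```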
It would be great to have this type of real world knowledge applied to the best practices docs and examples with some notes explaining the reasoning (or at least links to other articles explaining the reasons for the settings).
I'm planning to undeprecate the field called socketTimeout for clients using the NodeHttpHandler.
When it was deprecated in https://github.com/aws/aws-sdk-js-v3/pull/4508, the reasoning for doing so was incorrect.
Going forward I intend to update the documentation with the following 3 timeouts:
- connectionTimeout - establishing a connection
- requestTimeout - total time to complete a request
- socketTimeout - idle timeout for an open socket
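The three timeouts above can be sketched as one handler options object (values are illustrative examples, not recommendations, and socketTimeout is shown on the assumption it is undeprecated as described):

```js
// All three timeouts in one config; the assumed usage is
// new NodeHttpHandler(handlerTimeouts), passed as the client's
// requestHandler option.
const handlerTimeouts = {
  connectionTimeout: 5_000, // max time to establish a connection
  requestTimeout: 300_000, // max total time to complete a request
  socketTimeout: 120_000, // max idle time for an open socket
};
```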
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.