Retry disabled on "read: connection reset"

Open vcschapp opened this issue 3 years ago • 6 comments

Describe the bug

The SDK's retry strategy does not retry a request when the remote host resets the TCP connection during a read ("read: connection reset").

Version of AWS SDK for Go? v1.20.2 through v1.38.65, inclusive

Version of Go (go version)? 1.15+, but the version of Go isn't really relevant here

To Reproduce (observed behavior)

  1. Use any AWS service client within the SDK with MaxRetries > 0
  2. Get a TCP connection reset from the remote service
  3. It doesn't retry. Instead, the SDK client immediately gives up with an error like: RequestError: send request failed caused by: Post "https://<service>.<region>.amazonaws.com/records": read tcp 169.254.76.1:35798->52.94.227.177:443: read: connection reset by peer

Expected behavior

It should retry.

Additional context

  • The commit that disables the retry for "read: connection reset" is c3d27102, which references S3 multipart upload.
  • This makes me think that this change might have been put in to solve an issue with S3 multipart upload while breaking the more general behavior with other services.
  • In general, resets often happen when remote services are not gracefully started up (added to the load balancer) or shut down (removed from the load balancer), and I don't understand why this wouldn't be a good opportunity to retry.

vcschapp avatar Jun 22 '21 16:06 vcschapp

This issue has been raised before in several GitHub issues, so for simplicity I will link the corresponding reasoning on why the SDK does not retry the read: connection reset by peer errors. See here.

If, in the context of your application, the operation in question is idempotent and safe to retry, then you may wish to implement a custom retryer that allows retrying for this specific error condition. See the request.Retryer interface (aws/request#Retryer).

skmcgrail avatar Jun 24 '21 20:06 skmcgrail

This issue has not received a response in 1 week. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled.

github-actions[bot] avatar Jul 02 '21 00:07 github-actions[bot]

Thanks Sean. I appreciate the answer (and incidentally sorry for putting you through another iteration of the same bug report).

If I could suggest one action on the AWS side that might reduce the incidence of these bug reports and help other devs understand the idempotency point you're making, it would be this: could you add a comment in the code with a one-line explanation of why read: connection reset by peer isn't retried, and put a slightly deeper explanation (maybe with a reference to this ticket and others) into the commit message? Having that would have helped me take other steps without bothering you with a ticket.

vcschapp avatar Jul 02 '21 15:07 vcschapp

Sounds like a good idea, @vcschapp. Would you like to try making a PR with the changes, or would you rather we handle it?

KaibaLopez avatar Jul 06 '21 22:07 KaibaLopez

My preference would be for you to handle it.

vcschapp avatar Jul 06 '21 23:07 vcschapp

Has this been fixed?

Emil-G avatar Jan 10 '23 10:01 Emil-G

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.

github-actions[bot] avatar Apr 01 '24 21:04 github-actions[bot]