aws-sdk-go
aws-sdk-go copied to clipboard
Retry disabled on "read: connection reset"
Confirm by changing [ ] to [x] below to ensure that it's a bug:
- [X] I've gone though Developer Guide and API reference
- [X] I've checked AWS Forums and StackOverflow for answers
- [X] I've searched for previous similar issues and didn't find any solution
Describe the bug
Retry strategy does not retry when remote host causes "read: connection reset".
Version of AWS SDK for Go? Example: v1.20.2 .. v1.38.65 inclusive
Version of Go (go version
)?
1.15+, but the version of Go isn't really relevant here
To Reproduce (observed behavior)
- Use any AWS service client within the SDK with
MaxRetry
> 0 - Get a TCP connection reset from the remote service
- It doesn't retry. Instead the SDK client immediately gives up with an error like:
RequestError: send request failed\ncaused by: Post \"https://<service>.<region>.amazonaws.com/records\": read tcp 169.254.76.1:35798->52.94.227.177:443: read: connection reset by peer
Expected behavior
It should retry.
Additional context
- The commit that disables the retry for "read: connection reset" is c3d27102, which references S3 multipart upload.
- This makes me think that this change might have been put in to solve an issue with S3 multipart upload while breaking the more general behavior with other services.
- In general resets often happen when remote services are not gracefully started up/added to load balancer or shut down (removed from load balancer) and I don't understand why this wouldn't be a good opportunity to retry.
This issue has been raised before in several GitHub issues, so for simplicity I will link the corresponding reasoning on why the SDK does not retry the read: connection reset by peer
errors. See here.
If in the context of your application the operation in question is idempotent and safe to retry then you may wish to implement a custom retryer that allows retrying for this specific error condition. See the aws/request#Retryer) interface.
This issue has not received a response in 1 week. If you want to keep this issue open, please just leave a comment below and auto-close will be canceled.
Thanks Sean. I appreciate the answer (and incidentally sorry for putting you through another iteration of the same bug report).
If I could suggest one action on the AWS side that might reduce the incidence of these bug reports and help other devs understand the idempotency point you're making, it would be: Could you add a comment in the code, with a one-liner explanation of which read: connection reset by peer
isn't retried, and put a slightly deeper explanation (maybe with reference to this ticket and others) into the commit message? Having that would have helped me take other steps without bothering you with a ticket.
sounds like a good idea @vcschapp , Would you like to try making a PR with the changes or would you rather we handle it?
My preference would be for you to handle it.
Has this been fixed?
Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.