aws-sdk-java-v2
Credential expired during retry
Describe the bug
In the RetryableStage execute method, the AwsCredentials are not renewed if they have expired. Therefore, if a call is made with an existing credential that is about to expire, the number of retries is lower than intended because the credential expires partway through.
Expected Behavior
For retries with EqualJitterBackoffStrategy, we expect an expired credential to be renewed during the retry.
Current Behavior
If a request (in our case S3Client.getObject) fails with a retryable exception and the credential expires between two retries, we get an S3Exception before the retry limit is reached.
software.amazon.awssdk.services.s3.model.S3Exception: The provided token has expired. (Service: S3, Status Code: 400, Request ID: 3YWKVBNJPNTXPJX2, Extended Request ID: GkR56xA0r/Ek7zqQdB2ZdP3wqMMhf49HH7hc5N2TAIu47J3HEk6yvSgVNbX7ADuHDy/Irhr2rPQ=)
at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleErrorResponse(CombinedResponseHandler.java:123)
at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleResponse(CombinedResponseHandler.java:79)
at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:59)
at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:40)
at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40)
at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:73)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42)
at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:50)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:36)
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:64)
at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:34)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56)
at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:48)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:31)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:193)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:135)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:161)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$0(BaseSyncClientHandler.java:84)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:169)
at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:62)
at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:52)
at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:62)
at software.amazon.awssdk.services.s3.DefaultS3Client.getObject(DefaultS3Client.java:4371)
at com.freddiemac.fe.distributed.computing.grid.aws.S3Bucket.getObject(S3Bucket.java:131)
at com.freddiemac.fe.distributed.computing.grid.aws.S3Bucket.getTaskOutput(S3Bucket.java:112)
at com.freddiemac.fe.distributed.computing.grid.aws.Job$Ready.readTask(Job.java:70)
at com.freddiemac.fe.distributed.computing.grid.aws.Job.readTask(Job.java:257)
at com.freddiemac.fe.distributed.computing.grid.aws.AwsProcessor.lambda$completeResponse$3(AwsProcessor.java:135)
at com.freddiemac.fe.distributed.computing.grid.api.CompletionHandler.handle(CompletionHandler.java:223)
at com.freddiemac.fe.distributed.computing.grid.api.CompletionHandler.lambda$newTaskHandler$18(CompletionHandler.java:213)
at com.freddiemac.fe.distributed.computing.grid.api.GridClient.lambda$convertToBiFunction$2(GridClient.java:169)
at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
at java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:443)
at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Reproduction Steps
To reproduce the issue consistently, we created our own implementation of ResponseTransformer that always throws RetryableException, and set up a retry policy that keeps retrying past the point where the credential expires (in our case the credential has a lifetime of one hour, so we set up the retry policy to run for over an hour). We then call S3Client.getObject with our ResponseTransformer implementation. Instead of failing when the retry limit is reached, the call fails with the S3Exception "The provided token has expired."
Possible Solution
On every retry, the request could call AwsCredentialsProvider.resolveCredentials() to ensure the credential is still fresh.
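The idea can be illustrated with a minimal sketch that does not depend on the SDK. All names here (RetryCredentialSketch, Credentials, attemptsMade) are hypothetical and are not SDK APIs, and the clock is simulated in minutes so the example runs instantly. It only contrasts resolving the credential once before the retry loop with checking and renewing it on every attempt; the actual SDK pipeline is more involved.

```java
public class RetryCredentialSketch {

    /** Hypothetical stand-in for a temporary credential with an expiry time. */
    static final class Credentials {
        final long expiresAtMinutes;

        Credentials(long expiresAtMinutes) {
            this.expiresAtMinutes = expiresAtMinutes;
        }

        boolean isExpiredAt(long nowMinutes) {
            return nowMinutes >= expiresAtMinutes;
        }
    }

    /**
     * Simulates a call that always fails with a retryable error.
     * Returns the number of attempts made before the loop stops, either
     * because maxAttempts was reached or because an attempt was sent with
     * an expired credential (treated as a non-retryable failure).
     */
    public static int attemptsMade(int maxAttempts, long backoffMinutes,
                                   long lifetimeMinutes, boolean resolvePerAttempt) {
        long now = 0; // simulated clock, in minutes
        Credentials credentials = new Credentials(now + lifetimeMinutes);

        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Proposed behavior: re-resolve before each attempt if expired.
            if (resolvePerAttempt && credentials.isExpiredAt(now)) {
                credentials = new Credentials(now + lifetimeMinutes);
            }
            // Reported behavior: the stale credential is reused and rejected.
            if (credentials.isExpiredAt(now)) {
                return attempt;
            }
            // Retryable failure: wait for the backoff and try again.
            now += backoffMinutes;
        }
        return maxAttempts;
    }

    public static void main(String[] args) {
        // 10 attempts, 20-minute backoff, 60-minute credential lifetime.
        System.out.println(attemptsMade(10, 20, 60, false)); // prints 4
        System.out.println(attemptsMade(10, 20, 60, true));  // prints 10
    }
}
```

With a one-hour credential and a retry schedule that runs past the hour, the first variant stops early on the expired credential, while the second uses all configured attempts.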
Additional Information/Context
No response
AWS Java SDK version used
2.16.104
JDK version used
1.8.0_181
Operating System and version
Redhat 7.9
Hello @f400810-freddiemac ,
Thank you very much for your submission. Could you please provide your credentials configuration? What credential provider are you using while experiencing this behavior?
Best,
Yasmine
Hi @yasminetalby,
We are using StsAssumeRoleWithSamlCredentialsProvider with Ping Identity as the third party that provides the token. The Ping Identity token expires after one hour.
Basically, we build the StsAssumeRoleWithSamlCredentialsProvider with:
StsAssumeRoleWithSamlCredentialsProvider.builder().stsClient(stsClient).refreshRequest(assumeRoleWithSamlRequestSupplier).build();
where stsClient is built with awsStsRegionEndpoint in VPC endpoint format (https://[vpceid].sts.[region].vpce.amazonaws.com), and sdkHttpClientSupplier.get() returns a new UrlConnectionHttpClient:
StsClient.builder().region(region).httpClient(sdkHttpClientSupplier.get()).credentialsProvider(AnonymousCredentialsProvider.create()).endpointOverride(awsStsRegionEndpoint).build()
and assumeRoleWithSamlRequestSupplier is a Supplier<AssumeRoleWithSamlRequest> that retrieves a new Ping Identity token on every get() call.
Thanks f400810-freddiemac
Hello @f400810-freddiemac ,
Thank you very much for providing this information. The behavior you are experiencing is due to the SDK's current approach to resolving credentials. In the specific case you describe, this approach limits the number of retry attempts. We have added this item to our current backlog.
Thank you very much for your feedback and submission! I will post an update here once this has been resolved.
Sincerely,
Yasmine
To confirm, then:
- on a retry, AWS credentials are not resolved again
- the error returned by S3 doesn't include a specific error type we can look for in our own code and retry on, just the text "software.amazon.awssdk.services.s3.model.S3Exception: The provided token has expired. (Service: S3, Status Code: 400)"
#2 I can cope with, as the s3a connector has effectively given up on AWS SDK retries with the v2 move; they are too problematic, as the SDK retries on things like UnknownHostException. But our own error handling needs to know which SDK failures are recoverable, and we assume that a 400 isn't.
Is there a specific, stable errorDetail we could use for this?