aws-sdk-java-v2

Credential expired during retry

Open f400810-freddiemac opened this issue 3 years ago • 5 comments

Describe the bug

In the RetryableStage execute method, the AwsCredentials are not renewed once they have expired. As a result, if a call starts with a credential that is about to expire, the number of retries is lower than intended because the credential expires partway through.

Expected Behavior

When retrying with EqualJitterBackoffStrategy, we expect an expired credential to be renewed during the retries.

Current Behavior

If a request (in our case S3Client.getObject) fails with a retryable exception and the credential expires between two retries, we get an S3Exception before the retry limit is reached.

software.amazon.awssdk.services.s3.model.S3Exception: The provided token has expired. (Service: S3, Status Code: 400, Request ID: 3YWKVBNJPNTXPJX2, Extended Request ID: GkR56xA0r/Ek7zqQdB2ZdP3wqMMhf49HH7hc5N2TAIu47J3HEk6yvSgVNbX7ADuHDy/Irhr2rPQ=)
        at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleErrorResponse(CombinedResponseHandler.java:123)
        at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handleResponse(CombinedResponseHandler.java:79)
        at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:59)
        at software.amazon.awssdk.core.internal.http.CombinedResponseHandler.handle(CombinedResponseHandler.java:40)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:73)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:50)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:36)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:64)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:34)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56)
        at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:48)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:31)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
        at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:193)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:135)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:161)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$0(BaseSyncClientHandler.java:84)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:169)
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:62)
        at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:52)
        at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:62)
        at software.amazon.awssdk.services.s3.DefaultS3Client.getObject(DefaultS3Client.java:4371)
        at com.freddiemac.fe.distributed.computing.grid.aws.S3Bucket.getObject(S3Bucket.java:131)
        at com.freddiemac.fe.distributed.computing.grid.aws.S3Bucket.getTaskOutput(S3Bucket.java:112)
        at com.freddiemac.fe.distributed.computing.grid.aws.Job$Ready.readTask(Job.java:70)
        at com.freddiemac.fe.distributed.computing.grid.aws.Job.readTask(Job.java:257)
        at com.freddiemac.fe.distributed.computing.grid.aws.AwsProcessor.lambda$completeResponse$3(AwsProcessor.java:135)
        at com.freddiemac.fe.distributed.computing.grid.api.CompletionHandler.handle(CompletionHandler.java:223)
        at com.freddiemac.fe.distributed.computing.grid.api.CompletionHandler.lambda$newTaskHandler$18(CompletionHandler.java:213)
        at com.freddiemac.fe.distributed.computing.grid.api.GridClient.lambda$convertToBiFunction$2(GridClient.java:169)
        at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:822)
        at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:797)
        at java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:443)
        at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
        at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
        at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
        at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)

Reproduction Steps

To reproduce the issue consistently, we created our own implementation of ResponseTransformer that always throws RetryableException, and set up a retry policy that keeps retrying past the credential's expiry (in our case the credential has a lifetime of one hour, so we configured the retry policy to run for more than an hour). We then call S3Client.getObject with our ResponseTransformer implementation. Instead of failing when the retry limit is reached, the call fails with the S3Exception "The provided token has expired."

Possible Solution

On every retry, the request could call AwsCredentialsProvider.resolveCredentials() to make sure the credential is still fresh.
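
To illustrate the difference, here is a minimal, SDK-free simulation of the two behaviours. It is only a sketch under stated assumptions, not the SDK's actual pipeline code: TempCredentials, FakeClock, Outcome and simulate are hypothetical names made up for this example, every attempt is assumed to fail with a retryable error (as in the reproduction steps), and the stand-in provider simply issues a fresh credential on every call instead of caching one.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/** Hypothetical stand-in for a temporary credential; not an SDK type. */
final class TempCredentials {
    final Instant expiry;

    TempCredentials(Instant expiry) {
        this.expiry = expiry;
    }

    boolean isExpiredAt(Instant now) {
        return !now.isBefore(expiry);
    }
}

final class RetryRefreshSketch {
    /** How the simulated call ends. */
    enum Outcome { RETRY_LIMIT_REACHED, TOKEN_EXPIRED }

    /** Mutable fake clock so the simulation never sleeps. */
    static final class FakeClock {
        Instant now = Instant.parse("2022-09-06T00:00:00Z");

        void advance(Duration d) {
            now = now.plus(d);
        }
    }

    static Outcome simulate(boolean resolvePerAttempt, int maxAttempts,
                            Duration delayBetweenAttempts, Duration credentialLifetime) {
        FakeClock clock = new FakeClock();
        // Assumption: the provider hands out a credential valid for
        // credentialLifetime from the moment it is asked.
        Supplier<TempCredentials> provider =
                () -> new TempCredentials(clock.now.plus(credentialLifetime));

        // Resolved once, before the retry loop (the behaviour described in this issue).
        TempCredentials credentials = provider.get();

        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (resolvePerAttempt) {
                // Proposed behaviour: ask the provider again before each attempt.
                credentials = provider.get();
            }
            if (credentials.isExpiredAt(clock.now)) {
                // The service would reject the request with a non-retryable 400.
                return Outcome.TOKEN_EXPIRED;
            }
            // Otherwise the attempt fails with a retryable error; back off and retry.
            clock.advance(delayBetweenAttempts);
        }
        return Outcome.RETRY_LIMIT_REACHED;
    }

    public static void main(String[] args) {
        Duration delay = Duration.ofMinutes(10);
        Duration lifetime = Duration.ofMinutes(60);
        System.out.println(simulate(false, 10, delay, lifetime));
        System.out.println(simulate(true, 10, delay, lifetime));
    }
}
```

With a 60-minute credential and 10 minutes between attempts, the resolve-once variant stops on the attempt made at the 60-minute mark, while the resolve-per-attempt variant runs through all 10 attempts.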

Additional Information/Context

No response

AWS Java SDK version used

2.16.104

JDK version used

1.8.0_181

Operating System and version

Redhat 7.9

f400810-freddiemac avatar Sep 06 '22 21:09 f400810-freddiemac

Hello @f400810-freddiemac ,

Thank you very much for your submission. Could you please provide your credentials configuration? What credential provider are you using while experiencing this behavior?

Best,

Yasmine

yasminetalby avatar Sep 13 '22 05:09 yasminetalby

Hi @yasminetalby,

We are using StsAssumeRoleWithSamlCredentialsProvider with Ping Identity as the third party that provides the token. The Ping Identity token expires after an hour.

Basically, we build the StsAssumeRoleWithSamlCredentialsProvider with: StsAssumeRoleWithSamlCredentialsProvider.builder().stsClient(stsClient).refreshRequest(assumeRoleWithSamlRequestSupplier).build();

where stsClient is built with awsStsRegionEndpoint in VPC endpoint format (https://[vpceid].sts.[region].vpce.amazonaws.com), and sdkHttpClientSupplier.get() returns a new UrlConnectionHttpClient: StsClient.builder().region(region).httpClient(sdkHttpClientSupplier.get()).credentialsProvider(AnonymousCredentialsProvider.create()).endpointOverride(awsStsRegionEndpoint).build()

and assumeRoleWithSamlRequestSupplier is a Supplier<AssumeRoleWithSamlRequest> whose get() retrieves a new Ping Identity token on every call.

Thanks f400810-freddiemac

f400810-freddiemac avatar Sep 13 '22 14:09 f400810-freddiemac

Hello @f400810-freddiemac ,

Thank you very much for providing this information. The behavior you are experiencing is due to the SDK's current approach to resolving credentials. In the specific case you describe, this approach limits the number of retry attempts. We have added this item to our current backlog.

Thank you very much for your feedback and submission! I will post an update here once this has been resolved.

Sincerely,

Yasmine

yasminetalby avatar Sep 16 '22 23:09 yasminetalby

To confirm, then:

  1. on a retry, AWS credentials are not resolved again
  2. the error returned by S3 doesn't include a specific error type we can look for in our own code and retry on, just the text "software.amazon.awssdk.services.s3.model.S3Exception: The provided token has expired. (Service: S3, Status Code: 400)"

#2 I can cope with, as the s3a connector has effectively given up on AWS SDK retries with the v2 move; they are too problematic, as the SDK retries on things like UnknownHostException. But our own error handling needs to know which SDK failures are recoverable, and we assume that a 400 isn't.

Is there a specific, stable errorDetail we could use for this?
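
For reference, a minimal sketch of what such a check could look like in caller code, assuming the service keeps reporting this failure with the error code "ExpiredToken" (the code S3 lists for "The provided token has expired."). Whether that code is a stable contract is exactly the open question here, so treat it as an assumption. ExpiredCredentialClassifier and isCredentialExpiry are hypothetical names; with the v2 SDK the two inputs would typically come from AwsServiceException.statusCode() and AwsServiceException.awsErrorDetails().errorCode().

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper; not part of the SDK or of s3a. */
final class ExpiredCredentialClassifier {
    // Assumption: "ExpiredToken" is the code returned for an expired session token.
    // Extend this set if other services or endpoints report a different code.
    private static final Set<String> EXPIRY_CODES =
            Collections.unmodifiableSet(new HashSet<>(Collections.singletonList("ExpiredToken")));

    private ExpiredCredentialClassifier() {
    }

    /**
     * Returns true if a failed call looks like a credential expiry, i.e. a 400
     * that may be worth retrying once fresh credentials have been resolved.
     */
    static boolean isCredentialExpiry(int statusCode, String errorCode) {
        return statusCode == 400 && errorCode != null && EXPIRY_CODES.contains(errorCode);
    }
}
```

Matching on the error code rather than on the message text keeps the check independent of wording changes in the exception message; other 400 responses are still treated as unrecoverable.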

Created HADOOP-18990. S3A: retry on credential expiry

steveloughran avatar Nov 25 '23 13:11 steveloughran