
The security token included in the request is expired

Open arun-rai opened this issue 2 years ago • 8 comments

Describe the bug

Observing one of our containers failing to fetch messages from SQS with the error 'The security token included in the request is expired'. This has happened multiple times for brief periods across multiple services (SQS and DynamoDB). The issue resolved itself after exactly 15 minutes, without any intervention.

AWS Java SDK: aws-java-sdk-core-1.12.201
Credentials provider: new InstanceProfileCredentialsProvider(true)
Cluster: Kubernetes v1.21.11
IAM component: kube2iam (https://github.com/jtblin/kube2iam)

ERROR log: com.amazonaws.services.sqs.model.AmazonSQSException: The security token included in the request is expired (Service: AmazonSQS; Status Code: 403; Error Code: ExpiredToken; Request ID: 25b9b369-9954-5d2c-8c21-e84006d4ec55; Proxy: null).

Expected Behavior

The issue should not occur intermittently. The SDK should always be able to refresh the token before it expires.
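The expected behavior can be modeled with a minimal stdlib-only sketch (illustrative, not SDK code; the 5-minute threshold is an assumption for the sketch): credentials should be refreshed once they are within some threshold of expiring, rather than being used until the service rejects them with a 403.

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative model of eager credential refresh (NOT actual SDK code;
// the 5-minute refresh-ahead threshold is an assumption for this sketch).
class EagerRefreshSketch {
    static final Duration REFRESH_AHEAD = Duration.ofMinutes(5);

    /** Refresh when the token is within REFRESH_AHEAD of its expiration. */
    static boolean needsRefresh(Instant expiration, Instant now) {
        return !now.plus(REFRESH_AHEAD).isBefore(expiration);
    }
}
```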

Current Behavior

One of our containers fails to fetch messages from SQS with the error 'The security token included in the request is expired'. This has happened multiple times for brief periods across multiple services.

Reproduction Steps

Intermittent. Not always reproducible.

Possible Solution

NA

Additional Information/Context

NA

AWS Java SDK version used

aws-java-sdk-core-1.12.201

JDK version used

java version "11.0.7" 2020-04-14

Operating System and version

Kubernetes v1.21.11

arun-rai avatar Jul 14 '22 05:07 arun-rai

@arun-rai did you guys find a solution to this issue? We have been facing exactly the same issue on our k8s cluster for a few days while using DynamoDB.

arunhegde avatar Jul 27 '22 16:07 arunhegde

Thanks @arunhegde for letting us know that this is affecting multiple clients. We don't have a solution yet.

arun-rai avatar Jul 28 '22 04:07 arun-rai

Any updates here? It's also causing us issues while working with SQS & SES

FerasMaali avatar Jul 30 '22 17:07 FerasMaali

We have opened an AWS support ticket for the same problem, as we are repeatedly seeing these issues with DynamoDB, Kinesis, and CloudWatch. Our case ID is 10196898021. Please let us know if you find any solutions/workarounds.

arunhegde avatar Aug 02 '22 14:08 arunhegde

@arunhegde just wanted to check on your AWS case ID 10196898021. Did you get any details on why this is happening?

@debora-ito please help prioritize any work you are planning to execute, as this is occurring regularly. Thanks!

arun-rai avatar Aug 08 '22 06:08 arun-rai

@arun-rai - I think we have some leads. We are still testing. At the moment it appears to be an application issue (in the code running in the pods).

  1. In our case we had multiple AWS credentials providers in addition to the default credentials provider. We cleaned them up, ensuring we always have a single credentials provider (the default one), and made sure that all our AWS service calls (DynamoDB, CloudWatch, Kinesis, X-Ray) use it.

  2. Also, after some investigation into the AWS Spring Cloud library, we explicitly set the property cloud.aws.credentials.instance-profile=true, which skips all other credentials in the chain and always uses the instance profile.

You can set the log level on AWSCredentialsProviderChain to DEBUG so you can see when it fetches credentials, and from where.
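For reference, the two settings above might look like this in a Spring Boot application.properties (assuming the Spring Cloud AWS 2.x property name and Spring Boot's logging.level convention):

```properties
# Skip the rest of the credentials chain; always use the instance profile
cloud.aws.credentials.instance-profile=true

# Log when and from where the SDK fetches credentials
logging.level.com.amazonaws.auth.AWSCredentialsProviderChain=DEBUG
```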

With the above two changes we have been constantly monitoring our services and haven't seen the issue in the last 4 days, whereas it used to happen almost every day. We still can't confidently say the issue is fixed, but so far it looks positive.

Let us know how it goes for you.

arunhegde avatar Aug 08 '22 07:08 arunhegde

Hey everyone, apologies for the long silence in here. Taking a look at this and at the state of the support cases, will provide an update shortly.

debora-ito avatar Aug 08 '22 17:08 debora-ito

@arun-rai I don't really have suggestions to add to @arunhegde comment.

In a traditional use of InstanceProfileCredentialsProvider, the SDK automatically refreshes the credentials when "eagerlyRefreshCredentialsAsync" is set to true. Using kube2iam seems to add complexity to the credential fetching, but I'm not familiar with the lib.

You should identify which process is fetching the credentials, and when and why it is using expired credentials when it shouldn't, adding logs if needed.

debora-ito avatar Aug 10 '22 00:08 debora-ito

It looks like this issue has not been active for more than five days. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please add a comment to prevent automatic closure, or if the issue is already closed please feel free to reopen it.

github-actions[bot] avatar Aug 15 '22 03:08 github-actions[bot]

Forgot to update: we made one more change that is probably also part of the fix. We upgraded the AWS SDK version used by kube2iam.

arunhegde avatar Aug 15 '22 03:08 arunhegde

@arunhegde - If I understood correctly, you didn't upgrade the kube2iam image itself; it was just the AWS CLI or some other AWS SDK running on the nodes that was upgraded?

Could you share the upgrade steps? And share the previous and current versions?

vickeyrihal1 avatar Aug 19 '22 11:08 vickeyrihal1

  • Updated aws-sdk-go to v1.44.42 from v1.8.7 (in kube2iam/go.mod)
  • Updated to Go 1.17 from Go 1.16 (in kube2iam/go.mod)
  • Updated go-iptables to v0.6.0 from v0.1.0 (in kube2iam/go.mod)
  • In the kube2iam Dockerfile, changed FROM alpine:3.15.5 to FROM scratch
  • Set AWS_METADATA_SERVICE_TIMEOUT to '3'

These are the main changes I could see; there were many indirect dependency updates along with this.
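The timeout in the last bullet is an environment variable read by the AWS SDKs. In a kube2iam DaemonSet it might be set like this (hypothetical manifest excerpt, not the poster's actual config):

```yaml
# Hypothetical excerpt of a kube2iam DaemonSet container spec:
# allow the SDK up to 3 seconds per attempt when calling IMDS.
env:
  - name: AWS_METADATA_SERVICE_TIMEOUT
    value: "3"
```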

arunhegde avatar Aug 19 '22 11:08 arunhegde

We did very similar actions to the above from @arunhegde and have also seen a resolution to this issue.

slimm609 avatar Aug 23 '22 18:08 slimm609

Good to know @slimm609 !

arunhegde avatar Aug 24 '22 08:08 arunhegde

Looks like this was resolved, closing.

debora-ito avatar Aug 26 '22 18:08 debora-ito

COMMENT VISIBILITY WARNING

Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.

github-actions[bot] avatar Aug 26 '22 18:08 github-actions[bot]

@arunhegde Are you using Java SDK version > 1.12.188? This seems to be coming from these lines of code: https://github.com/aws/aws-sdk-java/blame/master/aws-java-sdk-core/src/main/java/com/amazonaws/auth/BaseCredentialsFetcher.java#L162-L179

The SDK extends the age of the credentials by 15 minutes in case it was not able to get a successful response from the IMDS endpoint.
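A stdlib-only sketch of that behavior (illustrative, modeled on the linked BaseCredentialsFetcher logic rather than copied from it): when a refresh attempt fails, the fetcher keeps serving the stale credentials and pushes their expiration out by 15 minutes, which would line up with the 15-minute self-resolution reported above.

```java
import java.time.Duration;
import java.time.Instant;

// Illustrative model of the expiry-extension behavior (NOT the actual
// SDK source): if IMDS can't be reached, extend the stale credentials'
// expiration by 15 minutes instead of failing immediately.
class ExpiryExtensionSketch {
    static final Duration EXTENSION = Duration.ofMinutes(15);

    static Instant effectiveExpiration(Instant statedExpiration, boolean refreshSucceeded) {
        if (refreshSucceeded) {
            return statedExpiration;            // fresh credentials: use stated expiry
        }
        return statedExpiration.plus(EXTENSION); // IMDS failure: extend by 15 minutes
    }
}
```

During that extension window, requests can be sent with tokens the service already considers expired, which is consistent with the ExpiredToken 403 seen in this thread.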

abhisheknsit avatar Sep 05 '22 10:09 abhisheknsit