aws-sdk-java icon indicating copy to clipboard operation
aws-sdk-java copied to clipboard

Consider Throwing Exception when Invalid IAM Role Used for Web Identity

Open pauldthomson opened this issue 3 years ago • 4 comments

Describe the issue

The AWSCredentialsProviderChain swallows exceptions thrown by any providers in the chain and moves on to the next provider. In the case of a Kubernetes Pod on EKS, using IAM Roles for Service Accounts, this means it's possible for an incorrectly specified role (wrong name etc) to not manifest in a meaningful error. This was witnessed when trying to upload an object to an S3 Bucket, the error resulted in an exception about an invalid signature on the request, likely due to invalid credentials being used (I suspect the EC2 provider provided credentials).

It seems logical that since the STSAssumeRoleWithWebIdentitySessionCredentialsProvider is only created when the webIdentityTokenFilePath is set (which comes from the AWS_WEB_IDENTITY_TOKEN_FILE env var), then it could throw an exception if the specified role is failed to be assumed (for whatever reason) as the presence of the above env var indicates an intention to use IRSA.

Steps to Reproduce

  1. Set up an EKS Pod with IRSA as per the normal instructions: https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html
  2. Refer to a non-existant IAM role in the ServiceAccount spec
  3. Attempt to use any AWS service via the SDK that would require specific role permissions

Current Behavior

Instead of something akin to "the specified role does not exist", we got the following exception:

com.amazonaws.services.s3.model.AmazonS3Exception: Requests specifying Server Side Encryption with AWS KMS managed keys require AWS Signature Version 4.

probably due to using invalid credentials to sign the request.

Your Environment

  • AWS Java SDK version used: 1.11.920
  • JDK version used: openjdk8
  • Operating System and version: Alpine Docker image on EKS

pauldthomson avatar Dec 21 '20 20:12 pauldthomson

Hi @pauldthomson this same issue was discussed in Java SDK v2 (https://github.com/aws/aws-sdk-java-v2/issues/1915) with a slightly different use case. Quoting @dagnir in the v2 thread:

"Unfortunately, it's difficult to infer intent just from the presence or absence of the environment variables or system properties; the situation described where the Web Identity provider failed so the chain moved on to the next one in the precedence list is the intended behavior of the chain."

If the SDK is expected to use STSAssumeRoleWithWebIdentitySessionCredentialsProvider, we suggest you set it explicitly on the S3 client, this way any exception encountered will be thrown by the caller, will not be swallowed.

debora-ito avatar Dec 22 '20 23:12 debora-ito

I understand that it's intended behaviour, but the presence of the environment variable seems to be explicitly tied to the annotation existing on the ServiceAccount (only sets the roleARN if the annotation exists, and only adds the env vars if the roleARN is set), which seems to indicate intent to use an IAM role for the Pod?

Also it seems the Go SDK behaves in the way I'm suggesting, i.e. explicitly using the web identity credentials if the environment variable is set: https://github.com/aws/aws-sdk-go/blob/36b087e6d33e7230b6e9d0bc6aef7177c97e59ef/aws/session/credentials.go#L24-L47

Is there known scenarios where this would not be the case? i.e. that the environment variables would be set but web identity wasn't intended to be used? Cos there's also the skip containers annotation (see step 4) which can be used to not mutate the pod.

Or could the documentation maybe be a little more explicit here? The IRSA docs just say "you must be on at least vX.X.X", but nothing about probably needing to make the use of the STSAssumeRoleWithWebIdentitySessionCredentialsProvider explicit. From the POV of a user, the impression is "I just annotate my ServiceAccount accordingly, make sure I'm using an up-to-date SDK and I'll be using my Pod's IAM role", but that's not necessarily the case.

pauldthomson avatar Dec 23 '20 02:12 pauldthomson

I agree with @pauldthomson in that the intent should be inferred from the environment variable and override the default behaviour.

While I understand this could appear as a breaking change for some users, with the environment able to dictate internal logic of this library, this is very much how modern apps should be able to run.

Also as mentioned by @pauldthomson , I believe the risk is very minimal for this to be seen as a breaking change as there is no reason to a user would set the AWS_WEB_IDENTITY_TOKEN_FILE variable unless they actually wanted to use the AssumeRoleWithWebIdentity functionality.

edify42 avatar Feb 15 '21 22:02 edify42

Changing this to a feature request.

debora-ito avatar May 07 '21 01:05 debora-ito