aws-sdk-go-v2 Unable to generate valid sts PresignedUrl for use with EKS token auth

Describe the bug

The URL I generate with the sts PresignClient doesn't work when used as an EKS authentication token. A URL generated with the v1 client works just fine.

Expected Behavior

I expect the token that I generate with the v2 client to allow me to authenticate to my EKS cluster.

Current Behavior

The token I generate, when used, results in the error:

$ kubectl get nodes
error: You must be logged in to the server (Unauthorized)

Reproduction Steps

Link to code: https://gist.github.com/nickzelei/44f371254eae0e9d00a86fe3f4f0fc48 (update line 25 with a valid cluster name)

Take the token that is printed first and drop it into a kube config. You should get an Unauthorized error.

Take the second token and do the same thing.

Possible Solution

No response

Additional Information/Context

The code I'm using to generate the token that I drop into a kube config:

https://gist.github.com/nickzelei/44f371254eae0e9d00a86fe3f4f0fc48

AWS Go SDK V2 Module Versions Used

	github.com/aws/aws-sdk-go-v2 v1.17.1
	github.com/aws/aws-sdk-go-v2/config v1.17.10
	github.com/aws/aws-sdk-go-v2/credentials v1.12.23
	github.com/aws/aws-sdk-go-v2/service/acm v1.15.2
	github.com/aws/aws-sdk-go-v2/service/ec2 v1.63.1
	github.com/aws/aws-sdk-go-v2/service/ecr v1.17.18
	github.com/aws/aws-sdk-go-v2/service/eks v1.22.1
	github.com/aws/aws-sdk-go-v2/service/elasticloadbalancingv2 v1.18.22
	github.com/aws/aws-sdk-go-v2/service/iam v1.18.23
	github.com/aws/aws-sdk-go-v2/service/kms v1.18.11
	github.com/aws/aws-sdk-go-v2/service/s3 v1.27.9
	github.com/aws/aws-sdk-go-v2/service/sts v1.17.1
	github.com/aws/smithy-go v1.13.4

Compiler and Version used

go version go1.19.3 darwin/arm64

Operating System and version

macOS Monterey Version 12.6

Nov 13 '22 00:11 nickzelei

Hi @nickzelei

Thanks for opening this issue. Since I'm not an EKS expert by any stretch of the imagination, I'll need more in-depth repro steps to reproduce your situation.

So far, I'm able to use your V2 code to retrieve the token and a presigned URL. With Postman I'm able to call that URL and get a valid response:

<GetCallerIdentityResponse xmlns="https://sts.amazonaws.com/doc/2011-06-15/">
    <GetCallerIdentityResult>
        <Arn>arn:aws:iam::REDACTED:user/Administrator</Arn>
        <UserId>REDACTED</UserId>
        <Account>REDACTED</Account>
    </GetCallerIdentityResult>
    <ResponseMetadata>
        <RequestId>REDACTED</RequestId>
    </ResponseMetadata>
</GetCallerIdentityResponse>
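The same sanity check can be done from Go instead of Postman: fetch the presigned URL with net/http and decode the XML body. A minimal sketch of the decoding half, assuming the response shape shown above; callerIdentityResponse and parseCallerIdentity are hypothetical names, and the values in main are placeholders.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// callerIdentityResponse mirrors the XML returned when the presigned URL is fetched.
// Tags without a namespace match the elements regardless of the xmlns attribute.
type callerIdentityResponse struct {
	XMLName xml.Name `xml:"GetCallerIdentityResponse"`
	Result  struct {
		Arn     string `xml:"Arn"`
		UserID  string `xml:"UserId"`
		Account string `xml:"Account"`
	} `xml:"GetCallerIdentityResult"`
	RequestID string `xml:"ResponseMetadata>RequestId"`
}

// parseCallerIdentity decodes a GetCallerIdentity response body.
func parseCallerIdentity(body []byte) (callerIdentityResponse, error) {
	var resp callerIdentityResponse
	if err := xml.Unmarshal(body, &resp); err != nil {
		return callerIdentityResponse{}, err
	}
	return resp, nil
}

func main() {
	// Placeholder body; a real one would come from an HTTP GET of the presigned URL.
	body := []byte(`<GetCallerIdentityResponse xmlns="https://sts.amazonaws.com/doc/2011-06-15/">
  <GetCallerIdentityResult>
    <Arn>arn:aws:iam::123456789012:user/Administrator</Arn>
    <UserId>AIDAEXAMPLE</UserId>
    <Account>123456789012</Account>
  </GetCallerIdentityResult>
  <ResponseMetadata><RequestId>example-request-id</RequestId></ResponseMetadata>
</GetCallerIdentityResponse>`)
	resp, err := parseCallerIdentity(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Result.Arn, resp.Result.Account)
}
```

Note that a successful STS response only shows the URL is valid for STS; as the rest of this thread shows, EKS can still reject it.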

This is the part I'm not sure about. When you run into the following:

$ kubectl get nodes
error: You must be logged in to the server (Unauthorized)

Is this within one of your cluster's EC2 nodes?

Any clarifying info would be extremely helpful.

Thank you very much! Ran~

Nov 28 '22 23:11 RanVaknin

Hey @RanVaknin -

The kubectl get nodes call I was showing was just an example of a Kubernetes call failing authentication with the given token. In other words, I am unable to make any requests to Kubernetes with the generated token.

If it would help, I can set up another snippet that calls Kubernetes from Go using the token, but I'd need some time to put that together.

In the meantime, though, the generated token can be dropped into ~/.kube/config and used with kubectl directly.

Example kube config:

apiVersion: v1
clusters:
  - cluster:
      certificate-authority-data: <REDACTED>
      server: https://<redacted>.gr7.us-west-2.eks.amazonaws.com
    name: arn:aws:eks:us-west-2:<redacted>:cluster/my-cluster
contexts:
  - context:
      cluster: arn:aws:eks:us-west-2:<redacted>:cluster/my-cluster
      user: my-cluster
    name: my-cluster
current-context: my-cluster
kind: Config
preferences: {}
users:
  - name: my-cluster
    user:
      token: k8s-aws-v1.<redacted>
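For a Go reproduction without kubectl, the token from the users entry above is sent as a bearer token in the Authorization header. A minimal sketch, assuming a hypothetical newClusterRequest helper; it targets a local httptest server that only checks the header, because a real EKS endpoint would also need an http.Client configured with the cluster's certificate-authority-data, which is omitted here.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// newClusterRequest builds a GET against the Kubernetes API and sends the
// EKS token as a bearer token, the way a kubeconfig "token:" entry is used.
func newClusterRequest(server, path, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, strings.TrimRight(server, "/")+path, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	// Stand-in API server that accepts exactly one placeholder token.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Authorization") != "Bearer k8s-aws-v1.example" {
			http.Error(w, "Unauthorized", http.StatusUnauthorized)
			return
		}
		fmt.Fprint(w, `{"kind":"NodeList","items":[]}`)
	}))
	defer srv.Close()

	// /api/v1/nodes is the path behind "kubectl get nodes".
	req, err := newClusterRequest(srv.URL, "/api/v1/nodes", "k8s-aws-v1.example")
	if err != nil {
		panic(err)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```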

The token generated with the v1 client lets you use kubectl and authenticate, whereas the token generated with the v2 client returns Unauthorized errors.

Also, I have been testing with an IAM role that has full access to the cluster (in fact, the role I am using is the role that created the cluster, so it has full permissions as the owner).

Also, FWIW, I get the same response in Postman. It's the interaction with EKS that fails. Unfortunately, my knowledge ends there: I'm not entirely sure how EKS is configured to use that token. All I know is that the generated URL must be wrong in some way, because authentication doesn't complete with the v2 token.

Nov 29 '22 01:11 nickzelei

@RanVaknin I faced exactly the same issue that @nickzelei is referring to, and after doing multiple searches, I figured out the issue.

Refer to https://github.com/aws/aws-cli/blob/develop/awscli/customizations/eks/get_token.py#L250

There, the AWS CLI command aws eks get-token accepts a role ARN and sets RoleSessionName='EKSGetTokenAuth'. So we need a similar interface to get the signed header with a role ARN.
Based on https://github.com/aws/aws-sdk-go-v2/issues/1382#issuecomment-1010464182, I tried the following, and it worked for me using stscreds.NewAssumeRoleProvider.

@nickzelei If you are still looking for a solution, you can try the following; it worked in my case:

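	// Imports needed: github.com/aws/aws-sdk-go-v2/aws,
	// github.com/aws/aws-sdk-go-v2/credentials/stscreds, github.com/aws/aws-sdk-go-v2/service/sts
	roleARN := "<RoleARN as defined in the EKS cluster's aws-auth ConfigMap in the kube-system namespace>"
	roleSessionName := "EKSGetTokenAuth" // same session name the AWS CLI uses

	stsClient := sts.NewFromConfig(cfg)
	// Patch the following into your gist: https://gist.github.com/nickzelei/44f371254eae0e9d00a86fe3f4f0fc48#file-main-go-L32
	provider := stscreds.NewAssumeRoleProvider(stsClient, roleARN,
		func(o *stscreds.AssumeRoleOptions) {
			o.RoleSessionName = roleSessionName
			// Only consulted when o.SerialNumber (an MFA device) is set.
			o.TokenProvider = stscreds.StdinTokenProvider
		})

	// Presign with the assumed-role credentials instead of the ambient ones.
	cfg.Credentials = aws.NewCredentialsCache(provider)
	presignClient := sts.NewPresignClient(sts.NewFromConfig(cfg))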

Jan 07 '23 03:01 ashutosrath

Hm, unfortunately that still did not work for me. However, I just plugged in the role that my existing credentials were issued for, so maybe that was the difference.

Ideally, the code wouldn't need to know which role it is assuming; it would just take the credentials from the environment and use them to generate a presigned URL (with its own credentials), like in my gist above.

Glad you were able to figure something out though. For now, I will stick with the v1 client for generating the k8s tokens, until I can find something that works with the v2 client.

Jan 07 '23 05:01 nickzelei

@nickzelei can you try adding the X-Amz-Expires custom header in your PresignGetCallerIdentity call?

Something like

...
	out, err := presignclient.PresignGetCallerIdentity(ctx, &sts.GetCallerIdentityInput{}, func(opt *sts.PresignOptions) {
		opt.Presigner = newCustomHTTPPresignerV4(opt.Presigner, map[string]string{
			k8sHeader:       clusterName,
			"X-Amz-Expires": "60",
		})
	})
...

Feb 13 '23 16:02 karmingc

Wow, that worked! Thank you - I can finally clean up my codebase and get rid of the v1 client. Here is a final script that generates a token that actually works with kubectl: https://gist.github.com/nickzelei/338a32de48913cf49ae44ace245eef33

It's surprising how much boilerplate is required to generate this. I'd love to see this baked into the SDK somehow, or at least better documented.

Feb 14 '23 03:02 nickzelei

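This is working for me:

import (
	"context"
	"encoding/base64"
	"fmt"
	"net/http"
	"time"

	"github.com/aws/aws-sdk-go-v2/aws"
	v4 "github.com/aws/aws-sdk-go-v2/aws/signer/v4"
	"github.com/aws/aws-sdk-go-v2/service/sts"
)

const (
	k8sHeader   = "x-k8s-aws-id" // header that binds the token to one cluster
	tokenPrefix = "k8s-aws-v1."  // prefix EKS expects on the token
)

// StsPresignClientInterface is the subset of *sts.PresignClient used here.
type StsPresignClientInterface interface {
	PresignGetCallerIdentity(ctx context.Context, params *sts.GetCallerIdentityInput,
		optFns ...func(*sts.PresignOptions)) (*v4.PresignedHTTPRequest, error)
}

type STSTokenRetriever struct {
	PresignClient StsPresignClientInterface
}

func NewSTSTokenRetriever(client StsPresignClientInterface) STSTokenRetriever {
	return STSTokenRetriever{PresignClient: client}
}

// GetToken presigns an STS GetCallerIdentity request bound to clusterName
// and encodes the resulting URL in the k8s-aws-v1. token format.
func (s *STSTokenRetriever) GetToken(ctx context.Context, clusterName string) (string, error) {
	out, err := s.PresignClient.PresignGetCallerIdentity(ctx, &sts.GetCallerIdentityInput{}, func(opt *sts.PresignOptions) {
		opt.Presigner = newCustomHTTPPresignerV4(opt.Presigner, map[string]string{
			k8sHeader:       clusterName,
			"X-Amz-Expires": "60",
		})
	})
	if err != nil {
		return "", fmt.Errorf("presigning GetCallerIdentity: %w", err)
	}
	// The token payload is unpadded, URL-safe base64 of the presigned URL.
	return tokenPrefix + base64.RawURLEncoding.EncodeToString([]byte(out.URL)), nil
}

// customHTTPPresignerV4 adds extra headers to the request, then delegates to the wrapped presigner.
type customHTTPPresignerV4 struct {
	client  sts.HTTPPresignerV4
	headers map[string]string
}

func newCustomHTTPPresignerV4(client sts.HTTPPresignerV4, headers map[string]string) sts.HTTPPresignerV4 {
	return &customHTTPPresignerV4{client: client, headers: headers}
}

func (p *customHTTPPresignerV4) PresignHTTP(
	ctx context.Context, credentials aws.Credentials, r *http.Request,
	payloadHash string, service string, region string, signingTime time.Time,
	optFns ...func(*v4.SignerOptions),
) (url string, signedHeader http.Header, err error) {
	for key, val := range p.headers {
		r.Header.Add(key, val)
	}
	return p.client.PresignHTTP(ctx, credentials, r, payloadHash, service, region, signingTime, optFns...)
}

and in the calling function or main:

stsClient := sts.NewFromConfig(cfg)
presignClient := sts.NewPresignClient(stsClient)
tokenRetriever := NewSTSTokenRetriever(presignClient)
token, err := tokenRetriever.GetToken(ctx, clusterName)
if err != nil {
	panic(err)
}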

Feb 16 '23 07:02 SrikanthBhandary

Tested out these solutions and found that I had to add r.Header.Del("amz-sdk-request") to the custom presigner for it to work.

Jan 30 '24 01:01 Monkeyanator

This was a regression in [email protected] where a middleware was adding the amz-sdk-request header to all requests. You no longer need to explicitly delete that header. We ran into a similar issue in eksctl.

Jun 03 '24 08:06 cPu1

I don't see conclusive evidence of a defect, and this issue is ancient at this point, so I'm going to close it. If you're affected by this, please open a new issue.

Jun 03 '24 13:06 lucix-aws

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.

Jun 03 '24 13:06 github-actions[bot]