amazon-eks-pod-identity-webhook

Annotations eks.amazonaws.com/skip-containers and eks.amazonaws.com/sts-regional-endpoints aren't working

Open shardulsrivastava opened this issue 4 years ago • 9 comments

What happened:

I am using EKS version 1.21 and trying to use IRSA. For that, I'm setting these annotations:

eks.amazonaws.com/sts-regional-endpoints: "true"
eks.amazonaws.com/skip-containers: sidecar-busybox-container

However, the container sidecar-busybox-container is still getting injected with the environment variables, and I don't see the pod using STS regional endpoints.

What you expected to happen:

As per the docs, the webhook should have skipped mutating the sidecar-busybox-container container and added the environment variable AWS_STS_REGIONAL_ENDPOINTS so that regional STS endpoints are used.

How to reproduce it (as minimally and precisely as possible):

  1. Create a cluster with the below YAML
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: iam-cluster
  region: us-east-1
  version: "1.21"

availabilityZones: 
  - us-east-1a
  - us-east-1b
  - us-east-1c

iam:
  withOIDC: true
  serviceAccounts:
  - metadata:
      name: s3-reader
    attachPolicyARNs:
    - "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"

managedNodeGroups:
  - name: managed-ng-1
    instanceType: t3a.medium
    minSize: 1
    maxSize: 4
    desiredCapacity: 
  2. Annotate the service account s3-reader with these annotations:
kubectl annotate \
  sa s3-reader \
    "eks.amazonaws.com/audience=sts.amazonaws.com" \
    "eks.amazonaws.com/sts-regional-endpoints=true" \
    "eks.amazonaws.com/token-expiration=43200" \
    "eks.amazonaws.com/skip-containers=sidecar-busybox-container"
  3. Create a pod with two containers:
apiVersion: v1
kind: Pod
metadata:
  name: iam-test
spec:
  serviceAccountName: s3-reader
  restartPolicy: Never
  containers:
  - name: iam-test
    image: amazon/aws-cli
    args: [ "sts", "get-caller-identity" ]

  - name: sidecar-busybox-container
    image: radial/busyboxplus:curl

Once the pod is created, check the environment variables for the containers:

kubectl get pods iam-test -o json | jq -r '.spec.containers[].env'
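
The command above prints the env arrays without saying which container each one belongs to. A minimal sketch that pairs each container name with the names of its environment variables may make the comparison easier. It assumes jq is installed; list_container_env is a hypothetical helper name, not part of kubectl or the webhook.

```shell
# Hypothetical helper: reads a Pod JSON document on stdin and prints one line
# per container in the form "<container name>: <comma-separated env var names>".
# Containers without any env entries are shown as "(none)".
list_container_env() {
  jq -r '.spec.containers[]
         | [.env[]?.name] as $names
         | "\(.name): \(if ($names | length) == 0 then "(none)" else ($names | join(",")) end)"'
}

# Usage against a live cluster (pod name taken from the reproduction steps):
#   kubectl get pod iam-test -o json | list_container_env
```

If the skip annotation is honored, the skipped container should show none of the AWS_* variables that the other container carries.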

Anything else we need to know?:

Environment: EKS v1.21

  • AWS Region: us-east-1
  • EKS Platform version (if using EKS, run aws eks describe-cluster --name <name> --query cluster.platformVersion): "eks.2"
  • Kubernetes version (if using EKS, run aws eks describe-cluster --name <name> --query cluster.version): 1.21
  • Webhook Version: Not sure how to get it from the cluster.

shardulsrivastava avatar Aug 25 '21 09:08 shardulsrivastava

For anyone looking for a solution.

eks.amazonaws.com/skip-containers only works as an annotation on pods (not on the service account), and eks.amazonaws.com/sts-regional-endpoints isn't working on EKS 1.21 due to an issue that was fixed recently.
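
For illustration, here is a sketch of the pod from the reproduction steps with the skip annotation moved onto the pod itself, as the comment above suggests. The manifest reuses the names from the original report; pod_manifest_with_skip is a hypothetical helper that only prints the YAML, so it can be inspected before it is applied.

```shell
# Hypothetical helper: prints the reproduction pod with the skip-containers
# annotation set in the pod's own metadata instead of on the service account.
pod_manifest_with_skip() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: iam-test
  annotations:
    eks.amazonaws.com/skip-containers: sidecar-busybox-container
spec:
  serviceAccountName: s3-reader
  restartPolicy: Never
  containers:
  - name: iam-test
    image: amazon/aws-cli
    args: [ "sts", "get-caller-identity" ]
  - name: sidecar-busybox-container
    image: radial/busyboxplus:curl
EOF
}

# Usage (requires access to the cluster):
#   pod_manifest_with_skip | kubectl apply -f -
```

The webhook mutates pods at creation time, so an existing pod has to be deleted and recreated for a changed annotation to have any effect.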

shardulsrivastava avatar Aug 29 '21 10:08 shardulsrivastava

eks.amazonaws.com/sts-regional-endpoints isn't working on our EKS clusters with version 1.21

oba11 avatar Sep 16 '21 12:09 oba11

@oba11 is this working for you now? Trying to confirm if https://github.com/aws/amazon-eks-pod-identity-webhook/issues/110 was included in the "Platform" release of eks.3 under 1.21

jukie avatar Dec 08 '21 06:12 jukie

There was an outage in the AWS us-east-1 region yesterday, and we discovered that the "eks.3 + k8s 1.21" version of the platform (latest) doesn't have this fix included. Adding an eks.amazonaws.com/sts-regional-endpoints annotation doesn't work. We were hitting the STS endpoint in the us-east-1 region, which resulted in the pods running with the EKS node role instead of the IAM role that was passed to them via the ServiceAccount annotation.
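
One way to spot the fallback described above is to check which role a pod's credentials actually resolve to. The sketch below assumes jq is installed and that the output of aws sts get-caller-identity is available as JSON; role_from_caller_identity is a hypothetical helper name.

```shell
# Hypothetical helper: reads `aws sts get-caller-identity` JSON on stdin and
# prints the role name from an ARN of the form
#   arn:aws:sts::<account>:assumed-role/<role-name>/<session-name>
role_from_caller_identity() {
  jq -r '.Arn | split("/") | .[1]'
}

# Usage: the iam-test container in the reproduction runs
# `aws sts get-caller-identity` as its command, so its log holds the JSON:
#   kubectl logs iam-test -c iam-test | role_from_caller_identity
```

If the printed name is the node instance role rather than the role named in the service account's eks.amazonaws.com/role-arn annotation, the pod has fallen back to the node credentials.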

vgrigoruk avatar Dec 08 '21 20:12 vgrigoruk

This does not work for me either.

Kubernetes version: 1.21, platform version: eks.2

Can we please prioritize fixing this?

bseenu avatar Dec 08 '21 20:12 bseenu

This should be fixed in EKS 1.21 eks.3, @vgrigoruk could you share your serviceaccount and pod specs if possible (with arn, account id etc redacted)?

Here are my specs and test for reference, I see AWS_STS_REGIONAL_ENDPOINTS on my Pod as expected:

$ kubectl version
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-06eac09", GitCommit:"5f6d83fe4cb7febb5f4f4e39b3b2b64ebbbe3e97", GitTreeState:"clean", BuildDate:"2021-09-13T14:20:15Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}

$ eksctl create iamserviceaccount \
            --name matthew \
            --namespace default \
            --cluster my-cluster \
            --attach-policy-arn arn:aws:iam::aws:policy/IAMReadOnlyAccess \
            --approve \
            --override-existing-serviceaccounts

$ kubectl annotate serviceaccount -n default matthew eks.amazonaws.com/sts-regional-endpoints=true --overwrite && k delete po pause && k create -f kubernetes/pod-matthew.yaml && k get po pause -o yaml | grep AWS
serviceaccount/matthew annotated
pod "pause" deleted
pod/pause created
    - name: AWS_STS_REGIONAL_ENDPOINTS
    - name: AWS_DEFAULT_REGION
    - name: AWS_REGION
    - name: AWS_ROLE_ARN
    - name: AWS_WEB_IDENTITY_TOKEN_FILE

~ $ k get sa matthew -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::x:role/eksctl-my-cluster-addon-iam-Role1-1X9GRP4HWB56F
    eks.amazonaws.com/sts-regional-endpoints: "true"
  creationTimestamp: "2021-12-15T21:46:18Z"
  labels:
    app.kubernetes.io/managed-by: eksctl
  name: matthew
  namespace: default
  resourceVersion: "4491"
  uid: 448b4fc8-5f29-4503-852d-3e18125624df
secrets:
- name: matthew-token-lgpvj

~ $ cat kubernetes/pod-matthew.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pause
spec:
  containers:
    - name: pause
      image: k8s.gcr.io/pause
  serviceAccount: matthew

~ $ k get po pause -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
  creationTimestamp: "2021-12-17T00:40:42Z"
  name: pause
  namespace: default
  resourceVersion: "215892"
  uid: 40f639a8-7c0f-4670-b4ea-e443dc91167d
spec:
  containers:
  - env:
    - name: AWS_STS_REGIONAL_ENDPOINTS
      value: regional
    - name: AWS_DEFAULT_REGION
      value: us-west-2
    - name: AWS_REGION
      value: us-west-2
    - name: AWS_ROLE_ARN
      value: arn:aws:iam::x:role/eksctl-my-cluster-addon-iam-Role1-1X9GRP4HWB56F
    - name: AWS_WEB_IDENTITY_TOKEN_FILE
      value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    image: k8s.gcr.io/pause
    imagePullPolicy: Always
    name: pause
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-8gbqg
      readOnly: true
    - mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
      name: aws-iam-token
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-192-168-101-177.us-west-2.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: matthew
  serviceAccountName: matthew
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: aws-iam-token
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          audience: sts.amazonaws.com
          expirationSeconds: 86400
          path: token
  - name: kube-api-access-8gbqg
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2021-12-17T00:40:42Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2021-12-17T00:40:44Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2021-12-17T00:40:44Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2021-12-17T00:40:42Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://aab8cd024e1de7051116d4c5dd287dcd81c71b8deee6a521bcb0f062182d5d2e
    image: k8s.gcr.io/pause:latest
    imageID: docker-pullable://k8s.gcr.io/pause@sha256:a78c2d6208eff9b672de43f880093100050983047b7b0afe0217d3656e1b0d5f
    lastState: {}
    name: pause
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2021-12-17T00:40:43Z"
  hostIP: 192.168.101.177
  phase: Running
  podIP: 192.168.101.209
  podIPs:
  - ip: 192.168.101.209
  qosClass: BestEffort
  startTime: "2021-12-17T00:40:42Z"

wongma7 avatar Dec 17 '21 00:12 wongma7

I'm running on eks.4, and although the environment variables are now set correctly, the endpoint it tries to use is still us-east-1. I've got the AWS_REGION and AWS_DEFAULT_REGION environment variables set to eu-west-2, but it still calls the us-east-1 regional endpoint.

barrydobson avatar Dec 24 '21 16:12 barrydobson

@barrydobson I'd like to know a bit more to debug the issue you're facing. How are you verifying that the regional endpoint is not used? Are you using CloudTrail events? You could use the CloudTrail Event History and look for the EventName AssumeRoleWithWebIdentity. In the event record, you'll find clientProvidedHostHeader. Does that value contain us-east-1?
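
The CloudTrail check described above could be scripted roughly as follows. This is a sketch under assumptions: it requires jq and the AWS CLI, host_headers_from_events is a hypothetical helper name, and it assumes clientProvidedHostHeader sits under tlsDetails in the event record (if it does not, the last column prints n/a).

```shell
# Hypothetical helper: reads `aws cloudtrail lookup-events` JSON on stdin.
# Each entry's CloudTrailEvent field is itself a JSON string, so it is parsed
# with fromjson before the fields are read. Prints tab-separated:
#   eventTime  awsRegion  clientProvidedHostHeader
host_headers_from_events() {
  jq -r '.Events[].CloudTrailEvent
         | fromjson
         | [.eventTime, .awsRegion, (.tlsDetails.clientProvidedHostHeader // "n/a")]
         | @tsv'
}

# Usage (lookup-events queries one region per call; repeat with --region as needed):
#   aws cloudtrail lookup-events \
#     --lookup-attributes AttributeKey=EventName,AttributeValue=AssumeRoleWithWebIdentity \
#     --output json | host_headers_from_events
```

A regional host name in the last column would suggest the regional endpoint was used for that call.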

jyotimahapatra avatar Jan 20 '22 20:01 jyotimahapatra

Any update on when this will be fixed? I have pods that are annotated correctly to use the regional endpoint, and I still intermittently get issues where the pod falls back to the EKS node instance profile. Most of the time when the exception is raised, the logs show it trying to use the global endpoint and us-east-1.

sqlaide avatar Apr 21 '22 19:04 sqlaide