amazon-eks-pod-identity-webhook
Injection sometimes fails on pods created from deployments
What happened:
Volume and environment config was not injected into pods created from a deployment.
What you expected to happen:
The config to be injected, or some debug information about why the injection didn't happen.
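For reference, when the webhook does mutate a pod, the injected config looks roughly like the snippet below (a sketch based on the project README; the role ARN, audience, and expiration shown are illustrative):

env:
- name: AWS_ROLE_ARN
  value: arn:aws:iam::111122223333:role/example-role  # illustrative placeholder
- name: AWS_WEB_IDENTITY_TOKEN_FILE
  value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
volumeMounts:
- mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
  name: aws-iam-token
  readOnly: true
volumes:
- name: aws-iam-token
  projected:
    sources:
    - serviceAccountToken:
        audience: sts.amazonaws.com
        expirationSeconds: 86400
        path: token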
How to reproduce it (as minimally and precisely as possible):
Creating a couple of pods, one directly and one via a deployment:
(๑•ᴗ•)⊃━~/g/irsa-reproducer━☆゚ cat reproducer.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: test-app
spec:
  serviceAccount: test-app
  serviceAccountName: test-app
  containers:
  - name: hello
    image: busybox
    command: ['sh', '-c', 'sleep 3600']
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: test-app-token-njbrl
      readOnly: true
  volumes:
  - name: test-app-token-njbrl
    secret:
      defaultMode: 420
      secretName: test-app-token-njbrl
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deploy
  namespace: test-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-deploy
  template:
    metadata:
      labels:
        app: test-deploy
    spec:
      serviceAccount: test-app
      serviceAccountName: test-app
      containers:
      - name: hello
        image: busybox
        command: ['sh', '-c', 'sleep 3600']
        volumeMounts:
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: test-app-token-njbrl
          readOnly: true
      volumes:
      - name: test-app-token-njbrl
        secret:
          defaultMode: 420
          secretName: test-app-token-njbrl
And applying them together:
(๑•ᴗ•)⊃━~/g/irsa-reproducer━☆゚ kubectl apply -f reproducer.yaml
Results in the directly created pod receiving the injected config, but sometimes not the pod created by the deployment:
(๑•ᴗ•)⊃━~/g/irsa-reproducer━☆゚ kubectl get pods -n test-app -o jsonpath='{range .items[*]}{@.metadata.name}{" (serviceAccount: "}{@.spec.serviceAccount}{") "}{@.spec.volumes[*].name}{"\n"}{end}'
test-deploy-6794999d9-cqkwm (serviceAccount: test-app) test-app-token-njbrl
test-pod (serviceAccount: test-app) aws-iam-token test-app-token-njbrl
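A quicker spot-check for whether a particular pod was mutated is to grep its rendered spec (a sketch; aws-iam-token and AWS_WEB_IDENTITY_TOKEN_FILE are the names the webhook normally injects):

kubectl get pod test-pod -n test-app -o yaml | grep -E 'aws-iam-token|AWS_WEB_IDENTITY_TOKEN_FILE'

An unmutated pod returns no matches; repeat with the deployment's generated pod name to compare.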
Anything else we need to know?:
This happens much more regularly (it is almost guaranteed) on an EKS cluster in one of our AWS accounts, but not in other accounts.
Feels somewhat similar to https://github.com/godaddy/kubernetes-external-secrets/issues/419
Environment:
- AWS Region: eu-west-1
- EKS Platform version: eks.2
- Kubernetes version: 1.17
- Webhook Version:
@kragniz Did you see specific errors in kubectl get events when this appears?
Are you working in EKS? Normally you do not need to specify the volumeMounts in EKS: example
@allamand volumeMounts doesn't need to be added, but I included it to be explicit about which mounts were being added.
Here's another instance (but without the explicit volumes for the service token):
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: test-app
spec:
  serviceAccount: test-app
  serviceAccountName: test-app
  containers:
  - name: hello
    image: busybox
    command: ['sh', '-c', 'sleep 3600']
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deploy
  namespace: test-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-deploy
  template:
    metadata:
      labels:
        app: test-deploy
    spec:
      serviceAccount: test-app
      serviceAccountName: test-app
      containers:
      - name: hello
        image: busybox
        command: ['sh', '-c', 'sleep 3600']
When watching kubectl get events -A -w while applying:
test-app 0s Normal Scheduled pod/test-pod Successfully assigned test-app/test-pod to ip-172-31-3-158.eu-west-1.compute.internal
test-app 0s Normal ScalingReplicaSet deployment/test-deploy Scaled up replica set test-deploy-fdfc554f to 1
test-app 0s Normal SuccessfulCreate replicaset/test-deploy-fdfc554f Created pod: test-deploy-fdfc554f-sd7nb
test-app 0s Normal Scheduled pod/test-deploy-fdfc554f-sd7nb Successfully assigned test-app/test-deploy-fdfc554f-sd7nb to ip-172-31-18-62.eu-west-1.compute.internal
test-app 0s Normal Pulling pod/test-pod Pulling image "busybox"
test-app 0s Normal Pulled pod/test-pod Successfully pulled image "busybox"
test-app 0s Normal Created pod/test-pod Created container hello
test-app 0s Normal Started pod/test-pod Started container hello
test-app 1s Normal Pulling pod/test-deploy-fdfc554f-sd7nb Pulling image "busybox"
test-app 0s Normal Pulled pod/test-deploy-fdfc554f-sd7nb Successfully pulled image "busybox"
test-app 0s Normal Created pod/test-deploy-fdfc554f-sd7nb Created container hello
test-app 0s Normal Started pod/test-deploy-fdfc554f-sd7nb Started container hello
In this instance, config failed to be injected into either of the pods:
(๑•ᴗ•)⊃━~/g/irsa-reproducer━☆゚ kubectl get pods -n test-app -o jsonpath='{range .items[*]}{@.metadata.name}{" (serviceAccount: "}{@.spec.serviceAccount}{") "}{@.spec.volumes[*].name}{"\n"}{end}'
test-deploy-fdfc554f-sd7nb (serviceAccount: test-app) test-app-token-njbrl
test-pod (serviceAccount: test-app) test-app-token-njbrl
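As an aside, the event stream can be scoped to a single pod instead of watching the whole cluster (a sketch using the standard involvedObject field selector):

kubectl get events -n test-app --field-selector involvedObject.name=test-deploy-fdfc554f-sd7nb

Events are only retained for about an hour by default, so this needs to run shortly after the pod is created.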
Is there no way to obtain logs from the webhook that's deployed with EKS clusters?
If you have access to audit logs and you can find the Pod creation event, it should have a mutation decision. Can you confirm if the webhook was called and mutated was false? https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#mutating-webhook-auditing-annotations
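If the cluster ships audit logs to CloudWatch, something like this should pull the pod CREATE events that mention the webhook (a sketch: the log group and stream prefix follow the usual EKS control-plane logging layout, and <cluster-name> is a placeholder):

aws logs filter-log-events \
  --log-group-name /aws/eks/<cluster-name>/cluster \
  --log-stream-name-prefix kube-apiserver-audit \
  --filter-pattern '"iam-for-pods.amazonaws.com"' \
  --query 'events[].message' \
  --output text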
We currently have the same issue. I just activated the audit logs on the EKS cluster control plane, but I can't see any event related to the webhook. Do you have an example of what such an event looks like?
Hi, we've been seeing the issue described above. Is there a simple set of steps we can follow to help have this issue debugged?
It was some kind of API problem on the master nodes. We fixed it by updating the EKS control plane to the next version. Support recommended opening a support ticket so that they can restart the Kubernetes API server, because you do not have access to the API logs of the mutating webhooks.
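For anyone trying the same remediation, the control-plane upgrade itself is a single call (a sketch; <cluster-name> and the target version are placeholders, and worker nodes are upgraded separately):

aws eks update-cluster-version --name <cluster-name> --kubernetes-version 1.18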
We currently have the same issue on EKS 1.16, although it is not isolated to pods created by deployments; it also affects individually created pods. It happens randomly. I have checked the audit logs and we have mutated: false:
{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"","mutation.webhook.admission.k8s.io/round_0_index_0":"{\"configuration\":\"istio-sidecar-injector\",\"webhook\":\"sidecar-injector.istio.io\",\"mutated\":false}","mutation.webhook.admission.k8s.io/round_0_index_1":"{\"configuration\":\"pod-identity-webhook\",\"webhook\":\"iam-for-pods.amazonaws.com\",\"mutated\":false}","mutation.webhook.admission.k8s.io/round_0_index_3":"{\"configuration\":\"vpc-resource-mutating-webhook\",\"webhook\":\"mpod.vpc.k8s.aws\",\"mutated\":false}"
In kubelet, we often find this error for the containers that fail to project the token:
579 kuberuntime_manager.go:935] getPodContainerStatuses for pod "test-container_default(e494a824-8028-4775-a966-6153ed26dd6a)" failed: rpc error: code = Unknown desc = Error: No such container: e2c69b28df377c75fbe0c46d506da495e754046285f27df47095b79d56e0d2e5
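To pull those messages on an affected node, something like this works on the standard EKS-optimized AMI where kubelet runs under systemd (a sketch):

journalctl -u kubelet --since "1 hour ago" | grep -E 'getPodContainerStatuses|No such container'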
Any idea how I can continue to troubleshoot this? Is there any way to get direct logs from the webhook? In EKS setups the webhook pod is not accessible (at least I couldn't find it), but I do have the MutatingWebhookConfiguration:
kubectl get mutatingwebhookconfigurations -o wide
NAME                     CREATED AT
istio-sidecar-injector   2020-11-11T15:30:39Z
pod-identity-webhook     2020-06-12T16:09:24Z
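Since the webhook pods themselves aren't reachable on EKS, one client-side thing worth checking is how the MutatingWebhookConfiguration handles failures (a sketch; the field paths are the standard admissionregistration.k8s.io ones):

kubectl get mutatingwebhookconfiguration pod-identity-webhook \
  -o jsonpath='{.webhooks[0].failurePolicy}{" "}{.webhooks[0].timeoutSeconds}{"\n"}'
kubectl get mutatingwebhookconfiguration pod-identity-webhook -o yaml

With failurePolicy: Ignore, a timed-out or failed call to the webhook admits the pod unmutated, which would produce exactly this symptom along with mutated: false in the audit annotation.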
Any help would be greatly appreciated
I can confirm that I've run into this as well. I've only seen it with deployments. I haven't been able to check the logs to troubleshoot yet.