amazon-eks-pod-identity-webhook
Injection sometimes fails on pods created from deployments
What happened:
Volume and environment config was not injected into pods created from a deployment.
What you expected to happen:
The config to be injected, or some debug information about why the injection didn't happen.
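For reference, when the webhook does mutate a pod, the injected config looks roughly like the snippet below (a sketch based on the project README; the role ARN, audience, and expiration shown are illustrative):

env:
- name: AWS_ROLE_ARN
  value: arn:aws:iam::111122223333:role/example-role  # illustrative placeholder
- name: AWS_WEB_IDENTITY_TOKEN_FILE
  value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
volumeMounts:
- mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
  name: aws-iam-token
  readOnly: true
volumes:
- name: aws-iam-token
  projected:
    sources:
    - serviceAccountToken:
        audience: sts.amazonaws.com
        expirationSeconds: 86400
        path: token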
How to reproduce it (as minimally and precisely as possible):
Creating a couple of pods, one directly and one via a deployment:
(๑•ᴗ•)⊃━~/g/irsa-reproducer━☆゚ cat reproducer.yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: test-app
spec:
  serviceAccount: test-app
  serviceAccountName: test-app
  containers:
  - name: hello
    image: busybox
    command: ['sh', '-c', 'sleep 3600']
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: test-app-token-njbrl
      readOnly: true
  volumes:
  - name: test-app-token-njbrl
    secret:
      defaultMode: 420
      secretName: test-app-token-njbrl
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deploy
  namespace: test-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-deploy
  template:
    metadata:
      labels:
        app: test-deploy
    spec:
      serviceAccount: test-app
      serviceAccountName: test-app
      containers:
      - name: hello
        image: busybox
        command: ['sh', '-c', 'sleep 3600']
        volumeMounts:
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: test-app-token-njbrl
          readOnly: true
      volumes:
      - name: test-app-token-njbrl
        secret:
          defaultMode: 420
          secretName: test-app-token-njbrl
And applying them together:
(๑•ᴗ•)⊃━~/g/irsa-reproducer━☆゚ kubectl apply -f reproducer.yaml
Results in the directly created pod receiving the injected config, but sometimes not the pod created by the deployment:
(๑•ᴗ•)⊃━~/g/irsa-reproducer━☆゚ kubectl get pods -n test-app -o jsonpath='{range .items[*]}{@.metadata.name}{" (serviceAccount: "}{@.spec.serviceAccount}{") "}{@.spec.volumes[*].name}{"\n"}{end}'
test-deploy-6794999d9-cqkwm (serviceAccount: test-app) test-app-token-njbrl
test-pod (serviceAccount: test-app) aws-iam-token test-app-token-njbrl
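A quicker spot-check for whether a particular pod was mutated is to grep its rendered spec (a sketch; aws-iam-token and AWS_WEB_IDENTITY_TOKEN_FILE are the names the webhook normally injects):

kubectl get pod test-pod -n test-app -o yaml | grep -E 'aws-iam-token|AWS_WEB_IDENTITY_TOKEN_FILE'

An unmutated pod returns no matches; repeat with the deployment's generated pod name to compare.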
Anything else we need to know?:
This happens much more regularly (it is almost guaranteed) on an EKS cluster in one of our AWS accounts, but not in other accounts.
Feels somewhat similar to https://github.com/godaddy/kubernetes-external-secrets/issues/419
Environment:
- AWS Region: eu-west-1
- EKS Platform version: eks.2
- Kubernetes version: 1.17
- Webhook Version:
@kragniz Did you see specific errors in kubectl get events when this appears?
Are you working in EKS? Normally you do not need to specify the volumeMounts in EKS: example
@allamand volumeMounts doesn't need to be added, but I included it to be explicit about which mounts were being added.
Here's another instance (but without the explicit volumes for the service token):
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: test-app
spec:
  serviceAccount: test-app
  serviceAccountName: test-app
  containers:
  - name: hello
    image: busybox
    command: ['sh', '-c', 'sleep 3600']
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deploy
  namespace: test-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-deploy
  template:
    metadata:
      labels:
        app: test-deploy
    spec:
      serviceAccount: test-app
      serviceAccountName: test-app
      containers:
      - name: hello
        image: busybox
        command: ['sh', '-c', 'sleep 3600']
When watching kubectl get events -A -w while applying:
test-app 0s Normal Scheduled pod/test-pod Successfully assigned test-app/test-pod to ip-172-31-3-158.eu-west-1.compute.internal
test-app 0s Normal ScalingReplicaSet deployment/test-deploy Scaled up replica set test-deploy-fdfc554f to 1
test-app 0s Normal SuccessfulCreate replicaset/test-deploy-fdfc554f Created pod: test-deploy-fdfc554f-sd7nb
test-app 0s Normal Scheduled pod/test-deploy-fdfc554f-sd7nb Successfully assigned test-app/test-deploy-fdfc554f-sd7nb to ip-172-31-18-62.eu-west-1.compute.internal
test-app 0s Normal Pulling pod/test-pod Pulling image "busybox"
test-app 0s Normal Pulled pod/test-pod Successfully pulled image "busybox"
test-app 0s Normal Created pod/test-pod Created container hello
test-app 0s Normal Started pod/test-pod Started container hello
test-app 1s Normal Pulling pod/test-deploy-fdfc554f-sd7nb Pulling image "busybox"
test-app 0s Normal Pulled pod/test-deploy-fdfc554f-sd7nb Successfully pulled image "busybox"
test-app 0s Normal Created pod/test-deploy-fdfc554f-sd7nb Created container hello
test-app 0s Normal Started pod/test-deploy-fdfc554f-sd7nb Started container hello
In this instance, config failed to be injected into either of the pods:
(๑•ᴗ•)⊃━~/g/irsa-reproducer━☆゚ kubectl get pods -n test-app -o jsonpath='{range .items[*]}{@.metadata.name}{" (serviceAccount: "}{@.spec.serviceAccount}{") "}{@.spec.volumes[*].name}{"\n"}{end}'
test-deploy-fdfc554f-sd7nb (serviceAccount: test-app) test-app-token-njbrl
test-pod (serviceAccount: test-app) test-app-token-njbrl
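As an aside, the event stream can be scoped to a single pod instead of watching the whole cluster (a sketch using the standard involvedObject field selector):

kubectl get events -n test-app --field-selector involvedObject.name=test-deploy-fdfc554f-sd7nb

Events are only retained for about an hour by default, so this needs to run shortly after the pod is created.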
Is there no way to obtain logs from the webhook that's deployed with EKS clusters?
If you have access to audit logs and you can find the Pod creation event, it should have a mutation decision. Can you confirm if the webhook was called and mutated was false? https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#mutating-webhook-auditing-annotations
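If the cluster ships audit logs to CloudWatch, something like this should pull the pod CREATE events that mention the webhook (a sketch: the log group and stream prefix follow the usual EKS control-plane logging layout, and <cluster-name> is a placeholder):

aws logs filter-log-events \
  --log-group-name /aws/eks/<cluster-name>/cluster \
  --log-stream-name-prefix kube-apiserver-audit \
  --filter-pattern '"iam-for-pods.amazonaws.com"' \
  --query 'events[].message' \
  --output text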
We currently have the same issue. I just activated the audit logs on the EKS cluster control plane, but I can't see any event related to the webhook. Do you have an example of what such an event looks like?
Hi, we've been seeing the issue described above. Is there a simple set of steps we can follow to help have this issue debugged?
It was some kind of API problem on the master nodes. We fixed it by updating the EKS control plane to the next version. Support recommended opening a support ticket so that they can restart the Kubernetes API server, because you do not have access to the API logs of the mutating webhooks.
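For anyone trying the same remediation, the control-plane upgrade itself is a single call (a sketch; <cluster-name> and the target version are placeholders, and worker nodes are upgraded separately):

aws eks update-cluster-version --name <cluster-name> --kubernetes-version 1.18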
We currently have the same issue on EKS 1.16, although it is not isolated to pods created by deployments; it also affects individually created pods. It happens randomly. I have checked the audit logs and we have mutated: false:
{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"","mutation.webhook.admission.k8s.io/round_0_index_0":"{\"configuration\":\"istio-sidecar-injector\",\"webhook\":\"sidecar-injector.istio.io\",\"mutated\":false}","mutation.webhook.admission.k8s.io/round_0_index_1":"{\"configuration\":\"pod-identity-webhook\",\"webhook\":\"iam-for-pods.amazonaws.com\",\"mutated\":false}","mutation.webhook.admission.k8s.io/round_0_index_3":"{\"configuration\":\"vpc-resource-mutating-webhook\",\"webhook\":\"mpod.vpc.k8s.aws\",\"mutated\":false}"
In kubelet, we often find this error for the containers that fail to project the token:
579 kuberuntime_manager.go:935] getPodContainerStatuses for pod "test-container_default(e494a824-8028-4775-a966-6153ed26dd6a)" failed: rpc error: code = Unknown desc = Error: No such container: e2c69b28df377c75fbe0c46d506da495e754046285f27df47095b79d56e0d2e5
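To pull those messages on an affected node, something like this works on the standard EKS-optimized AMI where kubelet runs under systemd (a sketch):

journalctl -u kubelet --since "1 hour ago" | grep -E 'getPodContainerStatuses|No such container'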
Any idea how I can continue to troubleshoot this? Is there any way to get direct logs from the webhook? In EKS setups the webhook pod is not accessible (at least I couldn't find it), but I do have the MutatingWebhookConfiguration:
kubectl get mutatingwebhookconfigurations -o wide
NAME                     CREATED AT
istio-sidecar-injector   2020-11-11T15:30:39Z
pod-identity-webhook     2020-06-12T16:09:24Z
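Since the webhook pods themselves aren't reachable on EKS, one client-side thing worth checking is how the MutatingWebhookConfiguration handles failures (a sketch; the field paths are the standard admissionregistration.k8s.io ones):

kubectl get mutatingwebhookconfiguration pod-identity-webhook \
  -o jsonpath='{.webhooks[0].failurePolicy}{" "}{.webhooks[0].timeoutSeconds}{"\n"}'
kubectl get mutatingwebhookconfiguration pod-identity-webhook -o yaml

With failurePolicy: Ignore, a timed-out or failed call to the webhook admits the pod unmutated, which would produce exactly this symptom along with mutated: false in the audit annotation.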
Any help would be greatly appreciated
I can confirm that I've run into this as well. I've only seen it with deployments. I haven't been able to check the logs to troubleshoot yet.