amazon-eks-pod-identity-webhook
Incompatible with grpc health probes
What happened:
gRPC probes, enabled by default since Kubernetes 1.24, add a new field grpc to Probe (used in readinessProbe and livenessProbe). The pod identity webhook seems to be incompatible with this. Pods that use a service account with the eks.amazonaws.com/role-arn annotation can't be created:
The Pod "test-grpc" is invalid: spec.containers[0].readinessProbe: Required value: must specify a handler type
What you expected to happen:
gRPC probes working, and the pod identity webhook working
How to reproduce it (as minimally and precisely as possible):
Observe that this example pod can be deployed and works as expected:
apiVersion: v1
kind: Pod
metadata:
  name: test-grpc
spec:
  containers:
  - name: agnhost
    image: k8s.gcr.io/e2e-test-images/agnhost:2.35
    command: ["/agnhost", "grpc-health-checking"]
    ports:
    - containerPort: 5000
    - containerPort: 8080
    readinessProbe:
      grpc:
        port: 5000
Create a Kubernetes service account my-sa with the eks.amazonaws.com/role-arn annotation set, and try to use it in a new pod:
apiVersion: v1
kind: Pod
metadata:
  name: test-grpc
spec:
  serviceAccountName: my-sa
  containers:
  - name: agnhost
    image: k8s.gcr.io/e2e-test-images/agnhost:2.35
    command: ["/agnhost", "grpc-health-checking"]
    ports:
    - containerPort: 5000
    - containerPort: 8080
    readinessProbe:
      grpc:
        port: 5000
This error message is returned:
The Pod "test-grpc" is invalid: spec.containers[0].readinessProbe: Required value: must specify a handler type
Anything else we need to know?:
Environment:
- AWS Region: eu-north-1
- EKS Platform version: eks.3
- Kubernetes version: 1.24
- Webhook Version: ?
I have also just encountered this issue, and was wondering if there is a fix or a work around available yet?
Just to add to this: the audit log shows the contents of the webhook patch, which has stripped the grpc: {} element from the parent readinessProbe:
{
  "configuration": "pod-identity-webhook",
  "webhook": "iam-for-pods.amazonaws.com",
  "patch": [
    {
      "op": "add",
      "path": "/spec/volumes/0",
      "value": {
        "name": "aws-iam-token",
        "projected": {
          "sources": [
            {
              "serviceAccountToken": {
                "audience": "sts.amazonaws.com",
                "expirationSeconds": 86400,
                "path": "token"
              }
            }
          ]
        }
      }
    },
    {
      "op": "add",
      "path": "/spec/containers",
      "value": [
        {
          "name": "<removed>",
          "image": "<removed>",
          "ports": [
            {
              "name": "http",
              "containerPort": 80,
              "protocol": "TCP"
            }
          ],
          "env": [
            {
              "name": "AWS_STS_REGIONAL_ENDPOINTS",
              "value": "regional"
            },
            {
              "name": "AWS_DEFAULT_REGION",
              "value": "eu-west-1"
            },
            {
              "name": "AWS_REGION",
              "value": "eu-west-1"
            },
            {
              "name": "AWS_ROLE_ARN",
              "value": "arn:aws:iam::<removed>:role/<removed>"
            },
            {
              "name": "AWS_WEB_IDENTITY_TOKEN_FILE",
              "value": "/var/run/secrets/eks.amazonaws.com/serviceaccount/token"
            }
          ],
          "resources": {},
          "volumeMounts": [
            {
              "name": "kube-api-access-dgm7x",
              "readOnly": true,
              "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount"
            },
            {
              "name": "aws-iam-token",
              "readOnly": true,
              "mountPath": "/var/run/secrets/eks.amazonaws.com/serviceaccount"
            }
          ],
          "readinessProbe": {
            "timeoutSeconds": 1,
            "periodSeconds": 10,
            "successThreshold": 1,
            "failureThreshold": 3
          },
          "terminationMessagePath": "/dev/termination-log",
          "terminationMessagePolicy": "File",
          "imagePullPolicy": "IfNotPresent",
          "securityContext": {}
        }
      ]
    }
  ],
  "patchType": "JSONPatch"
}
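In the patch above, readinessProbe keeps only its timing fields and has no handler left, which matches the "must specify a handler type" error. For anyone who wants to check their own audit log entries the same way, here is a rough Go sketch. It is only an illustration: the probesWithoutHandler helper, the trimmed-down patchOp and container types, and the list of handler keys are my own assumptions, not code from the webhook, and it only looks at ops that replace /spec/containers as in this patch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// patchOp mirrors one JSON Patch operation as it appears in the audit log.
type patchOp struct {
	Op    string          `json:"op"`
	Path  string          `json:"path"`
	Value json.RawMessage `json:"value"`
}

// container is a hypothetical, trimmed-down view of a container spec.
// Probes are kept as raw maps so that unknown fields are not lost.
type container struct {
	Name           string                     `json:"name"`
	ReadinessProbe map[string]json.RawMessage `json:"readinessProbe"`
	LivenessProbe  map[string]json.RawMessage `json:"livenessProbe"`
	StartupProbe   map[string]json.RawMessage `json:"startupProbe"`
}

// handlerKeys lists the probe handler fields assumed for this check.
var handlerKeys = []string{"exec", "httpGet", "tcpSocket", "grpc"}

func hasHandler(p map[string]json.RawMessage) bool {
	for _, k := range handlerKeys {
		if _, ok := p[k]; ok {
			return true
		}
	}
	return false
}

// probesWithoutHandler takes the "patch" array from an audit entry and
// returns "container/probe" for every probe that is present but has no handler.
func probesWithoutHandler(patch []byte) ([]string, error) {
	var ops []patchOp
	if err := json.Unmarshal(patch, &ops); err != nil {
		return nil, err
	}
	var missing []string
	for _, op := range ops {
		if op.Path != "/spec/containers" {
			continue // only whole-container-list replacements are inspected
		}
		var cs []container
		if err := json.Unmarshal(op.Value, &cs); err != nil {
			return nil, err
		}
		for _, c := range cs {
			probes := []struct {
				name  string
				probe map[string]json.RawMessage
			}{
				{"readinessProbe", c.ReadinessProbe},
				{"livenessProbe", c.LivenessProbe},
				{"startupProbe", c.StartupProbe},
			}
			for _, pr := range probes {
				// A nil map means the probe is absent, which is fine.
				if pr.probe != nil && !hasHandler(pr.probe) {
					missing = append(missing, c.Name+"/"+pr.name)
				}
			}
		}
	}
	return missing, nil
}

func main() {
	patch := `[{"op":"add","path":"/spec/containers","value":[{"name":"agnhost","readinessProbe":{"periodSeconds":10}}]}]`
	missing, err := probesWithoutHandler([]byte(patch))
	if err != nil {
		panic(err)
	}
	fmt.Println(missing)
}
```

Pass it only the array under the "patch" key, not the whole audit annotation.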
I have also pulled down the code and added a test for gRPC. I can replicate this on v0.3.0, but it seems to work with v0.4.0. Is there an easy way to verify which version is running with EKS, and is there a way to update it?
# make test
go test -coverprofile=coverage.out ./...
? github.com/aws/amazon-eks-pod-identity-webhook [no test files]
? github.com/aws/amazon-eks-pod-identity-webhook/hack/self-hosted [no test files]
? github.com/aws/amazon-eks-pod-identity-webhook/pkg [no test files]
ok github.com/aws/amazon-eks-pod-identity-webhook/pkg/cache 0.097s coverage: 41.7% of statements
ok github.com/aws/amazon-eks-pod-identity-webhook/pkg/cache/debug 0.578s coverage: 50.0% of statements
ok github.com/aws/amazon-eks-pod-identity-webhook/pkg/cert 0.007s coverage: 69.2% of statements
--- FAIL: TestUpdatePodSpec (0.01s)
--- FAIL: TestUpdatePodSpec/Pod_balajilovesoreos_in_file_testdata/rawPodWithGrpc.pod.yaml (0.00s)
handler_pod_test.go:162: Expected patch didn't match:
Got
[{"op":"add","path":"/spec/volumes","value":[{"name":"aws-iam-token","projected":{"sources":[{"serviceAccountToken":{"audience":"sts.amazonaws.com","expirationSeconds":86400,"path":"token"}}]}}]},{"op":"add","path":"/spec/containers","value":[{"name":"balajilovesoreos","image":"amazonlinux","env":[{"name":"AWS_ROLE_ARN","value":"arn:aws:iam::111122223333:role/s3-reader"},{"name":"AWS_WEB_IDENTITY_TOKEN_FILE","value":"/var/run/secrets/eks.amazonaws.com/serviceaccount/token"}],"resources":{},"volumeMounts":[{"name":"aws-iam-token","readOnly":true,"mountPath":"/var/run/secrets/eks.amazonaws.com/serviceaccount"}],"readinessProbe":{}}]}]
Wanted:
[{"op":"add","path":"/spec/volumes","value":[{"name":"aws-iam-token","projected":{"sources":[{"serviceAccountToken":{"audience":"sts.amazonaws.com","expirationSeconds":86400,"path":"token"}}]}}]},{"op":"add","path":"/spec/containers","value":[{"name":"balajilovesoreos","image":"amazonlinux","env":[{"name":"AWS_ROLE_ARN","value":"arn:aws:iam::111122223333:role/s3-reader"},{"name":"AWS_WEB_IDENTITY_TOKEN_FILE","value":"/var/run/secrets/eks.amazonaws.com/serviceaccount/token"}],"resources":{},"volumeMounts":[{"name":"aws-iam-token","readOnly":true,"mountPath":"/var/run/secrets/eks.amazonaws.com/serviceaccount"}],"readinessProbe":{"grpc":{"port":80,"service":""}}}]}]
E0201 17:47:01.970551 51933 handler.go:453] Content-Type=application/xml, expected application/json
E0201 17:47:01.970974 51933 handler.go:461] Can't decode body: couldn't get version/kind; json parse error: unexpected end of JSON input
E0201 17:47:01.971337 51933 handler.go:374] Could not unmarshal raw object: json: cannot unmarshal string into Go value of type v1.Pod
E0201 17:47:01.971366 51933 handler.go:375] Object: "\"metadata\":{\"name\":\"fake\""
FAIL
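The "Got" patch above ends with "readinessProbe":{} where the "Wanted" patch has the grpc handler. One possible explanation, and this is my assumption rather than something confirmed in this thread, is that the older build decodes the pod into Go types that predate the grpc field and then re-encodes the whole container list. Go's encoding/json ignores unknown fields on decode, so they disappear on the way back out. A minimal sketch of that effect, using a made-up oldProbe type that is not the real k8s.io/api Probe:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// oldProbe is a hypothetical stand-in for a Probe type compiled before the
// grpc handler existed. It is not the real k8s.io/api type.
type oldProbe struct {
	HTTPGet        *json.RawMessage `json:"httpGet,omitempty"`
	TimeoutSeconds int              `json:"timeoutSeconds,omitempty"`
	PeriodSeconds  int              `json:"periodSeconds,omitempty"`
}

// roundTrip decodes a probe into oldProbe and encodes it again, the way a
// mutating webhook might when it rebuilds a whole container spec.
func roundTrip(in []byte) ([]byte, error) {
	var p oldProbe
	// Unknown keys such as "grpc" are silently dropped here.
	if err := json.Unmarshal(in, &p); err != nil {
		return nil, err
	}
	return json.Marshal(p)
}

func main() {
	in := []byte(`{"grpc":{"port":5000},"periodSeconds":10}`)
	out, err := roundTrip(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // prints {"periodSeconds":10}
}
```

If that is what is happening, it would also fit the observation that v0.4.0 passes the same test, but I have not checked which API version each release is built against.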
The amazon-eks-pod-identity-webhook runs on the control plane in EKS and is managed by EKS, so you won't be able to update it manually on your end.
Yeah, I'm aware. As I asked above, I was hoping there might be a workaround or a different fix. As it currently stands, gRPC health checks are broken on EKS 1.24, which is a major issue.
@lareeth Do you mind filing it here - https://github.com/aws/containers-roadmap/issues (if you haven't already raised it with EKS folks, do you have a ticket?)
@lareeth A workaround I tested is to install the webhook manually into the cluster. This will create a pod-identity-webhook pod running in the dataplane - outside of EKS management so you will be responsible for monitoring it.
This is of course not ideal but should unblock you from carrying out further testing. Once the new version of the webhook is released onto EKS, you can revert back to using the EKS managed pod-identity-webhook.
I'll raise a ticket there and see what they say.
I'll give this a try, we are using Flux so it should be easy to revert once it's fixed. Thanks for the suggestion
Another workaround, which is a bit more radical than the one proposed above, is to stop using EKS Pod Identity (the eks.amazonaws.com/role-arn annotation on a service account) altogether. Instead, enable kube2iam at the node level for the namespace of your deployment, remove the service account, and add the role ('iam.amazonaws.com/role': yourPodRoleArn) to the annotations of the pod template. It's not fancy, but it circumvents the issue entirely.
same issue still in EKS v1.27.1-eks-2f008fe
@soasurs please open a service ticket and ask them to investigate