amazon-eks-pod-identity-webhook

Incompatible with gRPC health probes

Open mikberg opened this issue 2 years ago • 11 comments

What happened:

gRPC probes became available by default in Kubernetes 1.24 (beta; they were alpha in 1.23), adding a new field, grpc, to Probe (used in readinessProbe, livenessProbe and startupProbe). The pod identity webhook appears to be incompatible with this: pods that define a gRPC probe and use a service account with the eks.amazonaws.com/role-arn annotation can't be created:

The Pod "test-grpc" is invalid: spec.containers[0].readinessProbe: Required value: must specify a handler type

What you expected to happen:

Both gRPC probes and the pod identity webhook work.

How to reproduce it (as minimally and precisely as possible):

Observe that this example pod can be deployed and works as expected:

apiVersion: v1
kind: Pod
metadata:
  name: test-grpc
spec:
  containers:
  - name: agnhost
    image: k8s.gcr.io/e2e-test-images/agnhost:2.35
    command: ["/agnhost", "grpc-health-checking"]
    ports:
    - containerPort: 5000
    - containerPort: 8080
    readinessProbe:
      grpc:
        port: 5000

Create a Kubernetes service account my-sa with the eks.amazonaws.com/role-arn annotation set, and try to use it in a new pod:

apiVersion: v1
kind: Pod
metadata:
  name: test-grpc
spec:
  serviceAccountName: my-sa
  containers:
  - name: agnhost
    image: k8s.gcr.io/e2e-test-images/agnhost:2.35
    command: ["/agnhost", "grpc-health-checking"]
    ports:
    - containerPort: 5000
    - containerPort: 8080
    readinessProbe:
      grpc:
        port: 5000

This error message is returned:

The Pod "test-grpc" is invalid: spec.containers[0].readinessProbe: Required value: must specify a handler type

Anything else we need to know?:

Environment:

  • AWS Region: eu-north-1
  • EKS Platform version: eks.3
  • Kubernetes version: 1.24
  • Webhook Version: ?

mikberg, Dec 23 '22

I have also just encountered this issue, and was wondering whether a fix or workaround is available yet?

lareeth, Feb 01 '23

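Just to add to this: checking the audit log, you can see the webhook's patch contents, which have stripped the grpc element from the parent readinessProbe. Because the patch's add operation on /spec/containers replaces the entire existing array, the probe that reaches validation keeps only its timing fields and has no handler, which is why the API server reports "must specify a handler type".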

{
	"configuration": "pod-identity-webhook",
	"webhook": "iam-for-pods.amazonaws.com",
	"patch": [
		{
			"op": "add",
			"path": "/spec/volumes/0",
			"value": {
				"name": "aws-iam-token",
				"projected": {
					"sources": [
						{
							"serviceAccountToken": {
								"audience": "sts.amazonaws.com",
								"expirationSeconds": 86400,
								"path": "token"
							}
						}
					]
				}
			}
		},
		{
			"op": "add",
			"path": "/spec/containers",
			"value": [
				{
					"name": "<removed>",
					"image": "<removed>",
					"ports": [
						{
							"name": "http",
							"containerPort": 80,
							"protocol": "TCP"
						}
					],
					"env": [
						{
							"name": "AWS_STS_REGIONAL_ENDPOINTS",
							"value": "regional"
						},
						{
							"name": "AWS_DEFAULT_REGION",
							"value": "eu-west-1"
						},
						{
							"name": "AWS_REGION",
							"value": "eu-west-1"
						},
						{
							"name": "AWS_ROLE_ARN",
							"value": "arn:aws:iam::<removed>:role/<removed>"
						},
						{
							"name": "AWS_WEB_IDENTITY_TOKEN_FILE",
							"value": "/var/run/secrets/eks.amazonaws.com/serviceaccount/token"
						}
					],
					"resources": {},
					"volumeMounts": [
						{
							"name": "kube-api-access-dgm7x",
							"readOnly": true,
							"mountPath": "/var/run/secrets/kubernetes.io/serviceaccount"
						},
						{
							"name": "aws-iam-token",
							"readOnly": true,
							"mountPath": "/var/run/secrets/eks.amazonaws.com/serviceaccount"
						}
					],
					"readinessProbe": {
						"timeoutSeconds": 1,
						"periodSeconds": 10,
						"successThreshold": 1,
						"failureThreshold": 3
					},
					"terminationMessagePath": "/dev/termination-log",
					"terminationMessagePolicy": "File",
					"imagePullPolicy": "IfNotPresent",
					"securityContext": {}
				}
			]
		}
	],
	"patchType": "JSONPatch"
}
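The audit log is consistent with a decode/re-encode round trip through types that have no grpc field. Assuming the affected webhook build was compiled against a k8s.io/api version that predates gRPC probes (an inference from the v0.3.0 vs v0.4.0 results reported later in this thread, not something confirmed here), the Go sketch below reproduces the effect with hypothetical stand-in types: encoding/json ignores the unknown grpc key on decode, so it is absent from the re-encoded container.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// probeWithoutGRPC is a hypothetical stand-in for a Probe type compiled from
// an API version that predates the grpc field. By default encoding/json
// silently ignores keys it has no struct field for.
type probeWithoutGRPC struct {
	TimeoutSeconds   int32 `json:"timeoutSeconds,omitempty"`
	PeriodSeconds    int32 `json:"periodSeconds,omitempty"`
	SuccessThreshold int32 `json:"successThreshold,omitempty"`
	FailureThreshold int32 `json:"failureThreshold,omitempty"`
}

// container is a hypothetical, trimmed-down container type.
type container struct {
	Name           string            `json:"name"`
	ReadinessProbe *probeWithoutGRPC `json:"readinessProbe,omitempty"`
}

// roundTrip decodes a container into the older types and re-encodes it,
// similar to rebuilding the /spec/containers value for a patch.
func roundTrip(in []byte) ([]byte, error) {
	var c container
	if err := json.Unmarshal(in, &c); err != nil {
		return nil, err
	}
	return json.Marshal(c)
}

func main() {
	in := []byte(`{"name":"agnhost","readinessProbe":{"grpc":{"port":5000},"timeoutSeconds":1,"periodSeconds":10,"successThreshold":1,"failureThreshold":3}}`)
	out, err := roundTrip(in)
	if err != nil {
		panic(err)
	}
	// Prints: {"name":"agnhost","readinessProbe":{"timeoutSeconds":1,"periodSeconds":10,"successThreshold":1,"failureThreshold":3}}
	fmt.Println(string(out))
}
```

The result has the same shape as the readinessProbe in the audit-log patch: timing fields only, no handler.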

lareeth, Feb 01 '23

I have also pulled down the code and added a test case for gRPC. I can reproduce this on v0.3.0, but it seems to work on v0.4.0. Is there an easy way to verify which version EKS is running? And is there a way to update it?

# make test
go test -coverprofile=coverage.out ./...
?       github.com/aws/amazon-eks-pod-identity-webhook  [no test files]
?       github.com/aws/amazon-eks-pod-identity-webhook/hack/self-hosted [no test files]
?       github.com/aws/amazon-eks-pod-identity-webhook/pkg      [no test files]
ok      github.com/aws/amazon-eks-pod-identity-webhook/pkg/cache        0.097s  coverage: 41.7% of statements
ok      github.com/aws/amazon-eks-pod-identity-webhook/pkg/cache/debug  0.578s  coverage: 50.0% of statements
ok      github.com/aws/amazon-eks-pod-identity-webhook/pkg/cert 0.007s  coverage: 69.2% of statements
--- FAIL: TestUpdatePodSpec (0.01s)
    --- FAIL: TestUpdatePodSpec/Pod_balajilovesoreos_in_file_testdata/rawPodWithGrpc.pod.yaml (0.00s)
        handler_pod_test.go:162: Expected patch didn't match:
            Got
                [{"op":"add","path":"/spec/volumes","value":[{"name":"aws-iam-token","projected":{"sources":[{"serviceAccountToken":{"audience":"sts.amazonaws.com","expirationSeconds":86400,"path":"token"}}]}}]},{"op":"add","path":"/spec/containers","value":[{"name":"balajilovesoreos","image":"amazonlinux","env":[{"name":"AWS_ROLE_ARN","value":"arn:aws:iam::111122223333:role/s3-reader"},{"name":"AWS_WEB_IDENTITY_TOKEN_FILE","value":"/var/run/secrets/eks.amazonaws.com/serviceaccount/token"}],"resources":{},"volumeMounts":[{"name":"aws-iam-token","readOnly":true,"mountPath":"/var/run/secrets/eks.amazonaws.com/serviceaccount"}],"readinessProbe":{}}]}]
            Wanted:
                [{"op":"add","path":"/spec/volumes","value":[{"name":"aws-iam-token","projected":{"sources":[{"serviceAccountToken":{"audience":"sts.amazonaws.com","expirationSeconds":86400,"path":"token"}}]}}]},{"op":"add","path":"/spec/containers","value":[{"name":"balajilovesoreos","image":"amazonlinux","env":[{"name":"AWS_ROLE_ARN","value":"arn:aws:iam::111122223333:role/s3-reader"},{"name":"AWS_WEB_IDENTITY_TOKEN_FILE","value":"/var/run/secrets/eks.amazonaws.com/serviceaccount/token"}],"resources":{},"volumeMounts":[{"name":"aws-iam-token","readOnly":true,"mountPath":"/var/run/secrets/eks.amazonaws.com/serviceaccount"}],"readinessProbe":{"grpc":{"port":80,"service":""}}}]}]
E0201 17:47:01.970551   51933 handler.go:453] Content-Type=application/xml, expected application/json
E0201 17:47:01.970974   51933 handler.go:461] Can't decode body: couldn't get version/kind; json parse error: unexpected end of JSON input
E0201 17:47:01.971337   51933 handler.go:374] Could not unmarshal raw object: json: cannot unmarshal string into Go value of type v1.Pod
E0201 17:47:01.971366   51933 handler.go:375] Object: "\"metadata\":{\"name\":\"fake\""
FAIL

lareeth, Feb 01 '23

The amazon-eks-pod-identity-webhook runs on the control plane in EKS and is managed by EKS, so you won't be able to update it manually on your end.

chickenbeef, Feb 04 '23

Yeah, I'm aware. As I asked above, I was hoping there might be a workaround or a different fix. As it currently stands, gRPC health checks are broken on EKS 1.24 for any pod that uses a service account with the eks.amazonaws.com/role-arn annotation, which is a major issue.

lareeth, Feb 04 '23

@lareeth Do you mind filing it at https://github.com/aws/containers-roadmap/issues, if you haven't already? And if you have raised it with the EKS folks, do you have a ticket?

dims, Feb 04 '23

@lareeth A workaround I tested is to install the webhook manually into the cluster. This creates a pod-identity-webhook pod running in the data plane, outside of EKS management, so you will be responsible for monitoring it.

This is of course not ideal, but it should unblock you from carrying out further testing. Once the new version of the webhook is released on EKS, you can revert to the EKS-managed pod-identity-webhook.

chickenbeef, Feb 04 '23

I'll raise a ticket in aws/containers-roadmap and see what they say.

I'll give the manual install a try. We are using Flux, so it should be easy to revert once it's fixed. Thanks for the suggestion.

lareeth, Feb 04 '23

Another workaround, a bit more radical than the one proposed above, is to stop using EKS pod identity (the eks.amazonaws.com/role-arn annotation on a service account) for the affected deployment: enable kube2iam at the node level for your deployment's namespace, remove the service account, and add the role ('iam.amazonaws.com/role': yourPodRoleArn) to the annotations of the pod template. It's not fancy, but it circumvents the issue entirely.
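A minimal Go sketch of the pod-template change, assuming kube2iam is already running on the nodes: it sets the iam.amazonaws.com/role annotation named above on a copy of the template metadata and prints it as JSON. podTemplateMetadata and withKube2iamRole are hypothetical helpers, and the role ARN is the placeholder from the test output earlier in the thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// kube2iamRoleAnnotation is the pod annotation kube2iam reads to pick the IAM role.
const kube2iamRoleAnnotation = "iam.amazonaws.com/role"

// podTemplateMetadata is a hypothetical, trimmed-down stand-in for the
// metadata block of a Deployment's spec.template.
type podTemplateMetadata struct {
	Labels      map[string]string `json:"labels,omitempty"`
	Annotations map[string]string `json:"annotations,omitempty"`
}

// withKube2iamRole returns a copy of meta with the kube2iam role annotation
// set, leaving the caller's annotation map untouched.
func withKube2iamRole(meta podTemplateMetadata, roleArn string) podTemplateMetadata {
	ann := make(map[string]string, len(meta.Annotations)+1)
	for k, v := range meta.Annotations {
		ann[k] = v
	}
	ann[kube2iamRoleAnnotation] = roleArn
	meta.Annotations = ann
	return meta
}

func main() {
	meta := podTemplateMetadata{Labels: map[string]string{"app": "test-grpc"}}
	// Placeholder ARN; substitute your own role.
	meta = withKube2iamRole(meta, "arn:aws:iam::111122223333:role/s3-reader")
	out, err := json.MarshalIndent(meta, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The pod itself should then run without the annotated service account, so the pod identity webhook no longer mutates it.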

thiduzz, Feb 16 '23

Same issue still present in EKS v1.27.1-eks-2f008fe.

soasurs, Sep 05 '23

@soasurs Please open a support ticket with AWS and ask them to investigate.

dims, Sep 05 '23