vault-secrets-webhook
Authentication to ECR with kube2iam fails since v1.21.1
Preflight Checklist
- [X] I have searched the issue tracker for an issue that matches the one I want to file, without success.
- [X] I am not looking for support or already pursued the available support channels without success.
- [X] I agree to follow the Code of Conduct.
Vault Secrets Webhook Version
1.21.1
Installation Type
Official Helm chart
Bank-Vaults Version
No response
Kubernetes Version
1.27
Kubernetes Distribution/Provisioner
EKS
Expected Behavior
The Bank-Vaults webhook should be able to download the image manifest from ECR
Actual Behavior
The webhook fails with the following error:
{"time":"2024-02-26T13:23:44.134786476Z","level":"ERROR","msg":"Admission review error: could not mutate object: cannot fetch image descriptor: GET https://<obfuscated>.dkr.ecr.us-east-1.amazonaws.com/v2/<obfuscated>/manifests/sha256:<obfuscated>: unexpected status code 401 Unauthorized: Not Authorized\n","app":"vault-secrets-webhook","svc":"http.Handler","op":"create","ns":"<obfuscated>","request-id":"511a2a2c-566b-48af-b4c9-e8133de50f0e","kind":"v1/Pod","name":"<obfuscated>","path":"/pods","webhook-id":"vault-secrets-pods","dry-run":false,"webhook-kind":"mutating","wh-version":"v1beta1","trace-id":""}
Steps To Reproduce
- Use Kube2IAM for IAM authentication and add `iam.amazonaws.com/role: <role-arn>` to the pod annotations of the webhook deployment
- Schedule a pod that requires the webhook to download the manifest from ECR. The webhook will fail to download the manifest, blocking scheduling of the pod
- Downgrade to v1.21.0; the webhook is able to download the manifest again and pods can be scheduled again
Configuration
**Chart Values**
podAnnotations:
  iam.amazonaws.com/role: "<role-arn>"
The IAM role uses the `arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly` managed policy
Logs
No response
Additional Information
The webhook had no problem downloading from ECR before upgrading to v1.21.1, so we believe that something must have changed regarding the authentication. We looked at the logs of kube2iam and didn't see anything unusual there.
cc @universam1
Adding a question: should there be an option to tolerate a failure during image fetching, so that the pod can still be created? From an operational perspective, a blocking webhook is not desired behaviour.
My first guess would be that this PR could be the culprit. Can you spot anything off in how the Vault API's auth package is used, or in how it handles AWS IAM authentication?
Edit: on second look, this ECR login failure is a separate issue from auth in the vault-sdk package.
FYI, I encountered the same issue on EKS using IRSA after trying to upgrade to 1.21.1; the webhook also can't get the image descriptor from ECR due to a 401.
Thanks for reporting! We are investigating this and will provide a hotfix in the upcoming days.
Hi @AndreasSko, @94DanielBrown, could you please share what other annotations you are using on the pods, and whether or not you are using a default image pull secret for private registry access? Thanks!
Hi @akijakya, no interesting annotations:
metadata:
  annotations:
    checksum/config: eb66f509b2d3806f8b5cb57f5a7791b72bdc83d6d351d161509db4fa5c230684
    kubectl.kubernetes.io/restartedAt: "2024-02-28T13:52:18Z"
The pod is using an image pull secret called `image-pull-secret`, which has Docker Hub creds in it. But to access AWS ECR, the pod is configured with an AWS role through IRSA. This role has the AWS managed policy `AmazonEC2ContainerRegistryReadOnly`.
Thanks
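For reference, a minimal sketch of the IRSA wiring described above; the account ID, role name, and namespace here are placeholders rather than the reporter's actual values:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: vault-secrets-webhook
  namespace: vault-secrets-webhook
  annotations:
    # EKS's pod identity webhook injects AWS_ROLE_ARN and
    # AWS_WEB_IDENTITY_TOKEN_FILE into pods using this service account
    eks.amazonaws.com/role-arn: arn:aws:iam::<account-id>:role/<ecr-readonly-role>

The AWS SDK's default credential chain should pick those injected env vars up without any webhook-specific configuration.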
I upgraded from v1.14.3 to v1.19.0 first, and I ran into this issue immediately; so for me, I am not sure it's tied to 1.21.1.
Similar environment to OP.
//EDIT: I rolled back to v1.21.0 and I'm still getting it too.
time=2024-03-01T22:55:28.541Z level=INFO msg="Listening on https://:8443" app=vault-secrets-webhook
time=2024-03-01T22:55:28.541Z level=INFO msg="watching directory for changes: /var/serving-cert/" app=vault-secrets-webhook
time=2024-03-01T22:55:58.461Z level=ERROR msg="Admission review request failed" app=vault-secrets-webhook op=create kind=v1/Pod ns=default name="" webhook-id=vault-secrets-pods request-id=150ed1d7-31f8-459e-9dca-bd6afc50bb99 wh-version=v1beta1 trace-id="" webhook-kind=mutating dry-run=false path=/pods error="GET https://xxx.dkr.ecr.us-east-1.amazonaws.com/v2/pou-service/manifests/v1.0.07-test3: unexpected status code 401 Unauthorized: Not Authorized\n\ncannot fetch image descriptor\ngithub.com/bank-vaults/vault-secrets-webhook/pkg/webhook.getImageConfig\n\t/usr/local/src/vault-secrets-webhook/pkg/webhook/registry.go:196\ngithub.com/bank-vaults/vault-secrets-webhook/pkg/webhook.(*Registry).GetImageConfig\n\t/usr/local/src/vault-secrets-webhook/pkg/webhook/registry.go:152\ngithub.com/bank-vaults/vault-secrets-webhook/pkg/webhook.(*MutatingWebhook).mutateContainers\n\t/usr/local/src/vault-secrets-webhook/pkg/webhook/pod.go:275\ngithub.com/bank-vaults/vault-secrets-webhook/pkg/webhook.(*MutatingWebhook).MutatePod\n\t/usr/local/src/vault-secrets-webhook/pkg/webhook/pod.go:76\ngithub.com/bank-vaults/vault-secrets-webhook/pkg/webhook.(*MutatingWebhook).VaultSecretsMutator\n\t/usr/local/src/vault-secrets-webhook/pkg/webhook/webhook.go:78\nmain.main.ErrorLoggerMutator.func6\n\t/usr/local/src/vault-secrets-webhook/pkg/webhook/webhook.go:320\ngithub.com/slok/kubewebhook/v2/pkg/webhook/mutating.MutatorFunc.Mutate\n\t/go/pkg/mod/github.com/slok/kubewebhook/[email protected]/pkg/webhook/mutating/mutator.go:45\ngithub.com/slok/kubewebhook/v2/pkg/webhook/mutating.mutatingWebhook.mutatingAdmissionReview\n\t/go/pkg/mod/github.com/slok/kubewebhook/[email protected]/pkg/webhook/mutating/webhook.go:116\ngithub.com/slok/kubewebhook/v2/pkg/webhook/mutating.mutatingWebhook.Review\n\t/go/pkg/mod/github.com/slok/kubewebhook/[email protected]/pkg/webhook/mutating/webhook.go:104\ngithub.com/slok/kubewebhook/v2/pkg/webhook.measuredWebhook.Review\n\t/go/pkg/mod/github.com/slok/kubewebhook/[email protected]/pkg/webhook/metrics.go:113\ngithub.com/slok/kubewebhook/v2/pkg/http.handler.ServeHTTP\n\t/go/pkg/mod/github.com/slok/kubewebhook/[email protected]/pkg/http/handler.go:148\nnet/http.(*ServeMux).ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2683\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:3137\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2039\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695"
time=2024-03-01T22:55:58.462Z level=ERROR msg="Admission review error: could not mutate object: cannot fetch image descriptor: GET https://xxx.dkr.ecr.us-east-1.amazonaws.com/v2/pou-service/manifests/v1.0.07-test3: unexpected status code 401 Unauthorized: Not Authorized\n" app=vault-secrets-webhook svc=http.Handler name="" webhook-id=vault-secrets-pods request-id=150ed1d7-31f8-459e-9dca-bd6afc50bb99 wh-version=v1beta1 trace-id="" op=create kind=v1/Pod ns=default webhook-kind=mutating dry-run=false path=/pods
That's from my first pod.
My second pod is flooded with non-stop logs; any idea why?
time=2024-03-01T22:56:13.701Z level=INFO msg="Admission review request handled" app=vault-secrets-webhook svc=http.Handler request-id=5eb2f533-6732-4d7a-be59-e93f8fa6fa85 op=update kind=v1/Secret dry-run=false name=webhook-certificate path=/secrets trace-id="" webhook-kind=mutating ns=default webhook-id=vault-secrets-secret wh-version=v1beta1 duration=1.135604ms
time=2024-03-01T22:56:13.758Z level=INFO msg="Admission review request handled" app=vault-secrets-webhook svc=http.Handler wh-version=v1beta1 dry-run=false ns=default name=webhook-certificate trace-id="" request-id=3958bf84-01fe-482d-ba79-2c02a80c7762 kind=v1/Secret path=/secrets webhook-id=vault-secrets-secret webhook-kind=mutating op=update duration=1.168143ms
time=2024-03-01T22:56:14.030Z level=INFO msg="Admission review request handled" app=vault-secrets-webhook svc=http.Handler ns=default path=/secrets webhook-id=vault-secrets-secret webhook-kind=mutating dry-run=false kind=v1/Secret name=webhook-certificate request-id=5158f8f5-b38f-4fce-b7ad-d953513e94fd trace-id="" op=update wh-version=v1beta1 duration=1.892919ms
[… the same "Admission review request handled" entry for the webhook-certificate Secret repeats several times per second]
OK, so a finding, which coincides with what I have read.
None of my Deployment templates have a command: defined alongside the image; we have deferred to the container default.
Now when I add a command section, everything works as expected. (Per https://bank-vaults.dev/docs/mutating-webhook/configuration/#use-charts-without-explicit-containercommand-and-containerargs)
So the webhook tries to infer the command from the container registry, that lookup fails, and so it doesn't mount vault-env.
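For anyone hitting the same thing, a minimal sketch of that workaround; the image, command, and args here are placeholders:

containers:
  - name: app
    image: <account>.dkr.ecr.us-east-1.amazonaws.com/my-service:v1.0.0
    # With command/args set explicitly, the webhook can inject vault-env
    # without querying the registry for the image's ENTRYPOINT/CMD
    command: ["/app/server"]
    args: ["--config", "/etc/app/config.yaml"]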
This is not the behavior in v1.14.3, but perhaps my config has always been broken, or this is a redesigned architecture requiring the pull secrets.
So now, why doesn't this work? Hmm: "If your EC2 nodes have the ECR instance role, the webhook can request an ECR access token through that role automatically, instead of an explicit imagePullSecret"
Still on v1.21.0..
The only thing special about my setup is that my ECR is in the main AWS account, whereas my EKS nodes run under a different account. The worker node has no problem pulling the image; only the webhook fails.
FYI, if I assign a pull secret to the application with vault references, or use the DEFAULT_ envs below, it does work fine, but of course the token expires after 12 hours.
# Grab a 12-hour ECR auth token from the main account
TOKEN=$(aws ecr get-authorization-token --region=us-east-1 --registry-ids XXmainXX --output text --query 'authorizationData[].authorizationToken' | base64 --decode | cut -d: -f2)
# Recreate the docker-registry secret that the webhook uses as its default pull secret
kubectl delete secret default-ecr-token
kubectl create secret docker-registry default-ecr-token \
  --docker-server=https://XXmainXX.dkr.ecr.us-east-1.amazonaws.com \
  --docker-username=AWS \
  --docker-password="${TOKEN}"
helm:
  valuesObject:
    env:
      VAULT_ADDR: "http://vault.vault.svc.cluster.local:8200"
      VAULT_PATH: kubernetes
      VAULT_SKIP_VERIFY: true
      DEFAULT_IMAGE_PULL_SECRET: default-ecr-token
      DEFAULT_IMAGE_PULL_SECRET_NAMESPACE: default
      DEFAULT_IMAGE_PULL_SECRET_SERVICE_ACCOUNT: vault-secrets-webhook
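Since that token only lives for 12 hours, one way to avoid the manual refresh would be a CronJob that re-runs the same commands on a schedule. A rough, untested sketch; the image, schedule, service account name, and its RBAC are all assumptions:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: ecr-token-refresh
  namespace: default
spec:
  schedule: "0 */8 * * *"  # refresh well inside the 12-hour token lifetime
  jobTemplate:
    spec:
      template:
        spec:
          # hypothetical SA that is allowed to delete/create the secret
          serviceAccountName: ecr-token-refresher
          restartPolicy: OnFailure
          containers:
            - name: refresh
              # placeholder: any image that ships both the aws CLI and kubectl
              image: <aws-cli-and-kubectl-image>
              command: ["/bin/sh", "-c"]
              args:
                - |
                  TOKEN=$(aws ecr get-authorization-token --region=us-east-1 \
                    --registry-ids XXmainXX --output text \
                    --query 'authorizationData[].authorizationToken' \
                    | base64 --decode | cut -d: -f2)
                  kubectl delete secret default-ecr-token --ignore-not-found
                  kubectl create secret docker-registry default-ecr-token \
                    --docker-server=https://XXmainXX.dkr.ecr.us-east-1.amazonaws.com \
                    --docker-username=AWS \
                    --docker-password="${TOKEN}"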
Same issue as everyone else after deciding to give Kube2IAM a shot; still on v1.21.0.
Pod annotation is there:
Name: vault-secrets-webhook-886cd6d6f-t77xp
Namespace: vault-secrets-webhook
Priority: 0
Service Account: vault-secrets-webhook
Node: ip-10-200-72-160.ec2.internal/10.200.72.160
Start Time: Mon, 04 Mar 2024 08:15:33 +0000
Labels: app.kubernetes.io/instance=vault-secrets-webhook
app.kubernetes.io/name=vault-secrets-webhook
pod-template-hash=886cd6d6f
security.banzaicloud.io/mutate=skip
Annotations: checksum/config: dae52915272195a437ad21a384707864caee1656a67d5ad2db90933cc0a44273
iam.amazonaws.com/role: arn:aws:iam::XXdevXX:role/k8s-us-east-1-dev-eks-webhook2iam2ecr
Kube2IAM logs (do these look correct?):
time="2024-03-04T08:02:56Z" level=info msg="base ARN autodetected, arn:aws:iam::XXdevXX:role/"
time="2024-03-04T08:02:56Z" level=info msg="Using instance IAMRole arn:aws:iam::XXdevXX:role/k8s-us-east-1-dev-worker_assume_role as default"
time="2024-03-04T08:02:56Z" level=info msg="Listening on port 8181"
time="2024-03-04T08:16:36Z" level=info msg="PUT /latest/api/token (200) took 0.960256 ms" req.method=PUT req.path=/latest/api/token req.remote=10.200.56.57 res.duration=0.960256 res.status=200
time="2024-03-04T08:16:36Z" level=info msg="GET /latest/meta-data/iam/security-credentials/ (200) took 0.009750 ms" req.method=GET req.path=/latest/meta-data/iam/security-credentials/ req.remote=10.200.56.57 res.duration=0.00975 res.status=200
time="2024-03-04T08:16:36Z" level=info msg="GET /latest/meta-data/iam/security-credentials/k8s-us-east-1-dev-eks-webhook2iam2ecr (200) took 0.035931 ms" req.method=GET req.path=/latest/meta-data/iam/security-credentials/k8s-us-east-1-dev-eks-webhook2iam2ecr req.remote=10.200.56.57 res.duration=0.035931 res.status=200
ECR Repo permissions look fine:
{
"Version": "2008-10-17",
"Statement": [
{
"Sid": "dev permissions",
"Effect": "Allow",
"Principal": {
"AWS": [
"arn:aws:iam::XXdevXX:role/ops-admin",
"arn:aws:iam::XXmainXX:role/ops-admin",
"arn:aws:iam::XXdevXX:role/dev-admin",
"arn:aws:iam::XXmainXX:role/dev-admin",
"arn:aws:iam::XXdevXX:role/k8s-us-east-1-dev-eks-webhook2iam2ecr",
"arn:aws:iam::XXdevXX:role/k8s-us-east-1-dev-worker_assume_role",
]
},
"Action": [
"ecr:BatchCheckLayerAvailability",
"ecr:BatchGetImage",
"ecr:CompleteLayerUpload",
"ecr:DescribeImages",
"ecr:DescribeRepositories",
"ecr:GetDownloadUrlForLayer",
"ecr:GetLifecyclePolicy",
"ecr:GetLifecyclePolicyPreview",
"ecr:GetRepositoryPolicy",
"ecr:InitiateLayerUpload",
"ecr:ListImages",
"ecr:PutImage",
"ecr:UploadLayerPart"
]
}
]
}
kube2iam-trust.json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::XXdevXX:role/k8s-us-east-1-dev-worker_assume_role"
},
"Action": "sts:AssumeRole"
}
]
}
# Create the role with the kube2iam trust policy, then attach AWS's managed read-only ECR policy
aws iam create-role --role-name k8s-us-east-1-dev-eks-webhook2iam2ecr --assume-role-policy-document file://kube2iam-trust.json
aws iam attach-role-policy --role-name k8s-us-east-1-dev-eks-webhook2iam2ecr --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
Hi @gxpd-jjh, thanks for the details, we'll try to resolve this issue this week!
Any luck?
Or any tips to help me further debug why it's not behaving in our cross-account ECR setup?
Our production is on v1.14.3 and it's still behaving properly.
ECR authentication via IAM roles for service accounts (OIDC auth) is also broken with 1.21.1. Downgrading to 1.20.0 fixes the problem for me.
Got excited, as I never went back to v1.20.0 after all my testing, but it still isn't working for me. Looking back, I see even v1.19.x didn't work.
What else are you using in your setup (i.e. kube2iam, etc.)? Or do your nodes have a Worker Role that does the sts:AssumeRole trust to the role with the ECR repo permissions?
Just OIDC federation: https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html
Same AFAIK. Did you have any annotations/etc. for the webhook specifically to tell it to do OIDC? (I upgraded from v1.14 and ran into trouble immediately)
What I have found:
The image itself is able to be pulled from ECR using OIDC.
But the webhook needs to find the image CMD/ENTRYPOINT when it is not overridden in the template/chart, and that ECR/image lookup breaks. From a black-box view, it feels like it only knows pull secrets, and if they are not there, it fails.
All I have is a serviceaccount with the eks.amazonaws.com/role-arn annotation. EKS will then inject the needed ENVs and mounts that should then get picked up by the SDK.
Thank you so far btw!!
Mine looks similar..
Environment:
TLS_CERT_FILE: /var/serving-cert/tls.crt
TLS_PRIVATE_KEY_FILE: /var/serving-cert/tls.key
LISTEN_ADDRESS: :8443
VAULT_ENV_IMAGE: ghcr.io/bank-vaults/vault-env:v1.20.1
VAULT_ADDR: http://vault.vault.svc.cluster.local:8200
VAULT_ENV_MEMORY_LIMIT: 100Mi
VAULT_ENV_MEMORY_REQUEST: 50Mi
VAULT_IMAGE: vault:1.13.2
VAULT_PATH: kubernetes
VAULT_SKIP_VERIFY: true
AWS_STS_REGIONAL_ENDPOINTS: regional
AWS_DEFAULT_REGION: us-east-1
AWS_REGION: us-east-1
AWS_ROLE_ARN: arn:aws:iam::XXdevXX:role/k8s-us-east-1-dev-eks-webhook2ecr
AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
Mounts:
/var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dfhf7 (ro)
/var/serving-cert from serving-cert (rw)
The fact that the image in ECR gets pulled using a similar setup is what has me confused.
Still getting:
time=2024-03-29T17:14:19.797Z level=ERROR msg="Admission review request failed" app=vault-secrets-webhook ns=default name="" path=/pods webhook-id=vault-secrets-pods dry-run=false trace-id="" wh-version=v1beta1 kind=v1/Pod request-id=e4d3540b-d297-43e1-a468-65c096eba4d8 op=create webhook-kind=mutating error="GET https://MAINECR.dkr.ecr.us-east-1.amazonaws.com/v2/search-service/manifests/v1.0.59: unexpected status code 401 Unauthorized: Not Authorized\n\ncannot fetch image descriptor\ngithub.com/bank-vaults/vault-secrets-webhook/pkg/webhook.getImageConfig\n\t/usr/local/src/vault-secrets-webhook/pkg/webhook/registry.go:196\ngithub.com/bank-vaults/vault-secrets-webhook/pkg/webhook.(*Registry).GetImageConfig\n\t/usr/local/src/vault-secrets-webhook/pkg/webhook/registry.go:152\ngithub.com/bank-vaults/vault-secrets-webhook/pkg/webhook.(*MutatingWebhook).mutateContainers\n\t/usr/local/src/vault-secrets-webhook/pkg/webhook/pod.go:275\ngithub.com/bank-vaults/vault-secrets-webhook/pkg/webhook.(*MutatingWebhook).MutatePod\n\t/usr/local/src/vault-secrets-webhook/pkg/webhook/pod.go:76\ngithub.com/bank-vaults/vault-secrets-webhook/pkg/webhook.(*MutatingWebhook).VaultSecretsMutator\n\t/usr/local/src/vault-secrets-webhook/pkg/webhook/webhook.go:78\nmain.main.ErrorLoggerMutator.func6\n\t/usr/local/src/vault-secrets-webhook/pkg/webhook/webhook.go:320\ngithub.com/slok/kubewebhook/v2/pkg/webhook/mutating.MutatorFunc.Mutate\n\t/go/pkg/mod/github.com/slok/kubewebhook/[email protected]/pkg/webhook/mutating/mutator.go:45\ngithub.com/slok/kubewebhook/v2/pkg/webhook/mutating.mutatingWebhook.mutatingAdmissionReview\n\t/go/pkg/mod/github.com/slok/kubewebhook/[email protected]/pkg/webhook/mutating/webhook.go:116\ngithub.com/slok/kubewebhook/v2/pkg/webhook/mutating.mutatingWebhook.Review\n\t/go/pkg/mod/github.com/slok/kubewebhook/[email protected]/pkg/webhook/mutating/webhook.go:104\ngithub.com/slok/kubewebhook/v2/pkg/webhook.measuredWebhook.Review\n\t/go/pkg/mod/github.com/slok/kubewebhook/[email protected]/pkg/webhook/metrics.go:113\ngithub.com/slok/kubewebhook/v2/pkg/http.handler.ServeHTTP\n\t/go/pkg/mod/github.com/slok/kubewebhook/[email protected]/pkg/http/handler.go:148\nnet/http.(*ServeMux).ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2683\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:3137\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2039\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1695"
time=2024-03-29T17:14:19.797Z level=ERROR msg="Admission review error: could not mutate object: cannot fetch image descriptor: GET https://MAINECR.dkr.ecr.us-east-1.amazonaws.com/v2/search-service/manifests/v1.0.59: unexpected status code 401 Unauthorized: Not Authorized\n" app=vault-secrets-webhook svc=http.Handler wh-version=v1beta1 kind=v1/Pod request-id=e4d3540b-d297-43e1-a468-65c096eba4d8 op=create webhook-kind=mutating ns=default name="" path=/pods webhook-id=vault-secrets-pods dry-run=false trace-id=""
I've now tried with my main EKS worker account (less secure) that would be used when the node pulls the image, and with this new IAM role I created specifically for the webhook.
One other thing I just noticed that's weird: even though the webhook has the annotation, IAM is saying there has been no activity. That makes me think again that something is misconfigured and the webhook doesn't know to use it.
@Tolsto Can you do me a favor and check if your Role shows Activity in IAM screen?
Yes, it does.
FWIW - I'm also not seeing anything in my CloudTrail logs to even show the Webhook tried to assume identity.
time=2024-04-01T22:31:41.248Z level=ERROR msg="Admission review request failed" app=vault-secrets-webhook kind=v1/Pod path=/pods name="" webhook-id=vault-secrets-pods request-id=be8f20be-3303-40da-9bb3-03794e4c1a9c dry-run=false trace-id="" webhook-kind=mutating op=create wh-version=v1beta1 ns=entry-frontend-dev error="Get \"https://image-registry.openshift-image-registry.svc:5000/v2/\": tls: failed to verify certificate: x509: certificate signed by unknown authority\ncannot fetch image descriptor\ngithub.com/bank-vaults/vault-secrets-webhook/pkg/webhook.getImageConfig\n\t/usr/local/src/vault-secrets-webhook/pkg/webhook/registry.go:197\ngithub.com/bank-vaults/vault-secrets-webhook/pkg/webhook.(*Registry).GetImageConfig\n\t/usr/local/src/vault-secrets-webhook/pkg/webhook/registry.go:152\ngithub.com/bank-vaults/vault-secrets-webhook/pkg/webhook.(*MutatingWebhook).mutateContainers\n\t/usr/local/src/vault-secrets-webhook/pkg/webhook/pod.go:273\ngithub.com/bank-vaults/vault-secrets-webhook/pkg/webhook.(*MutatingWebhook).MutatePod\n\t/usr/local/src/vault-secrets-webhook/pkg/webhook/pod.go:74\ngithub.com/bank-vaults/vault-secrets-webhook/pkg/webhook.(*MutatingWebhook).VaultSecretsMutator\n\t/usr/local/src/vault-secrets-webhook/pkg/webhook/webhook.go:76\nmain.main.ErrorLoggerMutator.func6\n\t/usr/local/src/vault-secrets-webhook/pkg/webhook/webhook.go:318\ngithub.com/slok/kubewebhook/v2/pkg/webhook/mutating.MutatorFunc.Mutate\n\t/go/pkg/mod/github.com/slok/kubewebhook/[email protected]/pkg/webhook/mutating/mutator.go:45\ngithub.com/slok/kubewebhook/v2/pkg/webhook/mutating.mutatingWebhook.mutatingAdmissionReview\n\t/go/pkg/mod/github.com/slok/kubewebhook/[email protected]/pkg/webhook/mutating/webhook.go:116\ngithub.com/slok/kubewebhook/v2/pkg/webhook/mutating.mutatingWebhook.Review\n\t/go/pkg/mod/github.com/slok/kubewebhook/[email protected]/pkg/webhook/mutating/webhook.go:104\ngithub.com/slok/kubewebhook/v2/pkg/webhook.measuredWebhook.Review\n\t/go/pkg/mod/github.com/slok/kubewebhook/[email protected]/pkg/webhook/metrics.go:113\ngithub.com/slok/kubewebhook/v2/pkg/http.handler.ServeHTTP\n\t/go/pkg/mod/github.com/slok/kubewebhook/[email protected]/pkg/http/handler.go:148\nnet/http.(*ServeMux).ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2514\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2938\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:2009\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"
time=2024-04-01T22:31:41.248Z level=ERROR msg="Admission review error: could not mutate object: cannot fetch image descriptor: Get \"https://image-registry.openshift-image-registry.svc:5000/v2/\": tls: failed to verify certificate: x509: certificate signed by unknown authority" app=vault-secrets-webhook svc=http.Handler op=create wh-version=v1beta1 ns=entry-frontend-dev trace-id="" webhook-kind=mutating webhook-id=vault-secrets-pods request-id=be8f20be-3303-40da-9bb3-03794e4c1a9c dry-run=false kind=v1/Pod path=/pods name=""
Hitting the same, I think, with 1.21.0 and 1.21.1.
Edit: I think my solution is https://github.com/bank-vaults/vault-secrets-webhook/issues/366#issuecomment-1999487458 :) so I think mine is not an issue, as it is expected behavior.
Sorry for the delay on this one, folks; we have been very busy with the new generic webhook implementation. We hope to have a fix for this by the end of next week. Please use a working webhook version and hold off upgrading until then.
Hello. I would like to add some context as well.
We don't use kube2iam, but this also seems to be affecting the use of node roles to download ECR images.
The rest of the report still applies: on v1.21.0 I see the normal success logs, like
{"time":"2024-04-18T13:30:00.168870472Z","level":"INFO","msg":"Admission review request handled","app":"vault-secrets-webhook","svc":"http.Handler","request-id":"4792d4c2-7d4a-4551-9676-00988e125488","dry-run":false,"kind":"v1/Pod","wh-version":"v1beta1","path":"/pods","trace-id":"","webhook-id":"vault-secrets-pods","name":"","webhook-kind":"mutating","op":"create","ns":"roberto-test","duration":9003570}
But on v1.21.1 on the same setup I get these errors
{"time":"2024-04-18T05:04:23.584165593Z","level":"ERROR","msg":"Admission review request failed","app":"vault-secrets-webhook","trace-id":"","webhook-id":"vault-secrets-pods","name":"","webhook-kind":"mutating","op":"create","wh-version":"v1beta1","dry-run":false,"ns":"roberto-test","path":"/pods","kind":"v1/Pod","request-id":"339ebc8e-930d-41c4-9e73-3dadd2548023","error":"cannot fetch image descriptor: GET https://797740695898.dkr.ecr.us-east-1.amazonaws.com/v2/roberto-sample-app/manifests/test: unexpected status code 401 Unauthorized: Not Authorized\n"}
Looking at the diffs for these versions, I could not find a culprit right away. I tried reverting the github.com/google/go-containerregistry update from v0.19.0 back to v0.16.1 locally, and that didn't fix the error. I also tried reverting the changes in the pkg/webhook/registry.go file, without success.
Pretty much what is happening to me. Try v1.14.3; that's the last one that worked. I have a thousand replies in this issue if you want to ctrl-F me.
FYI, it's metadata parsing that is failing, since the template doesn't override the cmd/entrypoint. The image still does eventually run, but with no secrets and no sidecar.
I haven't tested it myself, but I suspect that if you were to override the cmd/entrypoint, it will work.
Try v1.14.3; that's the last one that worked.
Hmm, maybe our issues are not the same then; for me v1.21.0 works fine, only v1.21.1 breaks in this way.
I'll try bisecting the commits to figure out the culprit.
if you were to override the cmd/entrypoint, it will work.
Yes, that works because in this case vsw doesn't even reach out to the registry.
