gatekeeper
Installation with Helm hangs until it reaches the timeout
What steps did you take and what happened:
Good day,
I have tried to deploy Gatekeeper using the Helm chart as follows:
helm install -n gatekeeper-system gatekeeper gatekeeper/gatekeeper
It keeps hanging, helm status shows "pending-install", and then it fails after reaching the timeout.
What did you expect to happen: Gatekeeper is installed and running.
Anything else you would like to add: using the --debug flag, it shows that the install gets stuck when a jobs.batch resource starts. This job starts a pod "gatekeeper-update-namespace-label--1-z74vh" which adds labels to the gatekeeper namespace. The pod keeps starting and then turns into an error status. Showing the logs of this pod gives the error below:
I0302 08:46:35.941139 1 request.go:668] Waited for 1.192226109s due to client-side throttling, not priority and fairness, request: GET:https://10.43.0.1:443/apis/cert-manager.io/v1beta1?timeout=32s
Error from server (InternalError): Internal error occurred: failed calling webhook "check-ignore-label.gatekeeper.sh": Post "https://gatekeeper-webhook-service.gatekeeper-system.svc:443/v1/admitlabel?timeout=3s": context deadline exceeded
This seems to show a problem related to "API Priority and Fairness". The installation is fresh; no older version was installed.
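For reference, a minimal way to pull up the failing hook pod and its logs (a sketch, assuming the default gatekeeper-system namespace and the standard job-name label that Kubernetes puts on Job pods):
$ kubectl -n gatekeeper-system get pods -l job-name=gatekeeper-update-namespace-label
$ kubectl -n gatekeeper-system logs -l job-name=gatekeeper-update-namespace-label --tail=50   # logs from the label job's pods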
Interestingly, the installation method from the official documentation:
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/release-3.7/deploy/gatekeeper.yaml
never uses the jobs.batch "gatekeeper-update-namespace-label" resource that caused the problem with the Helm installation.
Kubernetes is based on rke2, version 1.22.3.
Environment:
- Gatekeeper version: 3.7
- Kubernetes version (kubectl version): v1.22.3+rke2r1
- Kubectl version: 1.22.4
- Helm version: 3.6
We've seen the same behavior with k3s 1.21.4, gatekeeper 3.7.0, and helm 3.6.
Any idea?
I have this problem too on the CD solution we use, which only has Helm v3.1.2. It doesn't happen when I use Helm v3.8.0 in a different environment, but it looks like a different cause than yours.
I have this in the event logs of the cluster. From what I can see, Helm creates the job before it creates the service account, the job never starts, and Helm times out at the end. So we can just blame old Helm for this.
48s Warning FailedCreate job/gatekeeper-update-namespace-label Error creating: pods "gatekeeper-update-namespace-label-" is forbidden: error looking up service account gatekeeper/gatekeeper-update-namespace-label: serviceaccount "gatekeeper-update-namespace-label" not found
A workaround I can think of is to disable that post-install update task (the chart's values have an option for it) and update the label manually via CLI/IaC or whatever you have, as in the sketch below. I haven't tested it yet, though.
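An untested sketch of that workaround, assuming the postInstall.labelNamespace.enabled chart value (it appears in the computed values dumped later in this thread) and the label the job would normally apply:
# skip the post-install namespace-label job
$ helm install -n gatekeeper-system gatekeeper gatekeeper/gatekeeper --create-namespace \
    --set postInstall.labelNamespace.enabled=false
# apply the label the job would have set
$ kubectl label ns gatekeeper-system admission.gatekeeper.sh/ignore=no-self-managing --overwrite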
I am also facing the same issue while installing Gatekeeper via Helm v3.8.1.
Same problem with Helm v3.0.1 installing Gatekeeper using the Helm chart.
Here are the steps I tried to repro the issue:
- Create a fresh 1.22.4 cluster with kind.
- Install helm 3.6.0.
- Run helm install --create-namespace -n gatekeeper-system gatekeeper gatekeeper/gatekeeper --version 3.7.0 --debug
- Successfully installed gatekeeper.
I repeated the steps above with helm 3.7.0, 3.8.0, and 3.9.0 and I wasn't able to repro the issue.
When you say fresh install, did you make sure the gatekeeper-system namespace was deleted? It would be nice if we could have the entire helm install debug log. I'd also recommend upgrading your Helm version.
Same problem with Helm v3.0.1 installing Gatekeeper using the Helm chart.
I'd recommend upgrading your helm version since v3.0.1 is too old.
@chewong In my experience this is an intermittent problem. At one point I was seeing failures maybe 1 in 10 times.
I've had some luck adding --timeout 10m to the helm install command.
At one point I got the impression that the update-namespace-label pod was starting, attempting to do whatever it does, failing, and then restarting under the Kubernetes crash-loop backoff. Since the backoff delay grows before each restart, it doesn't take many failures before the next restart attempt happens after the timeout threshold.
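For example (the same install command as the original report, just with the longer timeout that happened to work for me):
$ helm install -n gatekeeper-system gatekeeper gatekeeper/gatekeeper --create-namespace --timeout 10m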
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Would be nice to see this resolved.
dongillies@bl-mbp16-a3041:[a-leo]~$ helm version version.BuildInfo{Version:"v3.9.4", GitCommit:"dbc6d8e20fe1d58d50e6ed30f09a04a77e4c68db", GitTreeState:"clean", GoVersion:"go1.19"}
kubernetes v1.21 (aws)
Command-line install fails with gatekeeper v3.7.0, v3.8.0, and v3.9.0. Here is a log from v3.7.0.
kubectl delete namespace gatekeeper-system gatekeeper-policy-manager
kubectl delete crd -l gatekeeper.sh/system=yes
helm install -n gatekeeper-system --version v3.7.0 gatekeeper gatekeeper/gatekeeper --create-namespace --debug
$ k logs gatekeeper-update-namespace-label-nsw4q -n gatekeeper-system
I0927 22:02:03.617574 1 request.go:668] Waited for 1.162965264s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/templates.gatekeeper.sh/v1alpha1?timeout=32s
I0927 22:02:13.618160 1 request.go:668] Waited for 11.162924478s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/cert-manager.io/v1?timeout=32s
Error from server (InternalError): Internal error occurred: failed calling webhook "check-ignore-label.gatekeeper.sh": Post "https://gatekeeper-webhook-service.gatekeeper-system.svc:443/v1/admitlabel?timeout=3s": Address is not allowed
$ helm install -n gatekeeper-system --version v3.7.0 gatekeeper gatekeeper/gatekeeper --create-namespace --debug
install.go:178: [debug] Original chart version: "v3.7.0"
install.go:195: [debug] CHART PATH: /Users/dongillies/Library/Caches/helm/repository/gatekeeper-3.7.0.tgz
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 1 resource(s)
install.go:165: [debug] Clearing discovery cache
wait.go:48: [debug] beginning wait for 9 resources with timeout of 1m0s
W0927 14:59:31.729782 35184 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
client.go:128: [debug] creating 1 resource(s)
client.go:310: [debug] Starting delete for "gatekeeper-admin-upgrade-crds" ServiceAccount
client.go:339: [debug] serviceaccounts "gatekeeper-admin-upgrade-crds" not found
client.go:128: [debug] creating 1 resource(s)
client.go:310: [debug] Starting delete for "gatekeeper-admin-upgrade-crds" ClusterRole
client.go:339: [debug] clusterroles.rbac.authorization.k8s.io "gatekeeper-admin-upgrade-crds" not found
client.go:128: [debug] creating 1 resource(s)
client.go:310: [debug] Starting delete for "gatekeeper-admin-upgrade-crds" ClusterRoleBinding
client.go:339: [debug] clusterrolebindings.rbac.authorization.k8s.io "gatekeeper-admin-upgrade-crds" not found
client.go:128: [debug] creating 1 resource(s)
client.go:310: [debug] Starting delete for "gatekeeper-update-crds-hook" Job
client.go:339: [debug] jobs.batch "gatekeeper-update-crds-hook" not found
client.go:128: [debug] creating 1 resource(s)
client.go:540: [debug] Watching for changes to Job gatekeeper-update-crds-hook with timeout of 5m0s
client.go:568: [debug] Add/Modify event for gatekeeper-update-crds-hook: ADDED
client.go:607: [debug] gatekeeper-update-crds-hook: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:568: [debug] Add/Modify event for gatekeeper-update-crds-hook: MODIFIED
client.go:607: [debug] gatekeeper-update-crds-hook: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:568: [debug] Add/Modify event for gatekeeper-update-crds-hook: MODIFIED
client.go:310: [debug] Starting delete for "gatekeeper-admin-upgrade-crds" ServiceAccount
client.go:310: [debug] Starting delete for "gatekeeper-admin-upgrade-crds" ClusterRole
client.go:310: [debug] Starting delete for "gatekeeper-admin-upgrade-crds" ClusterRoleBinding
client.go:310: [debug] Starting delete for "gatekeeper-update-crds-hook" Job
client.go:128: [debug] creating 14 resource(s)
W0927 14:59:39.142965 35184 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
client.go:310: [debug] Starting delete for "gatekeeper-update-namespace-label" ServiceAccount
client.go:339: [debug] serviceaccounts "gatekeeper-update-namespace-label" not found
client.go:128: [debug] creating 1 resource(s)
client.go:310: [debug] Starting delete for "gatekeeper-update-namespace-label" Role
client.go:339: [debug] roles.rbac.authorization.k8s.io "gatekeeper-update-namespace-label" not found
client.go:128: [debug] creating 1 resource(s)
client.go:310: [debug] Starting delete for "gatekeeper-update-namespace-label" RoleBinding
client.go:339: [debug] rolebindings.rbac.authorization.k8s.io "gatekeeper-update-namespace-label" not found
client.go:128: [debug] creating 1 resource(s)
client.go:310: [debug] Starting delete for "gatekeeper-update-namespace-label" Job
client.go:339: [debug] jobs.batch "gatekeeper-update-namespace-label" not found
client.go:128: [debug] creating 1 resource(s)
client.go:540: [debug] Watching for changes to Job gatekeeper-update-namespace-label with timeout of 5m0s
client.go:568: [debug] Add/Modify event for gatekeeper-update-namespace-label: ADDED
client.go:607: [debug] gatekeeper-update-namespace-label: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:568: [debug] Add/Modify event for gatekeeper-update-namespace-label: MODIFIED
client.go:607: [debug] gatekeeper-update-namespace-label: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
Error: INSTALLATION FAILED: failed post-install: timed out waiting for the condition
helm.go:84: [debug] failed post-install: timed out waiting for the condition
INSTALLATION FAILED
main.newInstallCmd.func2
helm.sh/helm/v3/cmd/helm/install.go:127
github.com/spf13/cobra.(*Command).execute
github.com/spf13/[email protected]/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
github.com/spf13/[email protected]/command.go:974
github.com/spf13/cobra.(*Command).Execute
github.com/spf13/[email protected]/command.go:902
main.main
helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
runtime/proc.go:250
runtime.goexit
runtime/asm_amd64.s:1594
We are installing gatekeeper via Terraform Cloud. Terraform Cloud has a Helm provider with a 15-minute timeout. We are installing roughly 130 resources, including 50+ Helm charts. Once we perform a terraform plan, the 15-minute clock starts ticking. Typically we don't notice that the Terraform workspace is ready to apply until 1-2 minutes later, so that leaves about 13 minutes to get 130 resources, including 50 charts, installed. For Gatekeeper to take 5+ minutes to install correctly is unreasonable in our environment. If adding extra time actually works, it's the slowest Helm chart we have ever seen. It hasn't been working for several months for us.
@dwgillies-bluescape
It looks like you're deleting the namespace and trying to recreate it, leaving the ValidatingWebhookConfiguration (which points to a non-existent service due to the deleted namespace) in place.
This log line shows the failure calling the fail-closed webhook:
Error from server (InternalError): Internal error occurred: failed calling webhook "check-ignore-label.gatekeeper.sh": Post "https://gatekeeper-webhook-service.gatekeeper-system.svc:443/v1/admitlabel?timeout=3s": Address is not allowed
Deleting Gatekeeper's ValidatingWebhookConfiguration would eliminate the deadlock in your example (Helm would recreate it).
When using helm to install, please also use helm delete to remove ALL the resources deployed by this helm chart such that resources like ValidatingWebhookConfiguration won't be left behind. https://open-policy-agent.github.io/gatekeeper/website/docs/install#using-helm
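For example, one way to clear the leftover webhook configuration and remove the release cleanly before reinstalling (the same commands the reporter runs later in this thread):
$ kubectl delete validatingwebhookconfigurations gatekeeper-validating-webhook-configuration
$ helm delete gatekeeper -n gatekeeper-system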
Can you please also give v3.9.0 a try? https://github.com/open-policy-agent/gatekeeper/pull/2052 checks for gatekeeper-webhook API availability in an initContainer of the gatekeeper-update-namespace-label job, so the namespace-label container only runs after it confirms the webhook is available.
In cases where the pod is present and bootstrapping, the delays are likely due to readiness. /readyz won't show true until all data is cached, constraint templates are observed, and whatever other bootstrapping is complete. If /readyz is returning false, then the Kubernetes load balancers won't route traffic to the Service, which causes calls to that webhook to fail.
The namespace label check webhook technically does not have any bootstrapping dependencies (assuming a TLS cert is present), so it could be possible to host it on a separate pod (this config is not in the Helm chart), at the cost of some wasted resources.
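A quick way to check this from outside is to see whether the controller-manager pods report Ready and whether the webhook Service has any ready endpoints (a sketch, assuming the default names and namespace; an empty ENDPOINTS column means /readyz is still failing and the admitlabel calls will keep failing too):
$ kubectl -n gatekeeper-system get pods -l control-plane=controller-manager
$ kubectl -n gatekeeper-system get endpoints gatekeeper-webhook-service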
Other solutions (assuming Rita's post, which showed up as I wrote this, doesn't work):
- Create Gatekeeper's namespace outside of the Helm chart (including the appropriate label). Unfortunately it doesn't look like we can apply the ignore label in Helm until after G8r is installed, so this can't be fixed by changing the chart.
- Set .Values.validatingWebhookCheckIgnoreFailurePolicy to Ignore (see the sketch after this list); note that this makes the ability to create/update namespaces a privileged operation that can bypass policy.
- Limit G8r's dependency set such that the bootstrapping happens in an acceptable timeframe (note that if you are experiencing client-side throttling due to the number of unique GroupVersions in your cluster causing the client to throttle the discovery API, the latency may not be fixable without an update to disable/configure throttling on our end).
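A hedged sketch of the second option, using the chart value that appears in the computed values later in this thread (again, this weakens enforcement of namespace create/update):
$ helm upgrade --install -n gatekeeper-system gatekeeper gatekeeper/gatekeeper --create-namespace \
    --set validatingWebhookCheckIgnoreFailurePolicy=Ignore
And, for the last point, a rough sense of how many GroupVersions the discovery client has to walk (which is what drives the client-side throttling messages shown above):
$ kubectl api-versions | wc -l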
We are running Istio and Cilium on a GovCloud Kubernetes cluster on AWS. Installation works on a similar non-Istio cluster; the gatekeeper-update-namespace-label pod finishes in about 5-10 seconds in THAT cluster.
@maxsmythe I have added your suggested commands to delete the hooks. @ritazh I have added your suggestion to delete the helm chart, use v3.9.0, and also delete the namespace. The init container succeeds; it's just the openpolicyagent/gatekeeper-crds:v3.9.0 container that fails.
kubectl delete mutatingwebhookconfigurations gatekeeper-mutating-webhook-configuration
kubectl delete validatingwebhookconfigurations gatekeeper-validating-webhook-configuration
kubectl delete crd -l gatekeeper.sh/system=yes
helm delete gatekeeper -n gatekeeper-system
kubectl delete namespace gatekeeper-system
helm install -n gatekeeper-system --version v3.9.0 gatekeeper gatekeeper/gatekeeper --create-namespace --debug --timeout 900s
install.go:178: [debug] Original chart version: "v3.9.0"
install.go:195: [debug] CHART PATH: /Users/dongillies/Library/Caches/helm/repository/gatekeeper-3.9.0.tgz
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 1 resource(s)
client.go:128: [debug] creating 1 resource(s)
install.go:165: [debug] Clearing discovery cache
wait.go:48: [debug] beginning wait for 9 resources with timeout of 1m0s
W0927 19:31:41.315993 47872 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
client.go:128: [debug] creating 1 resource(s)
client.go:310: [debug] Starting delete for "gatekeeper-admin-upgrade-crds" ServiceAccount
client.go:339: [debug] serviceaccounts "gatekeeper-admin-upgrade-crds" not found
client.go:128: [debug] creating 1 resource(s)
client.go:310: [debug] Starting delete for "gatekeeper-admin-upgrade-crds" ClusterRole
client.go:339: [debug] clusterroles.rbac.authorization.k8s.io "gatekeeper-admin-upgrade-crds" not found
client.go:128: [debug] creating 1 resource(s)
client.go:310: [debug] Starting delete for "gatekeeper-admin-upgrade-crds" ClusterRoleBinding
client.go:339: [debug] clusterrolebindings.rbac.authorization.k8s.io "gatekeeper-admin-upgrade-crds" not found
client.go:128: [debug] creating 1 resource(s)
client.go:310: [debug] Starting delete for "gatekeeper-update-crds-hook" Job
client.go:339: [debug] jobs.batch "gatekeeper-update-crds-hook" not found
client.go:128: [debug] creating 1 resource(s)
client.go:540: [debug] Watching for changes to Job gatekeeper-update-crds-hook with timeout of 15m0s
client.go:568: [debug] Add/Modify event for gatekeeper-update-crds-hook: ADDED
client.go:607: [debug] gatekeeper-update-crds-hook: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:568: [debug] Add/Modify event for gatekeeper-update-crds-hook: MODIFIED
client.go:607: [debug] gatekeeper-update-crds-hook: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:568: [debug] Add/Modify event for gatekeeper-update-crds-hook: MODIFIED
client.go:310: [debug] Starting delete for "gatekeeper-admin-upgrade-crds" ServiceAccount
client.go:310: [debug] Starting delete for "gatekeeper-admin-upgrade-crds" ClusterRole
client.go:310: [debug] Starting delete for "gatekeeper-admin-upgrade-crds" ClusterRoleBinding
client.go:310: [debug] Starting delete for "gatekeeper-update-crds-hook" Job
client.go:128: [debug] creating 14 resource(s)
W0927 19:31:48.767738 47872 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
client.go:310: [debug] Starting delete for "gatekeeper-update-namespace-label" ServiceAccount
client.go:339: [debug] serviceaccounts "gatekeeper-update-namespace-label" not found
client.go:128: [debug] creating 1 resource(s)
client.go:310: [debug] Starting delete for "gatekeeper-update-namespace-label" Role
client.go:339: [debug] roles.rbac.authorization.k8s.io "gatekeeper-update-namespace-label" not found
client.go:128: [debug] creating 1 resource(s)
client.go:310: [debug] Starting delete for "gatekeeper-update-namespace-label" RoleBinding
client.go:339: [debug] rolebindings.rbac.authorization.k8s.io "gatekeeper-update-namespace-label" not found
client.go:128: [debug] creating 1 resource(s)
client.go:310: [debug] Starting delete for "gatekeeper-update-namespace-label" Job
client.go:339: [debug] jobs.batch "gatekeeper-update-namespace-label" not found
client.go:128: [debug] creating 1 resource(s)
client.go:540: [debug] Watching for changes to Job gatekeeper-update-namespace-label with timeout of 15m0s
client.go:568: [debug] Add/Modify event for gatekeeper-update-namespace-label: ADDED
client.go:607: [debug] gatekeeper-update-namespace-label: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:568: [debug] Add/Modify event for gatekeeper-update-namespace-label: MODIFIED
client.go:607: [debug] gatekeeper-update-namespace-label: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
Error: INSTALLATION FAILED: failed post-install: job failed: BackoffLimitExceeded
helm.go:84: [debug] failed post-install: job failed: BackoffLimitExceeded
INSTALLATION FAILED
main.newInstallCmd.func2
helm.sh/helm/v3/cmd/helm/install.go:127
github.com/spf13/cobra.(*Command).execute
github.com/spf13/[email protected]/command.go:856
github.com/spf13/cobra.(*Command).ExecuteC
github.com/spf13/[email protected]/command.go:974
github.com/spf13/cobra.(*Command).Execute
github.com/spf13/[email protected]/command.go:902
main.main
helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main
runtime/proc.go:250
runtime.goexit
runtime/asm_amd64.s:1594
# gatekeeper-update-namespace-label still fails, after 5 attempts, in 6 mins
Error from server (InternalError): Internal error occurred: failed calling webhook "check-ignore-label.gatekeeper.sh": Post "https://gatekeeper-webhook-service.gatekeeper-system.svc:443/v1/admitlabel?timeout=3s": Address is not allowed
Error from server (InternalError): Internal error occurred: failed calling webhook "check-ignore-label.gatekeeper.sh": Post "https://gatekeeper-webhook-service.gatekeeper-system.svc:443/v1/admitlabel?timeout=3s": Address is not allowed
Error from server (InternalError): Internal error occurred: failed calling webhook "check-ignore-label.gatekeeper.sh": Post "https://gatekeeper-webhook-service.gatekeeper-system.svc:443/v1/admitlabel?timeout=3s": Address is not allowed
Error from server (InternalError): Internal error occurred: failed calling webhook "check-ignore-label.gatekeeper.sh": Post "https://gatekeeper-webhook-service.gatekeeper-system.svc:443/v1/admitlabel?timeout=3s": Address is not allowed
Error from server (InternalError): Internal error occurred: failed calling webhook "check-ignore-label.gatekeeper.sh": Post "https://gatekeeper-webhook-service.gatekeeper-system.svc:443/v1/admitlabel?timeout=3s": Address is not allowed
dongillies@bl-mbp16-a3041:[a-gstg1]~/repo/terraform-helm-k8s-generic$ helm list -A
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
...
gatekeeper gatekeeper-system 1 2022-09-27 19:10:51.567846 -0700 PDT failed gatekeeper-3.9.0 v3.9.0
$ kubectl get events --sort-by=.metadata.creationTimestamp -A
...
gatekeeper-system 10m Normal Scheduled pod/gatekeeper-update-namespace-label-zgkts Successfully assigned gatekeeper-system/gatekeeper-update-namespace-label-zgkts to ip-10-64-124-100.us-gov-west-1.compute.internal
gatekeeper-system 10m Normal Pulled pod/gatekeeper-controller-manager-5447bbd765-js5cd Container image "openpolicyagent/gatekeeper:v3.9.0" already present on machine
gatekeeper-system 10m Normal Pulled pod/gatekeeper-controller-manager-5447bbd765-dzx74 Container image "openpolicyagent/gatekeeper:v3.9.0" already present on machine
gatekeeper-system 10m Normal Created pod/gatekeeper-controller-manager-5447bbd765-js5cd Created container manager
gatekeeper-system 10m Normal Started pod/gatekeeper-controller-manager-5447bbd765-js5cd Started container manager
gatekeeper-system 10m Normal Started pod/gatekeeper-controller-manager-5447bbd765-dzx74 Started container manager
gatekeeper-system 10m Warning Unhealthy pod/gatekeeper-controller-manager-5447bbd765-js5cd Readiness probe failed: Get "http://10.2.17.243:9090/readyz": dial tcp 10.2.17.243:9090: connect: connection refused
gatekeeper-system 9m46s Normal Pulled pod/gatekeeper-update-namespace-label-zgkts Container image "curlimages/curl:7.83.1" already present on machine
gatekeeper-system 10m Warning Unhealthy pod/gatekeeper-controller-manager-5447bbd765-dzx74 Readiness probe failed: Get "http://10.2.29.96:9090/readyz": dial tcp 10.2.29.96:9090: connect: connection refused
gatekeeper-system 9m46s Normal Started pod/gatekeeper-update-namespace-label-zgkts Started container webhook-probe-post
gatekeeper-system 9m46s Normal Created pod/gatekeeper-update-namespace-label-zgkts Created container webhook-probe-post
gatekeeper-system 10m Normal Started pod/gatekeeper-audit-7df9d49f9c-f7g62 Started container manager
gatekeeper-system 10m Normal Created pod/gatekeeper-audit-7df9d49f9c-f7g62 Created container manager
gatekeeper-system 10m Normal Pulled pod/gatekeeper-audit-7df9d49f9c-f7g62 Container image "openpolicyagent/gatekeeper:v3.9.0" already present on machine
gatekeeper-system 10m Normal Pulled pod/gatekeeper-controller-manager-5447bbd765-qnkjk Container image "openpolicyagent/gatekeeper:v3.9.0" already present on machine
gatekeeper-system 10m Normal Created pod/gatekeeper-controller-manager-5447bbd765-qnkjk Created container manager
gatekeeper-system 10m Normal Started pod/gatekeeper-controller-manager-5447bbd765-qnkjk Started container manager
gatekeeper-system 10m Warning Unhealthy pod/gatekeeper-audit-7df9d49f9c-f7g62 Readiness probe failed: Get "http://10.2.21.135:9090/readyz": dial tcp 10.2.21.135:9090: connect: connection refused
gatekeeper-system 10m Warning Unhealthy pod/gatekeeper-controller-manager-5447bbd765-qnkjk Readiness probe failed: Get "http://10.2.21.12:9090/readyz": dial tcp 10.2.21.12:9090: connect: connection refused
gatekeeper-system 10m Warning BackOff pod/gatekeeper-update-namespace-label-zgkts Back-off restarting failed container
gatekeeper-system 9m2s Normal Pulled pod/gatekeeper-update-namespace-label-zgkts Container image "openpolicyagent/gatekeeper-crds:v3.9.0" already present on machine
gatekeeper-system 9m2s Normal Created pod/gatekeeper-update-namespace-label-zgkts Created container kubectl-label
gatekeeper-system 9m2s Normal Started pod/gatekeeper-update-namespace-label-zgkts Started container kubectl-label
gatekeeper-system 9m16s Warning BackOff pod/gatekeeper-update-namespace-label-zgkts Back-off restarting failed container
gatekeeper-system 8m20s Warning BackoffLimitExceeded job/gatekeeper-update-namespace-label Job has reached the specified backoff limit
gatekeeper-system 8m20s Normal SuccessfulDelete job/gatekeeper-update-namespace-label Deleted pod: gatekeeper-update-namespace-label-zgkts
# right about here I probably did a helm upgrade --install of gatekeeper to set the status to "success"
gatekeeper-system 7m16s Normal Scheduled pod/gatekeeper-update-crds-hook-5gb9b Successfully assigned gatekeeper-system/gatekeeper-update-crds-hook-5gb9b to ip-10-64-124-100.us-gov-west-1.compute.internal
gatekeeper-system 7m16s Normal SuccessfulCreate job/gatekeeper-update-crds-hook Created pod: gatekeeper-update-crds-hook-5gb9b
gatekeeper-system 7m14s Normal Pulled pod/gatekeeper-update-crds-hook-5gb9b Container image "openpolicyagent/gatekeeper-crds:v3.9.0" already present on machine
gatekeeper-system 7m14s Normal Created pod/gatekeeper-update-crds-hook-5gb9b Created container crds-upgrade
gatekeeper-system 7m14s Normal Started pod/gatekeeper-update-crds-hook-5gb9b Started container crds-upgrade
gatekeeper-system 7m12s Normal Completed job/gatekeeper-update-crds-hook Job completed
=======
Our Terraform and Helm charts used to work; they installed Gatekeeper in February. There has been no change to this part of our configs since then, but now they are failing.
The weird thing is that in the "failed" state I can change the Helm gatekeeper install status to "succeeded", but I think this is just a bug in your Helm chart implementation, since it's not running the hook that breaks installation the second time. And we cannot run a helm upgrade from Terraform.
$ helm upgrade --install -n gatekeeper-system --version v3.9.0 gatekeeper gatekeeper/gatekeeper --create-namespace --debug --timeout 900s
history.go:56: [debug] getting history for release gatekeeper
upgrade.go:142: [debug] preparing upgrade for gatekeeper
upgrade.go:150: [debug] performing update for gatekeeper
upgrade.go:322: [debug] creating upgraded release for gatekeeper
client.go:310: [debug] Starting delete for "gatekeeper-admin-upgrade-crds" ServiceAccount
client.go:339: [debug] serviceaccounts "gatekeeper-admin-upgrade-crds" not found
client.go:128: [debug] creating 1 resource(s)
client.go:310: [debug] Starting delete for "gatekeeper-admin-upgrade-crds" ClusterRole
client.go:339: [debug] clusterroles.rbac.authorization.k8s.io "gatekeeper-admin-upgrade-crds" not found
client.go:128: [debug] creating 1 resource(s)
client.go:310: [debug] Starting delete for "gatekeeper-admin-upgrade-crds" ClusterRoleBinding
client.go:339: [debug] clusterrolebindings.rbac.authorization.k8s.io "gatekeeper-admin-upgrade-crds" not found
client.go:128: [debug] creating 1 resource(s)
client.go:310: [debug] Starting delete for "gatekeeper-update-crds-hook" Job
client.go:339: [debug] jobs.batch "gatekeeper-update-crds-hook" not found
client.go:128: [debug] creating 1 resource(s)
client.go:540: [debug] Watching for changes to Job gatekeeper-update-crds-hook with timeout of 15m0s
client.go:568: [debug] Add/Modify event for gatekeeper-update-crds-hook: ADDED
client.go:607: [debug] gatekeeper-update-crds-hook: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
client.go:568: [debug] Add/Modify event for gatekeeper-update-crds-hook: MODIFIED
client.go:607: [debug] gatekeeper-update-crds-hook: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
client.go:568: [debug] Add/Modify event for gatekeeper-update-crds-hook: MODIFIED
client.go:310: [debug] Starting delete for "gatekeeper-admin-upgrade-crds" ServiceAccount
client.go:310: [debug] Starting delete for "gatekeeper-admin-upgrade-crds" ClusterRole
client.go:310: [debug] Starting delete for "gatekeeper-admin-upgrade-crds" ClusterRoleBinding
client.go:310: [debug] Starting delete for "gatekeeper-update-crds-hook" Job
client.go:229: [debug] checking 14 resources for changes
client.go:521: [debug] Patch ResourceQuota "gatekeeper-critical-pods" in namespace gatekeeper-system
W0927 19:34:50.962344 47875 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W0927 19:34:50.997090 47875 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
client.go:512: [debug] Looks like there are no changes for PodSecurityPolicy "gatekeeper-admin"
W0927 19:34:51.030471 47875 warnings.go:70] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
client.go:521: [debug] Patch PodDisruptionBudget "gatekeeper-controller-manager" in namespace gatekeeper-system
client.go:512: [debug] Looks like there are no changes for ServiceAccount "gatekeeper-admin"
client.go:512: [debug] Looks like there are no changes for Secret "gatekeeper-webhook-server-cert"
client.go:521: [debug] Patch ClusterRole "gatekeeper-manager-role" in namespace
client.go:512: [debug] Looks like there are no changes for ClusterRoleBinding "gatekeeper-manager-rolebinding"
client.go:521: [debug] Patch Role "gatekeeper-manager-role" in namespace gatekeeper-system
client.go:512: [debug] Looks like there are no changes for RoleBinding "gatekeeper-manager-rolebinding"
client.go:512: [debug] Looks like there are no changes for Service "gatekeeper-webhook-service"
client.go:521: [debug] Patch Deployment "gatekeeper-audit" in namespace gatekeeper-system
client.go:521: [debug] Patch Deployment "gatekeeper-controller-manager" in namespace gatekeeper-system
client.go:521: [debug] Patch MutatingWebhookConfiguration "gatekeeper-mutating-webhook-configuration" in namespace
client.go:521: [debug] Patch ValidatingWebhookConfiguration "gatekeeper-validating-webhook-configuration" in namespace
upgrade.go:157: [debug] updating status for upgraded release for gatekeeper
Release "gatekeeper" has been upgraded. Happy Helming!
NAME: gatekeeper
LAST DEPLOYED: Tue Sep 27 19:34:40 2022
NAMESPACE: gatekeeper-system
STATUS: deployed
REVISION: 2
TEST SUITE: None
USER-SUPPLIED VALUES:
{}
COMPUTED VALUES:
audit:
affinity: {}
disableCertRotation: true
dnsPolicy: ClusterFirst
extraRules: []
healthPort: 9090
hostNetwork: false
metricsPort: 8888
nodeSelector:
kubernetes.io/os: linux
podSecurityContext:
fsGroup: 999
supplementalGroups:
- 999
priorityClassName: system-cluster-critical
resources:
limits:
cpu: 1000m
memory: 512Mi
requests:
cpu: 100m
memory: 256Mi
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- all
readOnlyRootFilesystem: true
runAsGroup: 999
runAsNonRoot: true
runAsUser: 1000
tolerations: []
writeToRAMDisk: false
auditChunkSize: 500
auditFromCache: false
auditInterval: 60
auditMatchKindOnly: false
constraintViolationsLimit: 20
controllerManager:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- podAffinityTerm:
labelSelector:
matchExpressions:
- key: gatekeeper.sh/operation
operator: In
values:
- webhook
topologyKey: kubernetes.io/hostname
weight: 100
disableCertRotation: false
dnsPolicy: ClusterFirst
exemptNamespacePrefixes: []
exemptNamespaces: []
extraRules: []
healthPort: 9090
hostNetwork: false
metricsPort: 8888
nodeSelector:
kubernetes.io/os: linux
podSecurityContext:
fsGroup: 999
supplementalGroups:
- 999
port: 8443
priorityClassName: system-cluster-critical
resources:
limits:
cpu: 1000m
memory: 512Mi
requests:
cpu: 100m
memory: 256Mi
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- all
readOnlyRootFilesystem: true
runAsGroup: 999
runAsNonRoot: true
runAsUser: 1000
tolerations: []
crds:
resources: {}
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- all
readOnlyRootFilesystem: true
runAsGroup: 65532
runAsNonRoot: true
runAsUser: 65532
disableMutation: false
disableValidatingWebhook: false
disabledBuiltins:
- '{http.send}'
emitAdmissionEvents: false
emitAuditEvents: false
enableDeleteOperations: false
enableExternalData: false
enableRuntimeDefaultSeccompProfile: true
enableTLSHealthcheck: false
image:
crdRepository: openpolicyagent/gatekeeper-crds
pullPolicy: IfNotPresent
pullSecrets: []
release: v3.9.0
repository: openpolicyagent/gatekeeper
logDenies: false
logLevel: INFO
logMutations: false
metricsBackends:
- prometheus
mutatingWebhookCustomRules: {}
mutatingWebhookExemptNamespacesLabels: {}
mutatingWebhookFailurePolicy: Ignore
mutatingWebhookObjectSelector: {}
mutatingWebhookReinvocationPolicy: Never
mutatingWebhookTimeoutSeconds: 1
mutationAnnotations: false
pdb:
controllerManager:
minAvailable: 1
podAnnotations: {}
podCountLimit: 100
podLabels: {}
postInstall:
labelNamespace:
enabled: true
extraNamespaces: []
extraRules: []
image:
pullPolicy: IfNotPresent
pullSecrets: []
repository: openpolicyagent/gatekeeper-crds
tag: v3.9.0
probeWebhook:
enabled: true
httpTimeout: 2
image:
pullPolicy: IfNotPresent
pullSecrets: []
repository: curlimages/curl
tag: 7.83.1
insecureHTTPS: false
waitTimeout: 60
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- all
readOnlyRootFilesystem: true
runAsGroup: 999
runAsNonRoot: true
runAsUser: 1000
postUpgrade:
labelNamespace:
enabled: false
extraNamespaces: []
image:
pullPolicy: IfNotPresent
pullSecrets: []
repository: openpolicyagent/gatekeeper-crds
tag: v3.9.0
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- all
readOnlyRootFilesystem: true
runAsGroup: 999
runAsNonRoot: true
runAsUser: 1000
preUninstall:
deleteWebhookConfigurations:
enabled: false
extraRules: []
image:
pullPolicy: IfNotPresent
pullSecrets: []
repository: openpolicyagent/gatekeeper-crds
tag: v3.9.0
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- all
readOnlyRootFilesystem: true
runAsGroup: 999
runAsNonRoot: true
runAsUser: 1000
psp:
enabled: true
rbac:
create: true
replicas: 3
resourceQuota: true
secretAnnotations: {}
service: {}
upgradeCRDs:
enabled: true
extraRules: []
tolerations: []
validatingWebhookCheckIgnoreFailurePolicy: Fail
validatingWebhookCustomRules: {}
validatingWebhookExemptNamespacesLabels: {}
validatingWebhookFailurePolicy: Ignore
validatingWebhookObjectSelector: {}
validatingWebhookTimeoutSeconds: 3
HOOKS:
---
# Source: gatekeeper/templates/namespace-post-install.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: gatekeeper-update-namespace-label
labels:
release: gatekeeper
heritage: Helm
annotations:
"helm.sh/hook": post-install
"helm.sh/hook-weight": "-5"
"helm.sh/hook-delete-policy": hook-succeeded,before-hook-creation
---
# Source: gatekeeper/templates/upgrade-crds-hook.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
release: gatekeeper
heritage: Helm
name: gatekeeper-admin-upgrade-crds
namespace: 'gatekeeper-system'
annotations:
helm.sh/hook: pre-install,pre-upgrade
helm.sh/hook-delete-policy: "hook-succeeded,before-hook-creation"
helm.sh/hook-weight: "1"
---
# Source: gatekeeper/templates/upgrade-crds-hook.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: gatekeeper-admin-upgrade-crds
labels:
release: gatekeeper
heritage: Helm
annotations:
helm.sh/hook: pre-install,pre-upgrade
helm.sh/hook-delete-policy: "hook-succeeded,before-hook-creation"
helm.sh/hook-weight: "1"
rules:
- apiGroups: ["apiextensions.k8s.io"]
resources: ["customresourcedefinitions"]
verbs: ["get", "create", "update", "patch"]
---
# Source: gatekeeper/templates/upgrade-crds-hook.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: gatekeeper-admin-upgrade-crds
labels:
release: gatekeeper
heritage: Helm
annotations:
helm.sh/hook: pre-install,pre-upgrade
helm.sh/hook-delete-policy: "hook-succeeded,before-hook-creation"
helm.sh/hook-weight: "1"
subjects:
- kind: ServiceAccount
name: gatekeeper-admin-upgrade-crds
namespace: gatekeeper-system
roleRef:
kind: ClusterRole
name: gatekeeper-admin-upgrade-crds
apiGroup: rbac.authorization.k8s.io
---
# Source: gatekeeper/templates/namespace-post-install.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: gatekeeper-update-namespace-label
labels:
release: gatekeeper
heritage: Helm
annotations:
"helm.sh/hook": post-install
"helm.sh/hook-weight": "-5"
"helm.sh/hook-delete-policy": hook-succeeded,before-hook-creation
rules:
- apiGroups:
- ""
resources:
- namespaces
verbs:
- get
- update
- patch
resourceNames:
- gatekeeper-system
---
# Source: gatekeeper/templates/namespace-post-install.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: gatekeeper-update-namespace-label
labels:
release: gatekeeper
heritage: Helm
annotations:
"helm.sh/hook": post-install
"helm.sh/hook-weight": "-5"
"helm.sh/hook-delete-policy": hook-succeeded,before-hook-creation
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: gatekeeper-update-namespace-label
subjects:
- kind: ServiceAccount
name: gatekeeper-update-namespace-label
namespace: "gatekeeper-system"
---
# Source: gatekeeper/templates/namespace-post-install.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: gatekeeper-update-namespace-label
labels:
app: 'gatekeeper'
chart: 'gatekeeper'
gatekeeper.sh/system: "yes"
heritage: 'Helm'
release: 'gatekeeper'
annotations:
"helm.sh/hook": post-install
"helm.sh/hook-weight": "-5"
"helm.sh/hook-delete-policy": hook-succeeded,before-hook-creation
spec:
template:
metadata:
annotations:
{}
labels:
app: 'gatekeeper'
release: 'gatekeeper'
spec:
restartPolicy: OnFailure
serviceAccount: gatekeeper-update-namespace-label
nodeSelector:
kubernetes.io/os: linux
volumes:
- name: cert
secret:
secretName: gatekeeper-webhook-server-cert
initContainers:
- name: webhook-probe-post
image: "curlimages/curl:7.83.1"
imagePullPolicy: IfNotPresent
args:
- "--retry"
- "99999"
- "--retry-max-time"
- "60"
- "--retry-delay"
- "1"
- "--max-time"
- "2"
- "--cacert"
- /certs/ca.crt
- "-v"
- "https://gatekeeper-webhook-service.gatekeeper-system.svc/v1/admitlabel?timeout=2s"
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- all
readOnlyRootFilesystem: true
runAsGroup: 999
runAsNonRoot: true
runAsUser: 1000
volumeMounts:
- mountPath: /certs
name: cert
readOnly: true
containers:
- name: kubectl-label
image: "openpolicyagent/gatekeeper-crds:v3.9.0"
imagePullPolicy: IfNotPresent
args:
- label
- ns
- gatekeeper-system
- admission.gatekeeper.sh/ignore=no-self-managing
- --overwrite
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- all
readOnlyRootFilesystem: true
runAsGroup: 999
runAsNonRoot: true
runAsUser: 1000
---
# Source: gatekeeper/templates/upgrade-crds-hook.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: gatekeeper-update-crds-hook
namespace: gatekeeper-system
labels:
app: gatekeeper
chart: gatekeeper
gatekeeper.sh/system: "yes"
heritage: Helm
release: gatekeeper
annotations:
helm.sh/hook: pre-install,pre-upgrade
helm.sh/hook-weight: "1"
helm.sh/hook-delete-policy: "hook-succeeded,before-hook-creation"
spec:
backoffLimit: 0
template:
metadata:
name: gatekeeper-update-crds-hook
annotations:
{}
spec:
serviceAccountName: gatekeeper-admin-upgrade-crds
restartPolicy: Never
containers:
- name: crds-upgrade
image: 'openpolicyagent/gatekeeper-crds:v3.9.0'
imagePullPolicy: 'IfNotPresent'
args:
- apply
- -f
- crds/
resources:
{}
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- all
readOnlyRootFilesystem: true
runAsGroup: 65532
runAsNonRoot: true
runAsUser: 65532
affinity:
null
nodeSelector:
kubernetes.io/os: linux
tolerations:
[]
MANIFEST:
---
# Source: gatekeeper/templates/gatekeeper-critical-pods-resourcequota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
labels:
app: 'gatekeeper'
chart: 'gatekeeper'
gatekeeper.sh/system: "yes"
heritage: 'Helm'
release: 'gatekeeper'
name: gatekeeper-critical-pods
namespace: 'gatekeeper-system'
spec:
hard:
pods: 100
scopeSelector:
matchExpressions:
- operator: In
scopeName: PriorityClass
values:
- system-cluster-critical
- system-cluster-critical
---
# Source: gatekeeper/templates/gatekeeper-admin-podsecuritypolicy.yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
annotations:
seccomp.security.alpha.kubernetes.io/allowedProfileNames: '*'
labels:
app: 'gatekeeper'
chart: 'gatekeeper'
gatekeeper.sh/system: "yes"
heritage: 'Helm'
release: 'gatekeeper'
name: gatekeeper-admin
spec:
allowPrivilegeEscalation: false
fsGroup:
ranges:
- max: 65535
min: 1
rule: MustRunAs
requiredDropCapabilities:
- ALL
runAsUser:
rule: MustRunAsNonRoot
seLinux:
rule: RunAsAny
supplementalGroups:
ranges:
- max: 65535
min: 1
rule: MustRunAs
volumes:
- configMap
- projected
- secret
- downwardAPI
- emptyDir
---
# Source: gatekeeper/templates/gatekeeper-controller-manager-poddisruptionbudget.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
labels:
app: 'gatekeeper'
chart: 'gatekeeper'
gatekeeper.sh/system: "yes"
heritage: 'Helm'
release: 'gatekeeper'
name: gatekeeper-controller-manager
namespace: 'gatekeeper-system'
spec:
minAvailable: 1
selector:
matchLabels:
app: 'gatekeeper'
chart: 'gatekeeper'
control-plane: controller-manager
gatekeeper.sh/operation: webhook
gatekeeper.sh/system: "yes"
heritage: 'Helm'
release: 'gatekeeper'
---
# Source: gatekeeper/templates/gatekeeper-admin-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
app: 'gatekeeper'
chart: 'gatekeeper'
gatekeeper.sh/system: "yes"
heritage: 'Helm'
release: 'gatekeeper'
name: gatekeeper-admin
namespace: 'gatekeeper-system'
---
# Source: gatekeeper/templates/gatekeeper-webhook-server-cert-secret.yaml
apiVersion: v1
kind: Secret
metadata:
annotations:
{}
labels:
app: 'gatekeeper'
chart: 'gatekeeper'
gatekeeper.sh/system: "yes"
heritage: 'Helm'
release: 'gatekeeper'
name: gatekeeper-webhook-server-cert
namespace: 'gatekeeper-system'
---
# Source: gatekeeper/templates/gatekeeper-manager-role-clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
creationTimestamp: null
labels:
app: 'gatekeeper'
chart: 'gatekeeper'
gatekeeper.sh/system: "yes"
heritage: 'Helm'
release: 'gatekeeper'
name: gatekeeper-manager-role
rules:
- apiGroups:
- '*'
resources:
- '*'
verbs:
- get
- list
- watch
- apiGroups:
- admissionregistration.k8s.io
resourceNames:
- gatekeeper-mutating-webhook-configuration
resources:
- mutatingwebhookconfigurations
verbs:
- get
- list
- patch
- update
- watch
- apiGroups:
- apiextensions.k8s.io
resources:
- customresourcedefinitions
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- config.gatekeeper.sh
resources:
- configs
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- config.gatekeeper.sh
resources:
- configs/status
verbs:
- get
- patch
- update
- apiGroups:
- constraints.gatekeeper.sh
resources:
- '*'
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- externaldata.gatekeeper.sh
resources:
- providers
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- mutations.gatekeeper.sh
resources:
- '*'
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- policy
resourceNames:
- gatekeeper-admin
resources:
- podsecuritypolicies
verbs:
- use
- apiGroups:
- status.gatekeeper.sh
resources:
- '*'
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- templates.gatekeeper.sh
resources:
- constrainttemplates
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
- apiGroups:
- templates.gatekeeper.sh
resources:
- constrainttemplates/finalizers
verbs:
- delete
- get
- patch
- update
- apiGroups:
- templates.gatekeeper.sh
resources:
- constrainttemplates/status
verbs:
- get
- patch
- update
- apiGroups:
- admissionregistration.k8s.io
resourceNames:
- gatekeeper-validating-webhook-configuration
resources:
- validatingwebhookconfigurations
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
---
# Source: gatekeeper/templates/gatekeeper-manager-rolebinding-clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
labels:
app: 'gatekeeper'
chart: 'gatekeeper'
gatekeeper.sh/system: "yes"
heritage: 'Helm'
release: 'gatekeeper'
name: gatekeeper-manager-rolebinding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: gatekeeper-manager-role
subjects:
- kind: ServiceAccount
name: gatekeeper-admin
namespace: 'gatekeeper-system'
---
# Source: gatekeeper/templates/gatekeeper-manager-role-role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
creationTimestamp: null
labels:
app: 'gatekeeper'
chart: 'gatekeeper'
gatekeeper.sh/system: "yes"
heritage: 'Helm'
release: 'gatekeeper'
name: gatekeeper-manager-role
namespace: 'gatekeeper-system'
rules:
- apiGroups:
- ""
resources:
- events
verbs:
- create
- patch
- apiGroups:
- ""
resources:
- secrets
verbs:
- create
- delete
- get
- list
- patch
- update
- watch
---
# Source: gatekeeper/templates/gatekeeper-manager-rolebinding-rolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
labels:
app: 'gatekeeper'
chart: 'gatekeeper'
gatekeeper.sh/system: "yes"
heritage: 'Helm'
release: 'gatekeeper'
name: gatekeeper-manager-rolebinding
namespace: 'gatekeeper-system'
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: gatekeeper-manager-role
subjects:
- kind: ServiceAccount
name: gatekeeper-admin
namespace: 'gatekeeper-system'
---
# Source: gatekeeper/templates/gatekeeper-webhook-service-service.yaml
apiVersion: v1
kind: Service
metadata:
labels:
app: 'gatekeeper'
chart: 'gatekeeper'
gatekeeper.sh/system: "yes"
heritage: 'Helm'
release: 'gatekeeper'
name: gatekeeper-webhook-service
namespace: 'gatekeeper-system'
spec:
ports:
- name: https-webhook-server
port: 443
targetPort: webhook-server
selector:
app: 'gatekeeper'
chart: 'gatekeeper'
control-plane: controller-manager
gatekeeper.sh/operation: webhook
gatekeeper.sh/system: "yes"
heritage: 'Helm'
release: 'gatekeeper'
---
# Source: gatekeeper/templates/gatekeeper-audit-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: 'gatekeeper'
chart: 'gatekeeper'
control-plane: audit-controller
gatekeeper.sh/operation: audit
gatekeeper.sh/system: "yes"
heritage: 'Helm'
release: 'gatekeeper'
name: gatekeeper-audit
namespace: 'gatekeeper-system'
spec:
replicas: 1
selector:
matchLabels:
app: 'gatekeeper'
chart: 'gatekeeper'
control-plane: audit-controller
gatekeeper.sh/operation: audit
gatekeeper.sh/system: "yes"
heritage: 'Helm'
release: 'gatekeeper'
template:
metadata:
annotations:
labels:
app: 'gatekeeper'
chart: 'gatekeeper'
control-plane: audit-controller
gatekeeper.sh/operation: audit
gatekeeper.sh/system: "yes"
heritage: 'Helm'
release: 'gatekeeper'
spec:
affinity:
{}
automountServiceAccountToken: true
containers:
-
image: openpolicyagent/gatekeeper:v3.9.0
args:
- --audit-interval=60
- --log-level=INFO
- --constraint-violations-limit=20
- --audit-from-cache=false
- --audit-chunk-size=500
- --audit-match-kind-only=false
- --emit-audit-events=false
- --operation=audit
- --operation=status
- --operation=mutation-status
- --logtostderr
- --health-addr=:9090
- --prometheus-port=8888
- --enable-external-data=false
- --metrics-backend=prometheus
- --disable-cert-rotation=true
command:
- /manager
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: CONTAINER_NAME
value: manager
imagePullPolicy: 'IfNotPresent'
livenessProbe:
httpGet:
path: /healthz
port: 9090
name: manager
ports:
- containerPort: 8888
name: metrics
protocol: TCP
- containerPort: 9090
name: healthz
protocol: TCP
readinessProbe:
httpGet:
path: /readyz
port: 9090
resources:
limits:
cpu: 1000m
memory: 512Mi
requests:
cpu: 100m
memory: 256Mi
securityContext:
seccompProfile:
type: RuntimeDefault
allowPrivilegeEscalation: false
capabilities:
drop:
- all
readOnlyRootFilesystem: true
runAsGroup: 999
runAsNonRoot: true
runAsUser: 1000
volumeMounts:
- mountPath: /certs
name: cert
readOnly: true
- mountPath: /tmp/audit
name: tmp-volume
dnsPolicy: ClusterFirst
hostNetwork: false
imagePullSecrets:
[]
nodeSelector:
kubernetes.io/os: linux
priorityClassName: system-cluster-critical
securityContext:
fsGroup: 999
supplementalGroups:
- 999
serviceAccountName: gatekeeper-admin
terminationGracePeriodSeconds: 60
tolerations:
[]
volumes:
- name: cert
secret:
defaultMode: 420
secretName: gatekeeper-webhook-server-cert
- emptyDir: {}
name: tmp-volume
---
# Source: gatekeeper/templates/gatekeeper-controller-manager-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: 'gatekeeper'
chart: 'gatekeeper'
control-plane: controller-manager
gatekeeper.sh/operation: webhook
gatekeeper.sh/system: "yes"
heritage: 'Helm'
release: 'gatekeeper'
name: gatekeeper-controller-manager
namespace: 'gatekeeper-system'
spec:
replicas: 3
selector:
matchLabels:
app: 'gatekeeper'
chart: 'gatekeeper'
control-plane: controller-manager
gatekeeper.sh/operation: webhook
gatekeeper.sh/system: "yes"
heritage: 'Helm'
release: 'gatekeeper'
template:
metadata:
annotations:
labels:
app: 'gatekeeper'
chart: 'gatekeeper'
control-plane: controller-manager
gatekeeper.sh/operation: webhook
gatekeeper.sh/system: "yes"
heritage: 'Helm'
release: 'gatekeeper'
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- podAffinityTerm:
labelSelector:
matchExpressions:
- key: gatekeeper.sh/operation
operator: In
values:
- webhook
topologyKey: kubernetes.io/hostname
weight: 100
automountServiceAccountToken: true
containers:
-
image: openpolicyagent/gatekeeper:v3.9.0
args:
- --port=8443
- --health-addr=:9090
- --prometheus-port=8888
- --logtostderr
- --log-denies=false
- --emit-admission-events=false
- --log-level=INFO
- --exempt-namespace=gatekeeper-system
- --operation=webhook
- --enable-external-data=false
- --log-mutations=false
- --mutation-annotations=false
- --disable-cert-rotation=false
- --metrics-backend=prometheus
- --operation=mutation-webhook
- --disable-opa-builtin={http.send}
command:
- /manager
env:
- name: POD_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: CONTAINER_NAME
value: manager
imagePullPolicy: 'IfNotPresent'
livenessProbe:
httpGet:
path: /healthz
port: 9090
name: manager
ports:
- containerPort: 8443
name: webhook-server
protocol: TCP
- containerPort: 8888
name: metrics
protocol: TCP
- containerPort: 9090
name: healthz
protocol: TCP
readinessProbe:
httpGet:
path: /readyz
port: 9090
resources:
limits:
cpu: 1000m
memory: 512Mi
requests:
cpu: 100m
memory: 256Mi
securityContext:
seccompProfile:
type: RuntimeDefault
allowPrivilegeEscalation: false
capabilities:
drop:
- all
readOnlyRootFilesystem: true
runAsGroup: 999
runAsNonRoot: true
runAsUser: 1000
volumeMounts:
- mountPath: /certs
name: cert
readOnly: true
dnsPolicy: ClusterFirst
hostNetwork: false
imagePullSecrets:
[]
nodeSelector:
kubernetes.io/os: linux
priorityClassName: system-cluster-critical
securityContext:
fsGroup: 999
supplementalGroups:
- 999
serviceAccountName: gatekeeper-admin
terminationGracePeriodSeconds: 60
tolerations:
[]
volumes:
- name: cert
secret:
defaultMode: 420
secretName: gatekeeper-webhook-server-cert
---
# Source: gatekeeper/templates/gatekeeper-mutating-webhook-configuration-mutatingwebhookconfiguration.yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
labels:
app: 'gatekeeper'
chart: 'gatekeeper'
gatekeeper.sh/system: "yes"
heritage: 'Helm'
release: 'gatekeeper'
name: gatekeeper-mutating-webhook-configuration
webhooks:
- admissionReviewVersions:
- v1
- v1beta1
clientConfig:
service:
name: gatekeeper-webhook-service
namespace: 'gatekeeper-system'
path: /v1/mutate
failurePolicy: Ignore
matchPolicy: Exact
name: mutation.gatekeeper.sh
namespaceSelector:
matchExpressions:
- key: admission.gatekeeper.sh/ignore
operator: DoesNotExist
objectSelector: {}
reinvocationPolicy: Never
rules:
- apiGroups:
- '*'
apiVersions:
- '*'
operations:
- CREATE
- UPDATE
resources:
- '*'
sideEffects: None
timeoutSeconds: 1
---
# Source: gatekeeper/templates/gatekeeper-validating-webhook-configuration-validatingwebhookconfiguration.yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
labels:
app: 'gatekeeper'
chart: 'gatekeeper'
gatekeeper.sh/system: "yes"
heritage: 'Helm'
release: 'gatekeeper'
name: gatekeeper-validating-webhook-configuration
webhooks:
- admissionReviewVersions:
- v1
- v1beta1
clientConfig:
service:
name: gatekeeper-webhook-service
namespace: 'gatekeeper-system'
path: /v1/admit
failurePolicy: Ignore
matchPolicy: Exact
name: validation.gatekeeper.sh
namespaceSelector:
matchExpressions:
- key: admission.gatekeeper.sh/ignore
operator: DoesNotExist
objectSelector: {}
rules:
- apiGroups:
- '*'
apiVersions:
- '*'
operations:
- CREATE
- UPDATE
resources:
- '*'
# Explicitly list all known subresources except "status" (to avoid destabilizing the cluster and increasing load on gatekeeper).
# You can find a rough list of subresources by doing a case-sensitive search in the Kubernetes codebase for 'Subresource("'
- 'pods/ephemeralcontainers'
- 'pods/exec'
- 'pods/log'
- 'pods/eviction'
- 'pods/portforward'
- 'pods/proxy'
- 'pods/attach'
- 'pods/binding'
- 'deployments/scale'
- 'replicasets/scale'
- 'statefulsets/scale'
- 'replicationcontrollers/scale'
- 'services/proxy'
- 'nodes/proxy'
# For constraints that mitigate CVE-2020-8554
- 'services/status'
sideEffects: None
timeoutSeconds: 3
- admissionReviewVersions:
- v1
- v1beta1
clientConfig:
service:
name: gatekeeper-webhook-service
namespace: 'gatekeeper-system'
path: /v1/admitlabel
failurePolicy: Fail
matchPolicy: Exact
name: check-ignore-label.gatekeeper.sh
rules:
- apiGroups:
- ""
apiVersions:
- '*'
operations:
- CREATE
- UPDATE
resources:
- namespaces
sideEffects: None
timeoutSeconds: 3
Looks like this is the difference between the init container and the failure message.
init container:
- https://gatekeeper-webhook-service.gatekeeper-system.svc/v1/admitlabel?timeout=2s
failure message:
- https://gatekeeper-webhook-service.gatekeeper-system.svc:443/v1/admitlabel?timeout=3s
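If it helps to rule out basic reachability, a rough way to reproduce the failing call from inside the cluster is a throwaway curl pod (a sketch only; -k skips certificate verification, and the real call comes from the API server, so Istio/CNI policy may treat it differently):
$ kubectl -n gatekeeper-system run webhook-check --rm -it --restart=Never \
    --image=curlimages/curl:7.83.1 -- \
    curl -vk --max-time 3 "https://gatekeeper-webhook-service.gatekeeper-system.svc:443/v1/admitlabel?timeout=3s"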
kubectl delete mutatingwebhookconfigurations gatekeeper-mutating-webhook-configuration
kubectl delete validatingwebhookconfigurations gatekeeper-validating-webhook-configuration
kubectl delete crd -l gatekeeper.sh/system=yes
helm delete gatekeeper -n gatekeeper-system
kubectl delete namespace gatekeeper-system
helm install -n gatekeeper-system --version v3.9.0 gatekeeper gatekeeper/gatekeeper --create-namespace --debug --timeout 900s
Running individual kubectl delete commands for all the gatekeeper resources could cause issues if they are not done in the right order, due to dependencies. You should use Helm to delete, which should remove all the gatekeeper resources on the cluster:
helm delete gatekeeper -n gatekeeper-system
For the failure message
https://gatekeeper-webhook-service.gatekeeper-system.svc:443/v1/admitlabel?timeout=3s
This means the gatekeeper webhook service is not ready to serve traffic from the API server, and the call times out after 3 seconds, as specified in the gatekeeper-validating-webhook-configuration ValidatingWebhookConfiguration.
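One way to confirm the configured timeout for each webhook (a sketch; the 3s corresponds to the chart's validatingWebhookTimeoutSeconds value):
$ kubectl get validatingwebhookconfiguration gatekeeper-validating-webhook-configuration \
    -o jsonpath='{range .webhooks[*]}{.name}{"\t"}{.timeoutSeconds}{"\n"}{end}'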
I'm having a hard time reproducing the issue on a kind cluster.
Same issue here, using v3.9.0. No difference when increasing the timeout itself, as the request fails instantly because the socket is not accessible ('connection refused'):
Internal error occurred: failed calling webhook "check-ignore-label.gatekeeper.sh": failed to call webhook: Post "https://gatekeeper-webhook-service.cattle-gatekeeper-system.svc:443/v1/admitlabel?timeout=30s": dial tcp 10.11.12.13:443: connect: connection refused
This should be fixed with #2385. It will be available as part of the Helm chart in the next release (v3.11), or you can test these changes using the manifest_staging/charts folder today. Please feel free to comment or re-open if this issue persists.