Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": context deadline exceeded
I'm trying to install cert-manager on a fresh Kubernetes cluster. Steps:
helm repo add jetstack https://charts.jetstack.io
helm repo update
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.0/cert-manager.crds.yaml
helm install \
cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--version v1.8.0 \
--set installCRDs=true
and then I got this error:
Error from server (InternalError): error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": context deadline exceeded
The pods, service, and endpoints all look OK:
# kubectl get svc,pods,endpoints -n cert-manager
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/cert-manager ClusterIP <...> <none> 9402/TCP 12m
service/cert-manager-webhook ClusterIP <...> <none> 443/TCP 12m
NAME READY STATUS RESTARTS AGE
pod/cert-manager-6bbf595697-4z855 1/1 Running 0 12m
pod/cert-manager-cainjector-6bc9d758b-gg48g 1/1 Running 0 12m
pod/cert-manager-webhook-d98678bf5-wp7t9 1/1 Running 0 12m
NAME ENDPOINTS AGE
endpoints/cert-manager <...> 12m
endpoints/cert-manager-webhook <...> 12m
Can someone help me understand why I'm getting this error? I'm running these commands on Linux.
Usually this is because the webhook is not reachable from the Kubernetes apiserver for some reason (i.e. a networking issue). We have some debugging docs here: https://cert-manager.io/docs/concepts/webhook/#known-problems-and-solutions
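One quick way to check in-cluster reachability is to hit the webhook Service from a throwaway pod. This is only a sketch (the pod name is arbitrary, and note the apiserver→webhook network path can differ from the pod→webhook path, especially with host-networked control planes):

```shell
# Spin up a throwaway curl pod and try to complete a TLS connection to the
# webhook Service. -k skips CA verification; any HTTP response at all (even
# an error) proves the network path works, while a timeout points at CNI/
# firewall issues.
kubectl run webhook-probe --rm -it --restart=Never \
  --image=curlimages/curl -- \
  curl -vk -m 5 https://cert-manager-webhook.cert-manager.svc:443/
```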
@klepiz are you on AKS? I experience this intermittently on AKS but not in GKE for example.
No, my nodes are bare metal, so I'm not running in a cloud.
@klepiz Hi, have you solved this problem? I'm facing the same issue on a bare-metal k3s cluster.
Kind of; I ran this as a workaround:
kubectl delete validatingwebhookconfiguration validating-webhook-configuration
kubectl delete mutatingwebhookconfiguration mutating-webhook-configuration
@klepiz Yes, it works. @irbekrm, there should be an official solution for this case; maybe this is a kind of bug?
kubectl delete validatingwebhookconfiguration validating-webhook-configuration
kubectl delete mutatingwebhookconfiguration mutating-webhook-configuration
You should never do this, as you will lose resource validation and mutation.
Just wanted to add that I've got the same issue on some of my clusters, on-prem and cloud, with k3s. I've tried extending the mutate/validate webhooks to a 30s timeout (the maximum allowed by Kubernetes) and removing an extra ClusterIssuer, but the events still persist. Both clusters currently use the Flannel CNI.
- Civo cloud k3s cluster (ver 1.22.11)
- pi4 8GB cluster (ver 1.24.3)
Also getting this issue on a freshly deployed cluster. I'm using MetalLB and Flannel for networking; it's a bare-metal setup. Apart from installing Flannel and creating the namespace, these were the first commands run:
helm upgrade --install ingress-nginx ingress-nginx --repo https://kubernetes.github.io/ingress-nginx --namespace cert-manager
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.9.1/cert-manager.yaml
kubectl apply -f ./issuer/ingress-controller-service.yml
kubectl apply -f ./issuer/cert.yml
The errors occur when applying cert.yml.
Contents:
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: XXX.YYY-tls
namespace: cert-manager
spec:
secretName: XXX.YYY-tls-cert
duration: 48h
renewBefore: 24h
commonName: XXX.YYY
dnsNames:
- XXX.YYY
issuerRef:
name: letsencrypt-prod
kind: Issuer
Debugging:
$ kbc version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:45:37Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.13", GitCommit:"a43c0904d0de10f92aa3956c74489c45e6453d6e", GitTreeState:"clean", BuildDate:"2022-08-17T18:23:45Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:44:22Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
$ kbc logs cert-manager-webhook-6466bc8f4-t5gbw -n cert-manager
I0831 08:18:45.438073 1 feature_gate.go:245] feature gates: &{map[]}
W0831 08:18:45.438133 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
E0831 08:19:15.443657 1 webhook.go:122] cert-manager "msg"="Failed initialising server" "error"="error building admission chain: Get \"https://10.96.0.1:443/api\": dial tcp 10.96.0.1:443: i/o timeout"
$ kbc get pods -n cert-manager -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cert-manager-55649d64b4-jnmgm 1/1 Running 0 24m 10.244.2.98 k-w-002 <none> <none>
cert-manager-cainjector-666db4777-56c8s 1/1 Running 0 24m 10.244.4.101 k-w-003 <none> <none>
cert-manager-webhook-6466bc8f4-25v22 1/1 Running 0 12m 10.244.1.15 k-w-001 <none> <none>
ingress-nginx-controller-55dcf56b68-vs7bn 1/1 Running 0 7m9s 10.244.4.125 k-w-003 <none> <none>
quickstart-ingress-nginx-controller-54f6f89679-nr995 1/1 Running 0 21m 10.244.4.110 k-w-003 <none> <none>
$ kbc get svc -n cert-manager -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
cert-manager ClusterIP 10.110.51.49 <none> 9402/TCP 3h16m app.kubernetes.io/component=controller,app.kubernetes.io/instance=cert-manager,app.kubernetes.io/name=cert-manager
cert-manager-webhook ClusterIP 10.97.58.250 <none> 443/TCP 3h16m app.kubernetes.io/component=webhook,app.kubernetes.io/instance=cert-manager,app.kubernetes.io/name=webhook
ingress-nginx-controller LoadBalancer 10.107.232.56 10.7.7.10 80:31066/TCP,443:32011/TCP 6m54s app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
ingress-nginx-controller-admission ClusterIP 10.105.64.235 <none> 443/TCP 6m54s app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
quickstart-ingress-nginx-controller LoadBalancer 10.109.25.34 10.7.9.10 80:30817/TCP,443:31464/TCP 22m app.kubernetes.io/component=controller,app.kubernetes.io/instance=quickstart,app.kubernetes.io/name=ingress-nginx
quickstart-ingress-nginx-controller-admission ClusterIP 10.98.67.225 <none> 443/TCP 22m app.kubernetes.io/component=controller,app.kubernetes.io/instance=quickstart,app.kubernetes.io/name=ingress-nginx
$ kbc get ValidatingWebhookConfiguration -n cert-manager
NAME WEBHOOKS AGE
cert-manager-webhook 1 3h19m
ingress-nginx-admission 1 9m22s
quickstart-ingress-nginx-admission 1 24m
$ kbc describe pod cert-manager-webhook-6466bc8f4-25v22 -n cert-manager
Name: cert-manager-webhook-6466bc8f4-25v22
Namespace: cert-manager
Priority: 0
Node: k-w-001/10.7.60.11
Start Time: Wed, 31 Aug 2022 05:59:43 +0000
Labels: app=webhook
app.kubernetes.io/component=webhook
app.kubernetes.io/instance=cert-manager
app.kubernetes.io/name=webhook
app.kubernetes.io/version=v1.9.1
pod-template-hash=6466bc8f4
Annotations: <none>
Status: Running
IP: 10.244.1.15
IPs:
IP: 10.244.1.15
Controlled By: ReplicaSet/cert-manager-webhook-6466bc8f4
Containers:
cert-manager:
Container ID: docker://1200df958d74e3d4b398be1e6ed0992b7d3660e8b4710b1f498ac324489013fe
Image: quay.io/jetstack/cert-manager-webhook:v1.9.1
Image ID: docker-pullable://quay.io/jetstack/cert-manager-webhook@sha256:4ab2982a220e1c719473d52d8463508422ab26e92664732bfc4d96b538af6b8a
Port: 10250/TCP
Host Port: 0/TCP
Args:
--v=2
--secure-port=10250
--dynamic-serving-ca-secret-namespace=$(POD_NAMESPACE)
--dynamic-serving-ca-secret-name=cert-manager-webhook-ca
--dynamic-serving-dns-names=cert-manager-webhook,cert-manager-webhook.cert-manager,cert-manager-webhook.cert-manager.svc
State: Running
Started: Wed, 31 Aug 2022 05:59:47 +0000
Ready: True
Restart Count: 0
Liveness: http-get http://:6080/livez delay=60s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:6080/healthz delay=5s timeout=1s period=5s #success=1 #failure=3
Environment:
POD_NAMESPACE: cert-manager (v1:metadata.namespace)
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xfvn8 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kube-api-access-xfvn8:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 15m default-scheduler Successfully assigned cert-manager/cert-manager-webhook-6466bc8f4-25v22 to k-w-001
Normal Pulling 15m kubelet Pulling image "quay.io/jetstack/cert-manager-webhook:v1.9.1"
Normal Pulled 15m kubelet Successfully pulled image "quay.io/jetstack/cert-manager-webhook:v1.9.1" in 3.151900949s
Normal Created 15m kubelet Created container cert-manager
Normal Started 15m kubelet Started container cert-manager
Edit: I just reinstalled Ubuntu and reinstalled Kubernetes; the issue still persists.
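The webhook log above shows `dial tcp 10.96.0.1:443: i/o timeout`, i.e. the webhook pod cannot reach the apiserver's ClusterIP, which usually implicates the CNI or iptables rather than cert-manager itself. A sketch for testing that path from a throwaway pod pinned to the same node (node name and service IP taken from the output above; the pod name is arbitrary):

```shell
# Run a curl pod on the same node as the failing webhook and try to reach the
# apiserver ClusterIP. /healthz is unauthenticated; -k skips CA verification.
# An "ok" response means the path works; a timeout confirms a node-local
# networking problem.
kubectl run apiserver-probe --rm -it --restart=Never \
  --image=curlimages/curl \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"k-w-001"}}' -- \
  curl -k -m 5 https://10.96.0.1:443/healthz
```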
I'm running a k3s cluster on GCP VMs and hit the same issue. Moreover, none of the installed validating webhooks work either, except in cases where the webhook pods run with hostNetwork privileges.
Kind of; I ran this as a workaround:
kubectl delete validatingwebhookconfiguration validating-webhook-configuration
kubectl delete mutatingwebhookconfiguration mutating-webhook-configuration
That's the only thing that helped after 15 retries. Thanks :)
I did some testing. It seems that when I remove MetalLB from my Kubespray config, I no longer get the context deadline exceeded error.
I suspect this is mostly due to Kubespray dictating (and MetalLB needing) kube_proxy_strict_arp set to true. After deploying cert-manager (via the static manifests on the docs site), I was able to use the manifests at https://cert-manager.io/docs/installation/verify/#manual-verification to get a cert created without issue.
From there, I was able to modify the kube-proxy ConfigMap to enable strictARP, install MetalLB, and advertise an L2 pool. A Service of type LoadBalancer was able to obtain an IP and route just fine.
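For reference, the strictARP flip can be scripted; this is the approach MetalLB's own docs describe, sketched here on the assumption that kube-proxy runs as a DaemonSet named `kube-proxy` in `kube-system`:

```shell
# Flip strictARP in the kube-proxy ConfigMap, then restart kube-proxy so the
# DaemonSet pods pick up the change.
kubectl get configmap kube-proxy -n kube-system -o yaml \
  | sed -e 's/strictARP: false/strictARP: true/' \
  | kubectl apply -f - -n kube-system
kubectl rollout restart daemonset kube-proxy -n kube-system
```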
I then deleted the cert test resources, verified, and recreated them without issue. To test further, I deleted those, then the cert-manager static manifests, and tried to reinstall cert-manager. This is where I ran into issues: it recreated all resources and the pods came up, but this error presents itself when calling the webhook:
namespace/cert-manager-test created
Error from server (InternalError): error when creating "2test.yml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": x509: certificate signed by unknown authority
This error persists even after reverting the entire MetalLB installation and deleting and recreating all cert-manager infrastructure.
So from my limited testing, it appears it might work fine if you install in this order:
- cluster
- cert-manager
- metallb
But I'm not sure whether it will break as soon as the webhook pod restarts, or after some period of time; my guess is it would break on any cert-manager upgrade. I was also seeing TLS handshake errors in the webhook pod logs (presumably from MetalLB).
Going further, I think I have found something that works, and I would appreciate a dev's input (whether it truly works, whether it will be stable for the long haul, etc.).
For Kubespray (and Calico/MetalLB) users, I think this should work, based on my limited testing:
- Create your cluster with kube_proxy_strict_arp: true in your cluster vars, adding MetalLB in your addons.yml (assuming you want it).
- Download the static manifests from https://github.com/cert-manager/cert-manager/releases/download/v1.10.0/cert-manager.yaml
- Edit them with any editor to make three changes:
- In the cert-manager-webhook Deployment, change 10250 to some other port (13250 if you please; used for the rest of this example). This needs to be done twice: once in the container args and once in the container port.
- In the cert-manager-webhook Service, change 10250 to 13250. This should work with "https" as-is, but I changed it anyhow.
- Deploy the manifests with kubectl apply -f.
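The steps above can be sketched as a short script. Note the global sed is a blunt instrument; it assumes "10250" only appears as the webhook port in this manifest, so inspect the matches first:

```shell
# Fetch the static manifests and swap the webhook's secure port, since 10250
# can collide with the kubelet on some distributions.
curl -sSLo cert-manager.yaml \
  https://github.com/cert-manager/cert-manager/releases/download/v1.10.0/cert-manager.yaml
grep -n '10250' cert-manager.yaml     # inspect matches before editing
sed -i 's/10250/13250/g' cert-manager.yaml
kubectl apply -f cert-manager.yaml
```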
I still get handshake errors in the webhook pod logs, but I'm unsure whether that's actually a problem. This "solution", if it is one, was drawn up using notes from the following link (cause 2, cause 3): https://cert-manager.io/docs/troubleshooting/webhook/#error-io-timeout-connectivity-issue
It seems to work (maybe) according to this comment from an old issue, too: https://github.com/cert-manager/cert-manager/issues/2319#issuecomment-738806971
(cert-manager v1.9.0 via Helm Chart)
I had the same problem and solved it as follows; hope this helps future readers.
1. Fully delete all objects created by cert-manager; otherwise it crashes while reinstalling:
helm uninstall cert-manager -n cert-manager
kubectl delete roles cert-manager-startupapicheck:create-cert -n cert-manager;
kubectl delete serviceaccount cert-manager-startupapicheck -n cert-manager;
kubectl delete serviceaccount default -n cert-manager;
kubectl delete jobs cert-manager-startupapicheck -n cert-manager;
kubectl delete rolebindings cert-manager-startupapicheck:create-cert -n cert-manager;
2. Also delete all cert-manager-related CRDs (not sure if this was necessary, but Certificates come back up by themselves once cert-manager works properly anyway):
kubectl delete -f https://github.com/cert-manager/cert-manager/releases/download/v1.9.0/cert-manager.crds.yaml
3. Update the chart:
helm repo add jetstack https://charts.jetstack.io
helm repo update
4. Install the chart with the config below (set hostNetwork to true to make the webhook pod run in the host's network namespace, and set securePort to 10260 to prevent a conflict between the webhook and the kubelet):
helm install \
cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--version v1.9.0 \
--set startupapicheck.timeout=5m \
--set installCRDs=true \
--set webhook.hostNetwork=true \
--set webhook.securePort=10260
Reference: https://cert-manager.io/docs/troubleshooting/webhook/#io-timeout
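After reinstalling, a cheap way to confirm the webhook is reachable before creating real resources is a server-side dry-run, which exercises the admission webhooks without persisting anything (the issuer name here is arbitrary):

```shell
# A server-side dry-run forces the apiserver to call cert-manager's
# conversion/validation webhooks; if they're unreachable, the same
# "failed calling webhook" error appears here, without side effects.
cat <<EOF | kubectl apply --dry-run=server -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: webhook-smoke-test
spec:
  selfSigned: {}
EOF
```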
Thanks for the guidelines; @imageschool's solution works for me.
For anyone who still has problems with webhook connectivity, make sure to take a look at our guide: https://cert-manager.io/docs/troubleshooting/webhook/. Feel free to re-open this issue if necessary.
Kind of; I ran this as a workaround:
kubectl delete validatingwebhookconfiguration validating-webhook-configuration
kubectl delete mutatingwebhookconfiguration mutating-webhook-configuration
Yep. That's the only thing that fixed my issue.
To be clear: cert-manager needs its ValidatingWebhookConfiguration and MutatingWebhookConfiguration in order to work securely and correctly.