Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": context deadline exceeded

Open klepiz opened this issue 3 years ago • 10 comments

I'm trying to install cert-manager on a fresh k8s cluster machine. Steps:

helm repo add jetstack https://charts.jetstack.io
helm repo update
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.8.0/cert-manager.crds.yaml
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.8.0 \
  --set installCRDs=true

and then I got this error:

Error from server (InternalError): error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": context deadline exceeded

The pods, service and endpoints seem to look OK:

# kubectl get svc,pods,endpoints -n cert-manager
NAME                           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/cert-manager           ClusterIP   <...>    <none>        9402/TCP   12m
service/cert-manager-webhook   ClusterIP   <...>   <none>        443/TCP    12m

NAME                                          READY   STATUS    RESTARTS   AGE
pod/cert-manager-6bbf595697-4z855             1/1     Running   0          12m
pod/cert-manager-cainjector-6bc9d758b-gg48g   1/1     Running   0          12m
pod/cert-manager-webhook-d98678bf5-wp7t9      1/1     Running   0          12m

NAME                             ENDPOINTS             AGE
endpoints/cert-manager           <...>     12m
endpoints/cert-manager-webhook   <...>   12m

Can someone help me understand why I am getting this issue? I'm running these commands on Linux.

klepiz avatar Jun 07 '22 20:06 klepiz

Usually this is because the webhook is not reachable from the Kubernetes apiserver for some reason (i.e. a networking issue). We have some debug docs here: https://cert-manager.io/docs/concepts/webhook/#known-problems-and-solutions
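
A minimal sketch of how one might start narrowing this down, assuming the default Service name cert-manager-webhook in the cert-manager namespace (both visible in the output above). The pod name webhook-probe and the curlimages/curl image are illustrative choices, not something from this thread. A probe from a pod only approximates the path the apiserver takes, so a success here does not rule out an apiserver-to-pod routing problem.

```shell
# Sketch only: helper functions, nothing runs until you call them.

# List the endpoints behind the webhook Service. An empty ENDPOINTS column
# means the Service has no ready webhook pod to send traffic to.
webhook_endpoints() {
  kubectl -n cert-manager get endpoints cert-manager-webhook
}

# Start a throwaway pod and try a TLS connection to the webhook Service.
# -k skips certificate verification because only reachability matters here;
# any HTTP status code printed means the connection itself worked.
webhook_probe() {
  kubectl run webhook-probe --rm -i --restart=Never \
    --image=curlimages/curl -- \
    curl -sk -m 10 -o /dev/null -w '%{http_code}\n' \
    https://cert-manager-webhook.cert-manager.svc:443/mutate
}
```

Calling webhook_endpoints first and webhook_probe second separates "no backend behind the Service" from "backend exists but cannot be reached".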

irbekrm avatar Jun 08 '22 09:06 irbekrm

@klepiz are you on AKS? I experience this intermittently on AKS but not in GKE for example.

consideRatio avatar Jun 09 '22 11:06 consideRatio

No, my nodes are on bare metal, so I'm not using a cloud provider.

klepiz avatar Jun 13 '22 19:06 klepiz

@klepiz hi, have you solved this problem? I'm facing the same issue on bare metal with a k3s cluster.

xzycn avatar Jul 07 '22 09:07 xzycn

Kind of, I ran this as a workaround:

kubectl delete validatingwebhookconfiguration validating-webhook-configuration
kubectl delete mutatingwebhookconfiguration mutating-webhook-configuration

klepiz avatar Jul 07 '22 14:07 klepiz

@klepiz yes, it works. @irbekrm there should be an official solution for this case; maybe this is a kind of bug?

xzycn avatar Jul 08 '22 02:07 xzycn

kubectl delete validatingwebhookconfiguration validating-webhook-configuration
kubectl delete mutatingwebhookconfiguration mutating-webhook-configuration

You should never do this as that way you will lose resource validation and mutation.
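
A read-only sketch for looking at what is installed before touching anything. The function names are hypothetical helpers. In the v1.9.1 listing further down this thread, cert-manager's own validating configuration appears as cert-manager-webhook, which is the name used in the usage comment.

```shell
# Sketch only: read-only helpers, nothing is deleted.

# Show every admission webhook configuration in the cluster. These objects
# are cluster-scoped, so no namespace flag is needed.
list_webhook_configs() {
  kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations
}

# Save a copy of one configuration for reference before changing anything.
# Server-managed fields such as resourceVersion would need removing before
# the file could be re-applied. Usage:
#   backup_webhook_config validatingwebhookconfiguration cert-manager-webhook
backup_webhook_config() {
  kind="$1"; name="$2"
  kubectl get "$kind" "$name" -o yaml > "${kind}-${name}.yaml"
}
```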

irbekrm avatar Jul 08 '22 10:07 irbekrm

Just wanted to add that I've got the same issue with some of my clusters, on-prem and cloud, with k3s. I've attempted to extend the mutate/validate webhooks to a 30s timeout (the max allowed by Kubernetes) and to remove an extra ClusterIssuer, but the events still persist. Currently both are using the Flannel CNI.

  • Civo cloud k3s cluster (ver 1.22.11)
  • pi4 8GB cluster (ver 1.24.3)
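
For readers who want to try the same timeout change, a minimal sketch, assuming the configuration is named cert-manager-webhook and that the entry of interest is the first one in its webhooks list. The function names are hypothetical. A manual patch like this may be overwritten the next time the manifests or the chart are applied.

```shell
# Sketch only: set timeoutSeconds on the first webhook entry of a
# configuration. The JSON Patch "add" op also overwrites an existing value.
# Usage:
#   set_webhook_timeout validatingwebhookconfiguration cert-manager-webhook 30
set_webhook_timeout() {
  kind="$1"; name="$2"; seconds="$3"
  kubectl patch "$kind" "$name" --type=json \
    -p "[{\"op\":\"add\",\"path\":\"/webhooks/0/timeoutSeconds\",\"value\":${seconds}}]"
}

# Print the timeout currently configured for each webhook entry. Usage:
#   show_webhook_timeout mutatingwebhookconfiguration cert-manager-webhook
show_webhook_timeout() {
  kubectl get "$1" "$2" -o jsonpath='{.webhooks[*].timeoutSeconds}'
}
```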

NerdyShawn avatar Jul 29 '22 12:07 NerdyShawn

Also getting this issue. Fresh cluster deployed. I'm using MetalLB and Flannel for the networking. It's a bare metal setup. Apart from installing Flannel and creating the namespace, these commands were the first commands run.

helm upgrade --install ingress-nginx ingress-nginx --repo https://kubernetes.github.io/ingress-nginx --namespace cert-manager
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.9.1/cert-manager.yaml
kubectl apply -f ./issuer/ingress-controller-service.yml
kubectl apply -f ./issuer/cert.yml

Errors when applying cert.yml.

Contents:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: XXX.YYY-tls
  namespace: cert-manager
spec:
  secretName: XXX.YYY-tls-cert
  duration: 48h
  renewBefore: 24h
  commonName: XXX.YYY
  dnsNames:
  - XXX.YYY
  issuerRef:
    name: letsencrypt-prod
    kind: Issuer

Debugging:

$ kbc version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:45:37Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.13", GitCommit:"a43c0904d0de10f92aa3956c74489c45e6453d6e", GitTreeState:"clean", BuildDate:"2022-08-17T18:23:45Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:44:22Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}


$ kbc logs cert-manager-webhook-6466bc8f4-t5gbw -n cert-manager
I0831 08:18:45.438073       1 feature_gate.go:245] feature gates: &{map[]}
W0831 08:18:45.438133       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
E0831 08:19:15.443657       1 webhook.go:122] cert-manager "msg"="Failed initialising server" "error"="error building admission chain: Get \"https://10.96.0.1:443/api\": dial tcp 10.96.0.1:443: i/o timeout"

$ kbc get pods -n cert-manager -o wide
NAME                                                   READY   STATUS    RESTARTS   AGE    IP             NODE      NOMINATED NODE   READINESS GATES
cert-manager-55649d64b4-jnmgm                          1/1     Running   0          24m    10.244.2.98    k-w-002   <none>           <none>
cert-manager-cainjector-666db4777-56c8s                1/1     Running   0          24m    10.244.4.101   k-w-003   <none>           <none>
cert-manager-webhook-6466bc8f4-25v22                   1/1     Running   0          12m    10.244.1.15    k-w-001   <none>           <none>
ingress-nginx-controller-55dcf56b68-vs7bn              1/1     Running   0          7m9s   10.244.4.125   k-w-003   <none>           <none>
quickstart-ingress-nginx-controller-54f6f89679-nr995   1/1     Running   0          21m    10.244.4.110   k-w-003   <none>           <none>

$ kbc get svc -n cert-manager -o wide
NAME                                            TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE     SELECTOR
cert-manager                                    ClusterIP      10.110.51.49    <none>        9402/TCP                     3h16m   app.kubernetes.io/component=controller,app.kubernetes.io/instance=cert-manager,app.kubernetes.io/name=cert-manager
cert-manager-webhook                            ClusterIP      10.97.58.250    <none>        443/TCP                      3h16m   app.kubernetes.io/component=webhook,app.kubernetes.io/instance=cert-manager,app.kubernetes.io/name=webhook
ingress-nginx-controller                        LoadBalancer   10.107.232.56   10.7.7.10     80:31066/TCP,443:32011/TCP   6m54s   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
ingress-nginx-controller-admission              ClusterIP      10.105.64.235   <none>        443/TCP                      6m54s   app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx,app.kubernetes.io/name=ingress-nginx
quickstart-ingress-nginx-controller             LoadBalancer   10.109.25.34    10.7.9.10     80:30817/TCP,443:31464/TCP   22m     app.kubernetes.io/component=controller,app.kubernetes.io/instance=quickstart,app.kubernetes.io/name=ingress-nginx
quickstart-ingress-nginx-controller-admission   ClusterIP      10.98.67.225    <none>        443/TCP                      22m     app.kubernetes.io/component=controller,app.kubernetes.io/instance=quickstart,app.kubernetes.io/name=ingress-nginx


$ kbc get ValidatingWebhookConfiguration -n cert-manager
NAME                                 WEBHOOKS   AGE
cert-manager-webhook                 1          3h19m
ingress-nginx-admission              1          9m22s
quickstart-ingress-nginx-admission   1          24m

$ kbc describe pod cert-manager-webhook-6466bc8f4-25v22 -n cert-manager
Name:         cert-manager-webhook-6466bc8f4-25v22
Namespace:    cert-manager
Priority:     0
Node:         k-w-001/10.7.60.11
Start Time:   Wed, 31 Aug 2022 05:59:43 +0000
Labels:       app=webhook
              app.kubernetes.io/component=webhook
              app.kubernetes.io/instance=cert-manager
              app.kubernetes.io/name=webhook
              app.kubernetes.io/version=v1.9.1
              pod-template-hash=6466bc8f4
Annotations:  <none>
Status:       Running
IP:           10.244.1.15
IPs:
  IP:           10.244.1.15
Controlled By:  ReplicaSet/cert-manager-webhook-6466bc8f4
Containers:
  cert-manager:
    Container ID:  docker://1200df958d74e3d4b398be1e6ed0992b7d3660e8b4710b1f498ac324489013fe
    Image:         quay.io/jetstack/cert-manager-webhook:v1.9.1
    Image ID:      docker-pullable://quay.io/jetstack/cert-manager-webhook@sha256:4ab2982a220e1c719473d52d8463508422ab26e92664732bfc4d96b538af6b8a
    Port:          10250/TCP
    Host Port:     0/TCP
    Args:
      --v=2
      --secure-port=10250
      --dynamic-serving-ca-secret-namespace=$(POD_NAMESPACE)
      --dynamic-serving-ca-secret-name=cert-manager-webhook-ca
      --dynamic-serving-dns-names=cert-manager-webhook,cert-manager-webhook.cert-manager,cert-manager-webhook.cert-manager.svc
    State:          Running
      Started:      Wed, 31 Aug 2022 05:59:47 +0000
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:6080/livez delay=60s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:6080/healthz delay=5s timeout=1s period=5s #success=1 #failure=3
    Environment:
      POD_NAMESPACE:  cert-manager (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-xfvn8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-xfvn8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  15m   default-scheduler  Successfully assigned cert-manager/cert-manager-webhook-6466bc8f4-25v22 to k-w-001
  Normal  Pulling    15m   kubelet            Pulling image "quay.io/jetstack/cert-manager-webhook:v1.9.1"
  Normal  Pulled     15m   kubelet            Successfully pulled image "quay.io/jetstack/cert-manager-webhook:v1.9.1" in 3.151900949s
  Normal  Created    15m   kubelet            Created container cert-manager
  Normal  Started    15m   kubelet            Started container cert-manager

Edit: Just reinstalled Ubuntu and installed Kubernetes. This issue still persists.
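
The webhook log above shows the pod timing out against the in-cluster API Service (10.96.0.1:443), which points at pod-to-Service networking on that node rather than at cert-manager itself. A minimal sketch for reproducing that outside cert-manager, assuming a throwaway pod pinned to one node. The function name, the pod name prefix api-probe- and the curlimages/curl image are illustrative assumptions.

```shell
# Sketch only: start a throwaway pod on a given node and try to reach the
# in-cluster API Service from it. Any HTTP status code printed means the
# connection worked; a timeout reproduces the webhook's symptom. Usage:
#   probe_api_from_node k-w-001
#   probe_api_from_node k-w-001 10.96.0.1    # bypass cluster DNS
probe_api_from_node() {
  node="$1"
  host="${2:-kubernetes.default.svc}"
  kubectl run "api-probe-${node}" --rm -i --restart=Never \
    --image=curlimages/curl \
    --overrides="{\"apiVersion\":\"v1\",\"spec\":{\"nodeName\":\"${node}\"}}" -- \
    curl -sk -m 10 -o /dev/null -w '%{http_code}\n' "https://${host}:443/version"
}
```

Running it once per worker node shows whether the failure follows a particular node or affects the whole pod network.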

Slyke avatar Aug 31 '22 06:08 Slyke

I'm running a k3s cluster on GCP VMs and faced the same issue. Moreover, none of the installed validation webhooks work either, except in cases where the validation pods run with network=host privileges.

T-helper avatar Sep 21 '22 15:09 T-helper

Kind of, I ran this as a workaround:

kubectl delete validatingwebhookconfiguration validating-webhook-configuration
kubectl delete mutatingwebhookconfiguration mutating-webhook-configuration

that's the only thing that helped after 15 retries, tnx :)

dejanmarich avatar Oct 17 '22 17:10 dejanmarich

I did some testing. It seems that when I remove metallb from my kubespray config, I no longer get the context deadline exceeded error.

I suspect this is mostly due to kubespray dictating (and metallb needing) kube_proxy_strict_arp set to true. After deploying cert-manager (via static manifests on docs site) I was able to use the manifests here - https://cert-manager.io/docs/installation/verify/#manual-verification to get a cert created without issue.

From there, I was able to modify the kube proxy configmap to do strict_arp, then install metallb, and then advertise a L2 pool. A service set to LoadBalancer was able to obtain an IP and route just fine.

I then deleted the cert test resources, verified, then recreated them without issue. To test further, I deleted those and then the cert-manager static manifests and tried to reinstall cert-manager. This is where I ran into issues. It was able to recreate all resources and the pods did come up but this error presents in the webhook pod-

namespace/cert-manager-test created
Error from server (InternalError): error when creating "2test.yml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": x509: certificate signed by unknown authority
Error from server (InternalError): error when creating "2test.yml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": x509: certificate signed by unknown authority

This error persists even when reverting all metallb installation and deleting and recreating all cert-manager infra.
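
The x509 "unknown authority" message means the apiserver did not trust the certificate the webhook served. One thing worth checking is whether the caBundle in the webhook configuration matches the CA the webhook generated for itself, since cainjector is what normally copies one into the other. A minimal sketch, assuming the Secret cert-manager-webhook-ca (the name appears in the webhook args earlier in this thread) stores the CA under a ca.crt key, which is an assumption about its layout. The function names are hypothetical, and a mismatch is a hint rather than proof.

```shell
# Sketch only: helper functions, nothing runs until you call them.

# caBundle the apiserver uses to verify the webhook (base64-encoded).
webhook_ca_bundle() {
  kubectl get validatingwebhookconfiguration cert-manager-webhook \
    -o jsonpath='{.webhooks[0].clientConfig.caBundle}'
}

# CA certificate stored in the webhook's Secret (base64-encoded).
# The ca.crt key is an assumption about the Secret layout.
webhook_ca_secret() {
  kubectl -n cert-manager get secret cert-manager-webhook-ca \
    -o jsonpath='{.data.ca\.crt}'
}

# Report whether the two values agree.
compare_webhook_ca() {
  bundle="$(webhook_ca_bundle)"
  secret="$(webhook_ca_secret)"
  if [ -z "$bundle" ]; then
    echo "caBundle is empty"
  elif [ "$bundle" = "$secret" ]; then
    echo "caBundle matches cert-manager-webhook-ca"
  else
    echo "caBundle differs from cert-manager-webhook-ca"
  fi
}
```

An empty or differing caBundle after a reinstall would suggest looking at the cainjector pod logs next.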

So from my limited testing, it appears that it might work fine if you install in order of

  1. cluster
  2. cert-manager
  3. metallb

But I'm not sure if it will break as soon as the webhook pod restarts or after some period of time. I will guess that it would break upon any cert-manager upgrade. I was also receiving errors in the webhook pod logs about TLS handshake errors (must be from metallb.)

greenship24 avatar Oct 31 '22 14:10 greenship24

Going further, I think I have found something that works that I would appreciate a dev's input on (if it truly works, if it will be stable for the long haul, etc.)

For kubespray (and calico, metallb) users, I think this should work from my limited testing.

  1. Create your cluster specifying kube_proxy_strict_arp: true in your k8s-clusters vars and adding metallb in your addons.yml (assuming you want it.)
  2. Download the static manifests here- https://github.com/cert-manager/cert-manager/releases/download/v1.10.0/cert-manager.yaml
  3. Edit them with whatever editor to reflect 3 changes:
     • cert-manager-webhook deployment- change 10250 to some other port- 13250 if you please (and follows for this example.) This needs to be done TWICE here- once to specify the command and then an update to the port
     • cert-manager-webhook service- change 10250 to 13250. This should work with "https" but I changed it anyhow.
  4. Deploy the manifests with kubectl apply -f
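
The two deployment edits can be sketched as a small filter. This assumes the port appears in the manifest as --secure-port=10250 and containerPort: 10250, which is an assumption about the file's layout, and it leaves the Service untouched on the assumption that its targetPort refers to the port by name ("https"). The function name and the output filename are hypothetical.

```shell
# Sketch only: rewrite the webhook port in a manifest read from stdin.
# The two patterns are assumptions about how the port appears in the file,
# so compare input and output with diff before applying the result.
# Usage:
#   rewrite_webhook_port 13250 < cert-manager.yaml > cert-manager-13250.yaml
rewrite_webhook_port() {
  new_port="$1"
  sed -e "s/--secure-port=10250/--secure-port=${new_port}/" \
      -e "s/containerPort: 10250/containerPort: ${new_port}/"
}
```

Targeting the two patterns explicitly, rather than replacing every 10250 in the file, avoids touching unrelated values.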

I still get handshake errors in the webhook pod logs, but I'm unsure whether that is an issue or not. This "solution", if it is one, was drawn up using some notes from the following link (cause 2, cause 3): https://cert-manager.io/docs/troubleshooting/webhook/#error-io-timeout-connectivity-issue

Seems to work (maybe) according to this comment from an issue long ago too- https://github.com/cert-manager/cert-manager/issues/2319#issuecomment-738806971

greenship24 avatar Oct 31 '22 15:10 greenship24

(cert-manager v1.9.0 via Helm Chart)

I had the same problem and solved it with the following steps; hope it helps any future readers.

1. Run the commands below to fully delete all objects created by cert-manager; otherwise it crashed while re-installing.

helm uninstall cert-manager -n cert-manager
kubectl delete roles cert-manager-startupapicheck:create-cert -n cert-manager;
kubectl delete serviceaccount cert-manager-startupapicheck -n cert-manager;
kubectl delete serviceaccount default -n cert-manager;
kubectl delete jobs cert-manager-startupapicheck -n cert-manager;
kubectl delete rolebindings cert-manager-startupapicheck:create-cert -n cert-manager;

2. Also deleted all cert-manager related CRDs (not sure if this was necessary, but Certificates come up by themselves once cert-manager works properly anyway)

kubectl delete -f https://github.com/cert-manager/cert-manager/releases/download/v1.9.0/cert-manager.crds.yaml

3. Update Chart

helm repo add jetstack https://charts.jetstack.io
helm repo update

4. Install Chart with below config (Set host network true to make webhook pod run in the host's network namespace & Set securePort to 10260 to prevent a conflict between the webhook and the kubelet)

helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.9.0 \
  --set startupapicheck.timeout=5m \
  --set installCRDs=true \
  --set webhook.hostNetwork=true \
  --set webhook.securePort=10260

Reference: https://cert-manager.io/docs/troubleshooting/webhook/#io-timeout
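
A small sketch for confirming that the two webhook settings actually landed on the Deployment after the install, assuming the default Deployment name cert-manager-webhook. The function name is a hypothetical helper.

```shell
# Sketch only: print the hostNetwork flag and the container arguments of the
# webhook Deployment, one per line. With the settings above one would expect
# "true" and an argument list containing --secure-port=10260.
webhook_network_settings() {
  kubectl -n cert-manager get deployment cert-manager-webhook \
    -o jsonpath='{.spec.template.spec.hostNetwork}{"\n"}{.spec.template.spec.containers[0].args}{"\n"}'
}
```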

imageschool avatar Nov 16 '22 08:11 imageschool

Thanks for the guidelines, @imageschool's solution works for me.

nduc4nh avatar Nov 18 '22 19:11 nduc4nh

For anyone who still has problems with webhook connectivity, make sure to take a look at our guide: https://cert-manager.io/docs/troubleshooting/webhook/ Feel free to re-open this issue if necessary.

inteon avatar Feb 14 '23 09:02 inteon

Kind of, I ran this as a workaround:

kubectl delete validatingwebhookconfiguration validating-webhook-configuration
kubectl delete mutatingwebhookconfiguration mutating-webhook-configuration

Yep. That's the only thing that fixed my issue.

Nabulio avatar Aug 24 '23 21:08 Nabulio

To be clear, cert-manager needs its validatingwebhookconfiguration and mutatingwebhookconfiguration to work securely and correctly.

inteon avatar Aug 24 '23 21:08 inteon