[Bug] Webhook server: x509: certificate signed by unknown authority

Open andrewhibbert opened this issue 3 years ago • 10 comments

Kyverno Version

1.6.x

Kubernetes Version

1.20.x

Kubernetes Platform

EKS

Kyverno Rule Type

Other

Description

This error occurs intermittently; retrying the same request then succeeds. For example, when creating a pod:

Error from server (InternalError): Internal error occurred: failed calling webhook "mutate.kyverno.svc-fail": Post "https://infra-kyverno-svc.infra.svc:443/mutate?timeout=10s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "*.kyverno.svc")

Steps to reproduce

  1. Create a pod

Expected behavior

No error message

Screenshots

No response

Kyverno logs

No response

Slack discussion

No response

Troubleshooting

  • [X] I have read and followed the documentation AND the troubleshooting guide.
  • [X] I have searched other issues in this repository and mine is not recorded.

andrewhibbert avatar Feb 24 '22 16:02 andrewhibbert

Thanks for opening your first issue here! Be sure to follow the issue template!

welcome[bot] avatar Feb 24 '22 16:02 welcome[bot]

@andrewhibbert would you kindly provide complete reproduction steps which include how Kyverno was installed? It appears from your output that you have customized the name of the Kyverno installation and placed it in a non-default Namespace. It would help to have a complete understanding of what you've done, what policy you're using, and what resource you're submitting.

chipzoller avatar Feb 24 '22 19:02 chipzoller

@andrewhibbert - can you attach the steps for me to reproduce the issue?

realshuting avatar Mar 16 '22 11:03 realshuting

Closing, please re-open with steps to reproduce.

realshuting avatar Mar 28 '22 14:03 realshuting

Steps to reproduce:

  • helm install ...
  • helm upgrade ...

with --set createSelfSignedCert=true

The effect is that the CA cert is re-created, but that does not cause the pods to restart, and as such the cert served by the pod is no longer signed by the current CA.
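
The failure mode described above can be reproduced outside a cluster with plain `openssl`. This is only an illustrative sketch, not what the chart does internally: the file names, the temporary directory, and the key size are assumptions. It creates a CA, signs a serving certificate with it, creates a second CA with the same subject, and checks the certificate against both.

```shell
#!/usr/bin/env bash
# Illustration only: mimics a CA being regenerated while the pod keeps its old cert.
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

# make_ca <name>: create a self-signed CA as <name>.key / <name>.crt
make_ca() {
  openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj "/CN=*.kyverno.svc" \
    -keyout "$workdir/$1.key" -out "$workdir/$1.crt" 2>/dev/null
}

make_ca ca-install    # stands in for the CA created by "helm install"

# Serving certificate signed by the first CA
openssl req -newkey rsa:2048 -nodes -subj "/CN=kyverno-svc.kyverno.svc" \
  -keyout "$workdir/tls.key" -out "$workdir/tls.csr" 2>/dev/null
openssl x509 -req -in "$workdir/tls.csr" -days 1 \
  -CA "$workdir/ca-install.crt" -CAkey "$workdir/ca-install.key" -CAcreateserial \
  -out "$workdir/tls.crt" 2>/dev/null

make_ca ca-upgrade    # stands in for the CA re-created by "helm upgrade": same subject, new key

# verify_with <ca-name>: print "ok" if tls.crt chains to that CA, else "rejected"
verify_with() {
  if openssl verify -CAfile "$workdir/$1.crt" "$workdir/tls.crt" >/dev/null 2>&1; then
    echo ok
  else
    echo rejected
  fi
}

echo "old CA: $(verify_with ca-install)"
echo "new CA: $(verify_with ca-upgrade)"
```

With a working OpenSSL install, the first line should report ok and the second rejected: the subject of the new CA matches, but its key did not sign the certificate that is still being served.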

The helm chart could gate the creation by checking for the existence:

{{- if not (lookup "v1" "Secret" .Release.Namespace "kyverno-svc.kyverno.svc.kyverno-tls-ca") }}
... everything like before with helm annotation ... 
metadata:
  annotations:
    "helm.sh/resource-policy": "keep"
{{- end }}

The same section has another issue: The cert generated by the helm function will be rejected by clusters above 1.21

The generation only sets the CN, with no subject alternative names (SANs), which is rejected by golang 1.16+

This would be around line 2/3 of the original helm chart:

{{- $ca := .ca | default (genCA (printf "*.%s.svc" (include "kyverno.namespace" .)) 1024) -}}
{{- $svcName := (printf "%s.%s.svc" (include "kyverno.serviceName" .) (include "kyverno.namespace" .)) -}}
{{- $cert := genSignedCert $svcName nil (list $svcName) 1024 $ca -}}
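
To check whether a certificate carries SANs or only a CN, something like the following could be used. It is a sketch that assumes `openssl` is installed; `has_san` is a hypothetical helper name, and the secret name and namespace in the usage comment are the ones mentioned in this thread, so adjust them to your install.

```shell
#!/usr/bin/env bash
# has_san <cert.pem>: succeeds if the certificate has a Subject Alternative Name extension.
has_san() {
  openssl x509 -in "$1" -noout -text 2>/dev/null \
    | grep -q "X509v3 Subject Alternative Name"
}

# Usage against a live cluster (names are assumptions taken from this thread):
#   kubectl get secret -n kyverno kyverno-svc.kyverno.svc.kyverno-tls-pair \
#     -o jsonpath='{.data.tls\.crt}' | base64 -d > /tmp/kyverno-tls.crt
#   has_san /tmp/kyverno-tls.crt && echo "SAN present" || echo "CN only"
```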

This solution is far from perfect. One could also add a SHA checksum of the secret as an annotation on the deployment, so that the pods restart when the CA changes (and when the cert changes, which also happens but isn't yet picked up well by kyverno):

  annotations:
    checksum/config: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}

P-S-bits avatar Apr 06 '22 11:04 P-S-bits

Anyone found a way to unlock the API Server ?

I can't scale down/up kyverno pods as the webhook is failing, even after having deleted validatingwebhookconfiguration and mutatingwebhookconfiguration

jlamande avatar Jul 11 '22 09:07 jlamande

@jlamande is your question related to this issue or something else? Can you clarify?

chipzoller avatar Jul 11 '22 12:07 chipzoller

According to https://github.com/kyverno/kyverno/releases/tag/v1.6.2, this bug https://github.com/kyverno/kyverno/issues/3300 is fixed. But it is still open and milestone is set as 1.7.3. Can you please confirm if the issue is still open?

gautvenk avatar Jul 25 '22 21:07 gautvenk

@gautvenk - do you have the same issue when using custom certificates? I believe the issue was closed due to a lack of reproducing information.

realshuting avatar Jul 26 '22 06:07 realshuting

@P-S-bits - thanks for the information! Would you like to contribute a fix?

realshuting avatar Jul 26 '22 06:07 realshuting

I brought this up in the Slack channel and was pointed to this issue. Adding my setup info here.

Error we noticed: Internal error occurred: failed calling webhook "validate.kyverno.svc-fail": Post "https://kyverno-svc.kyverno.svc:443/validate?timeout=10s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "*.kyverno.svc"

  1. We use the base helm chart with default values.
  namespace: kyverno
  image:
    tag: v1.7.3
  serviceMonitor:
    enabled: true
  2. Did not adjust createSelfSignedCert in the values file, as you can see above.
  3. We also didn't have the HA setup in DEV. We had 1 pod running in dev and had a cluster outage that caused the kyverno pod to be replaced. In that case I expect the webhook to be unavailable, given the pod behind the webhook svc is gone. But the cert validation error is unclear to me.

Please let me know if I can help answer any questions here. I have been unable to reproduce this on my end.

ibexmonj avatar Sep 27 '22 16:09 ibexmonj

I'll look at it.

eddycharly avatar Sep 28 '22 09:09 eddycharly

https://github.com/kyverno/kyverno/pull/4745 should fix it. I don't think we need to restart the pods

eddycharly avatar Sep 29 '22 10:09 eddycharly

@eddycharly Appreciate the quick response.

I do have a quick question. In my setup I did not override the value of createSelfSignedCert. The if conditional here will only kick in if it is set to true, won't it? Or am I misunderstanding this?

ibexmonj avatar Sep 29 '22 11:09 ibexmonj

Yes, it only generates a self signed cert if this flag is enabled.

eddycharly avatar Sep 29 '22 11:09 eddycharly

Got it. So this will fix the other user's issue. Do you have any insight on the error I posted above? I am not setting createSelfSignedCert to true and I'm curious how kyverno handles cert creation in that case.

ibexmonj avatar Sep 29 '22 12:09 ibexmonj

When does your error happen? Is the log you posted from the API server, kubectl, or something else?

eddycharly avatar Sep 29 '22 12:09 eddycharly

In your case @ibexmonj it is because the cert authority is unknown. This happens if the webhook configuration does not have the CA bundle correctly set. Kyverno is responsible for setting it when creating the webhook config or when the cert is renewed; on renewal the rotation is graceful, and both old and new certs are accepted.

I don't see how this can happen, do you have a scenario to reproduce the issue ?

eddycharly avatar Sep 29 '22 12:09 eddycharly

The error was noticed in our deploy pipeline and I believe it's from the API server. We had an outage that affected a couple of control plane and worker nodes. We had just 1 kyverno pod in dev, so I expected the webhook to be unavailable and block admission, but that cert error is something I have been unable to reproduce. I do see kyverno-svc.kyverno.svc.kyverno-tls-ca and kyverno-svc.kyverno.svc.kyverno-tls-pair. Is there something I can check and verify? I do still have the same setup with createSelfSignedCert set to false.

ibexmonj avatar Sep 29 '22 12:09 ibexmonj

If the pod is down the error message should be different, if the pod is up, it should use what is in the secret for the tls connection.

What version are you running ?

Are both secrets tls ones (not opaque) like below ?

kyverno-svc.kyverno.svc.kyverno-tls-ca     kubernetes.io/tls    2      92m
kyverno-svc.kyverno.svc.kyverno-tls-pair   kubernetes.io/tls    2      92m
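
One way to run this check quickly is to filter the `kubectl get secrets` listing for the two Kyverno TLS secrets and print any whose type is not kubernetes.io/tls. This is a small sketch: `non_tls_secrets` is a hypothetical helper, and the `kyverno` namespace in the usage comment is an assumption based on the service name in the error message.

```shell
#!/usr/bin/env bash
# non_tls_secrets: read `kubectl get secrets` rows (NAME TYPE DATA AGE) on stdin and
# print the Kyverno TLS secrets whose TYPE column is not kubernetes.io/tls.
non_tls_secrets() {
  awk '$1 ~ /kyverno-tls-(ca|pair)$/ && $2 != "kubernetes.io/tls" { print $1 " (" $2 ")" }'
}

# Usage against a live cluster (namespace is an assumption):
#   kubectl get secrets -n kyverno --no-headers | non_tls_secrets
```

No output means both secrets already have the TLS type.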

eddycharly avatar Sep 29 '22 12:09 eddycharly

Running 1.7.3

kyverno-svc.kyverno.svc.kyverno-tls-ca     Opaque                                1      30d
kyverno-svc.kyverno.svc.kyverno-tls-pair   kubernetes.io/tls                     2      30d

ibexmonj avatar Sep 29 '22 12:09 ibexmonj

Even if kyverno is down it should be able to come back up, as we exclude the kyverno namespace from the webhook configs (depending on your config).

eddycharly avatar Sep 29 '22 12:09 eddycharly

Ah, the opaque one is probably causing issues.

eddycharly avatar Sep 29 '22 12:09 eddycharly

Btw, about excluding the kyverno namespace from the webhook configs: I believe that logic relies on the namespace label being set. I am on 1.20, so we do not have that feature enabled. I only recently set that label after reading about it.

ibexmonj avatar Sep 29 '22 12:09 ibexmonj

There should be two keys; kyverno probably can't create the CA bundle. Can you look at the webhook? You should see something like this:

Webhooks:
  Admission Review Versions:
    v1beta1
  Client Config:
    Ca Bundle:  LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURHekNDQWdPZ0F3SUJBZ0lRVGRucG15SUg2eVJxZFloTVozMWdGakFOQmdrcWhraUc5dzBCQVFzRkFEQVkKTVJZd0ZBWURWUVFEREEwcUxtdDVkbVZ5Ym04dWMzWmpNQjRYRFRJeU1Ea3lPVEV3TkRNd09Wb1hEVEkxTURjeApPVEV3TkRNd09Wb3dHREVXTUJRR0ExVUVBd3dOS2k1cmVYWmxjbTV2TG5OMll6Q0NBU0l3RFFZSktvWklodmNOCkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFNdURMemxhQ0NEWUozY2lBVnpBdWdxQXVvSXlaeHArbHRqRzg3VjcKeXcxRnZaN2wyUlZIMzlqUlNTbnVtY0cwbGlqYkozcXo1a01iTU10UkNOTTNabks0MzBTUnFJWmJFM0FVamtRRwpiV01PMmtFVHVjSU00SXhVeHBhcFBtcW9iYU9URnE1WkFqdmRKS284QkczSm5ZbFBPa241clk1SmlUTXcyYWl1CkFjY2pqYnpFQk9BQXArdlVQOVRMVkFwQ3FKU3h0Wk5XaFlXVU9MUktvQmpMVXVzcXNuRWhJMGlNdUYyaWUxY2oKcFJianphZy9jUWY1YityaWZ3UFlHWXBQS3pUdkE5Q2E5N3pLc0JJZzM3Nk9yTVJtR2lURTBtdFI1WnRTVURNRAp2c2kwTDZ6dU5SaTFETVV1MzBFaVh6QkErSkxQT09IYkZ4aER4VFc3UGdVbmFKOENBd0VBQWFOaE1GOHdEZ1lEClZSMFBBUUgvQkFRREFnS2tNQjBHQTFVZEpRUVdNQlFHQ0NzR0FRVUZCd01CQmdnckJnRUZCUWNEQWpBUEJnTlYKSFJNQkFmOEVCVEFEQVFIL01CMEdBMVVkRGdRV0JCUTF4NFlsNGd2aVl5cDRPWFJoUmppOVIwSTU4akFOQmdrcQpoa2lHOXcwQkFRc0ZBQU9DQVFFQXVzUGVhQURMZUJDYlladmFtUW5MMElUMWtyRnVoeXZpNWFWVU01QVJpVVl0CnNKNzFnN1JISzNuQXlhQ0UrUUo1d0pNamN3RVhraTFwdjhXZFFPdm50YTY1T3BKUXRkUkJEN2hZck5JbTVTTU8KZEtENElCeCttaDZFTkZLc0hzTW95VDg0Sm1IZlRXRTRKVkNzRktBVU8vWlhXT0YvN3B5Q3c0N3dLZW1EbldDbAp6bFpqYVVPMkN4ZDcrQWhIYzlBWStSOWt0dmRoSHlmNXhTUzQwVWZQZUZlQklyMHlFaGNnWk5tMjFPdlh4RkdECjh4YU5FRXMvWlVYcjFwdnRHQU1IUHVDeitoNFM3QWZuNU42bFlxUThpMG5PWXhzOXBLTHU2ZVZFUmY0QkZWZm4KTmdmSVFpajA1RUNLc0plS3RjUjRnd01JaHgwMVJCZ014VzU3RnRPQ1d3PT0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=

kubectl describe validatingwebhookconfigurations kyverno-policy-validating-webhook-cfg
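
To go one step beyond eyeballing the describe output, the CA bundle can be decoded and inspected. The helper below is a sketch: `ca_bundle_pem` is a hypothetical name, and the usage comment assumes the bundle of interest is on the first webhook entry and that `openssl` is available.

```shell
#!/usr/bin/env bash
# ca_bundle_pem: read a base64-encoded caBundle on stdin and print the decoded PEM.
# Returns non-zero if the decoded data does not contain a certificate block.
ca_bundle_pem() {
  local pem
  pem="$(base64 -d)" || return 1
  case "$pem" in
    *"-----BEGIN CERTIFICATE-----"*) printf '%s\n' "$pem" ;;
    *) echo "caBundle is empty or not a PEM certificate" >&2; return 1 ;;
  esac
}

# Usage against a live cluster (webhook name as in the describe command above):
#   kubectl get validatingwebhookconfigurations kyverno-policy-validating-webhook-cfg \
#     -o jsonpath='{.webhooks[0].clientConfig.caBundle}' \
#     | ca_bundle_pem | openssl x509 -noout -subject -enddate
```

The subject and expiry printed at the end can then be compared with the certificate stored in the CA secret.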

eddycharly avatar Sep 29 '22 12:09 eddycharly

True about the label.

eddycharly avatar Sep 29 '22 12:09 eddycharly

I see the CA bundle is set when I check both webhooks.

ibexmonj avatar Sep 29 '22 12:09 ibexmonj

Ok, you can probably fix this by deleting both secrets, kyverno should recreate them. If you do so, please make sure they are both tls secrets.
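
A sketch of that procedure follows, assuming the secret names and the `kyverno` namespace seen earlier in this thread (a customized install will use different names). The function names and the `KUBECTL` override are hypothetical conveniences, not part of Kyverno.

```shell
#!/usr/bin/env bash
# KUBECTL can be overridden (for example KUBECTL=echo) to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

CA_SECRET="kyverno-svc.kyverno.svc.kyverno-tls-ca"
PAIR_SECRET="kyverno-svc.kyverno.svc.kyverno-tls-pair"

# delete_kyverno_tls_secrets [namespace]: delete both secrets so kyverno can recreate them.
delete_kyverno_tls_secrets() {
  "$KUBECTL" delete secret -n "${1:-kyverno}" "$CA_SECRET" "$PAIR_SECRET"
}

# show_kyverno_tls_secrets [namespace]: list both secrets; the TYPE column should
# read kubernetes.io/tls for each once they have been recreated.
show_kyverno_tls_secrets() {
  "$KUBECTL" get secret -n "${1:-kyverno}" "$CA_SECRET" "$PAIR_SECRET"
}
```

Run `delete_kyverno_tls_secrets`, give kyverno a moment, then run `show_kyverno_tls_secrets` to confirm both secrets are back with the TLS type.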

eddycharly avatar Sep 29 '22 12:09 eddycharly

Could the namespace label kubernetes.io/metadata.name=kyverno being missing that day have caused this issue, by preventing kyverno from creating it?

ibexmonj avatar Sep 29 '22 12:09 ibexmonj

I am not sure the secret type can be changed after creation; maybe kyverno just keeps failing when creating/updating the secret.

eddycharly avatar Sep 29 '22 12:09 eddycharly