kyverno
[Bug] Webhook server: x509: certificate signed by unknown authority
Kyverno Version
1.6.x
Kubernetes Version
1.20.x
Kubernetes Platform
EKS
Kyverno Rule Type
Other
Description
This error occurs intermittently; retrying succeeds. For example, when creating a pod:
Error from server (InternalError): Internal error occurred: failed calling webhook "mutate.kyverno.svc-fail": Post "https://infra-kyverno-svc.infra.svc:443/mutate?timeout=10s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "*.kyverno.svc")
Steps to reproduce
- Create a pod
Expected behavior
No error message
Screenshots
No response
Kyverno logs
No response
Slack discussion
No response
Troubleshooting
- [X] I have read and followed the documentation AND the troubleshooting guide.
- [X] I have searched other issues in this repository and mine is not recorded.
Thanks for opening your first issue here! Be sure to follow the issue template!
@andrewhibbert would you kindly provide complete reproduction steps which include how Kyverno was installed? It appears from your output that you have customized the name of the Kyverno installation and placed it in a non-default Namespace. It would help to have a complete understanding of what you've done, what policy you're using, and what resource you're submitting.
@andrewhibbert - can you attach the steps for me to reproduce the issue?
Closing, please re-open with steps to reproduce.
Steps to reproduce:
- helm install ...
- helm upgrade ...
with --set createSelfSignedCert=true
The effect is that the CA cert is re-created, but that does not cause the pods to restart, and as such the cert on the pod is no longer signed by the CA.
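The failure mode described above can be simulated entirely locally with openssl (a sketch; the CNs mirror the chart's defaults, no cluster involved):

```shell
# A serving cert signed by the original CA stops verifying once
# `helm upgrade` regenerates the CA.
set -e
wd=$(mktemp -d)

# CA created at install time (CN mirrors the chart's genCA call)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout "$wd/ca1.key" -out "$wd/ca1.crt" -subj "/CN=*.kyverno.svc" 2>/dev/null

# serving cert for the webhook service, signed by that CA
openssl req -newkey rsa:2048 -nodes -keyout "$wd/svc.key" \
  -out "$wd/svc.csr" -subj "/CN=kyverno-svc.kyverno.svc" 2>/dev/null
openssl x509 -req -days 1 -in "$wd/svc.csr" -CA "$wd/ca1.crt" \
  -CAkey "$wd/ca1.key" -CAcreateserial -out "$wd/svc.crt" 2>/dev/null

# `helm upgrade` regenerates the CA secret, but the pod keeps its old cert
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout "$wd/ca2.key" -out "$wd/ca2.crt" -subj "/CN=*.kyverno.svc" 2>/dev/null

openssl verify -CAfile "$wd/ca1.crt" "$wd/svc.crt"   # old CA: verifies
openssl verify -CAfile "$wd/ca2.crt" "$wd/svc.crt" \
  || echo "x509: certificate signed by unknown authority"
```

The last verify is exactly what the API server does with the caBundle, which is why every admission request fails until the pod's cert is re-issued.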
The helm chart could gate the creation by checking for the existence:
{{- if not (lookup "v1" "Secret" .Release.Namespace "kyverno-svc.kyverno.svc.kyverno-tls-ca") }}
... everything like before with helm annotation ...
metadata:
annotations:
"helm.sh/resource-policy": "keep"
{{- end }}
The same section has another issue: the cert generated by the helm function will be rejected by clusters above 1.21, because the generation only sets the CN, which is rejected by golang 1.16+.
This would be around line 2/3 of the original helm chart:
{{- $ca := .ca | default (genCA (printf "*.%s.svc" (include "kyverno.namespace" .)) 1024) -}}
{{- $svcName := (printf "%s.%s.svc" (include "kyverno.serviceName" .) (include "kyverno.namespace" .)) -}}
{{- $cert := genSignedCert $svcName nil (list $svcName) 1024 $ca -}}
This solution is far from perfect; one could also add a SHA checksum of the secret to the deployment so that the pods restart on CA change (and on cert change, which also happens but isn't yet picked up in a good way by kyverno):
annotations:
checksum/config: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}
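The restart-on-change trick works because the rendered secret's sha256 digest is embedded in the pod template annotations; when the digest changes, the pod template changes, and the Deployment rolls the pods. A trivial local illustration of the mechanism:

```shell
# Different rendered secret content -> different digest -> changed pod
# template -> Deployment rollout. (Purely illustrative input strings.)
old=$(printf 'ca: AAAA' | sha256sum | cut -d' ' -f1)
new=$(printf 'ca: BBBB' | sha256sum | cut -d' ' -f1)
[ "$old" != "$new" ] && echo "digest changed -> pod template changed -> rollout"
```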
Has anyone found a way to unblock the API server? I can't scale kyverno pods down/up as the webhook is failing, even after having deleted the validatingwebhookconfiguration and mutatingwebhookconfiguration.
@jlamande is your question related to this issue or something else? Can you clarify?
According to https://github.com/kyverno/kyverno/releases/tag/v1.6.2, this bug https://github.com/kyverno/kyverno/issues/3300 is fixed, but it is still open and the milestone is set as 1.7.3. Can you please confirm if the issue is still open?
@gautvenk - do you have the same issue when using custom certificates? I believe the issue was closed due to a lack of reproducing information.
@P-S-bits - thanks for the information! Would you like to contribute a fix?
I brought this up in the Slack channel and was pointed to this issue. Adding my setup info here.
Error we noticed:
Internal error occurred: failed calling webhook "validate.kyverno.svc-fail": Post "https://kyverno-svc.kyverno.svc:443/validate?timeout=10s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "*.kyverno.svc")
- We use the base helm chart with default values:
  namespace: kyverno
  image:
    tag: v1.7.3
  serviceMonitor:
    enabled: true
- Did not adjust createSelfSignedCert in the values file, as you can see above.
- We also didn't have the HA setup in DEV. We had 1 pod running in dev, and a cluster outage caused the kyverno pod to be replaced. In that case I expect the webhook to be unavailable, given the pod behind the webhook svc is gone, but the cert validation error is unclear to me.
Please let me know if I can help answer any questions here. I have been unable to reproduce this on my end.
I'll look at it.
https://github.com/kyverno/kyverno/pull/4745 should fix it. I don't think we need to restart the pods
@eddycharly Appreciate the quick response.
I do have a quick question. In my setup I did not overwrite the value for createSelfSignedCert. The if conditional here will only kick in if it is set to true, wouldn't it? Or am I misunderstanding this?
Yes, it only generates a self signed cert if this flag is enabled.
Got it. So this will fix the other user's issue. Do you have any insight on the error I posted above? I am not setting the createSelfSignedCert value to true, and I'm curious how kyverno handles cert creation in that case.
When does your error happen? Is the log you posted from the API server? kubectl? Something else?
In your case @ibexmonj it is because the cert authority is unknown; this happens if the webhook configuration does not have the CA bundle correctly set (kyverno is responsible for setting it when creating the webhook config or when the cert is renewed; in the renewal case the rotation is graceful, and both old and new certs are accepted).
I don't see how this can happen; do you have a scenario to reproduce the issue?
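The graceful rotation mentioned above works because the caBundle can hold several CAs at once: a bundle containing both the old and the new CA accepts certs signed by either. A purely local sketch with openssl (CNs mirror the chart's defaults):

```shell
# Two generations of CA, a serving cert signed by the *old* one,
# and a bundle that still accepts it after rotation.
set -e
rd=$(mktemp -d)
for n in old new; do
  openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -keyout "$rd/$n.key" -out "$rd/$n.crt" -subj "/CN=*.kyverno.svc" 2>/dev/null
done

# serving cert still signed by the old CA
openssl req -newkey rsa:2048 -nodes -keyout "$rd/svc.key" \
  -out "$rd/svc.csr" -subj "/CN=kyverno-svc.kyverno.svc" 2>/dev/null
openssl x509 -req -days 1 -in "$rd/svc.csr" -CA "$rd/old.crt" \
  -CAkey "$rd/old.key" -CAcreateserial -out "$rd/svc.crt" 2>/dev/null

# a caBundle carrying both CAs still accepts the old cert
cat "$rd/old.crt" "$rd/new.crt" > "$rd/bundle.crt"
openssl verify -CAfile "$rd/bundle.crt" "$rd/svc.crt"
```

If the bundle contained only the new CA, the verify step would fail with the same "unknown authority" error seen in this issue.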
The error was noticed in our deploy pipeline, and I believe it's from the API server. We had an outage that affected a couple of control plane and worker nodes. We had just 1 kyverno pod in dev, so I expected the webhook to be unavailable and block admission, but that cert error is something I have been unable to reproduce.
I do see kyverno-svc.kyverno.svc.kyverno-tls-ca and kyverno-svc.kyverno.svc.kyverno-tls-pair. Is there something I can check and verify? I do still have the same setup with createSelfSignedCert set to false.
If the pod is down, the error message should be different; if the pod is up, it should use what is in the secret for the TLS connection.
What version are you running?
Are both secrets tls ones (not Opaque) like below?
kyverno-svc.kyverno.svc.kyverno-tls-ca **kubernetes.io/tls** 2 92m
kyverno-svc.kyverno.svc.kyverno-tls-pair **kubernetes.io/tls** 2 92m
Running 1.7.3
kyverno-svc.kyverno.svc.kyverno-tls-ca Opaque 1 30d
kyverno-svc.kyverno.svc.kyverno-tls-pair kubernetes.io/tls 2 30d
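For reference, Kubernetes defines the kubernetes.io/tls secret type as carrying exactly the tls.crt and tls.key data keys, which matches the item count of 2 shown above; a sketch of the expected shape (secret name and namespace taken from this thread, placeholder data):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: kyverno-svc.kyverno.svc.kyverno-tls-ca
  namespace: kyverno
type: kubernetes.io/tls   # the Opaque type listed above does not match this
data:
  tls.crt: <base64-encoded PEM certificate>
  tls.key: <base64-encoded PEM private key>
```

An Opaque secret with a single data item, as in the listing above, cannot satisfy this shape, which fits the diagnosis that follows.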
Even if kyverno is down, it should be able to come back up, as we exclude the kyverno namespace from the webhook configs (depending on your config).
Ah, the Opaque one is probably causing issues.
Btw, about "exclude the kyverno namespace from the webhooks configs": I believe that logic relies on the namespace label being set. I am on 1.20, so we do not have that feature enabled. I only recently set that label after reading about it.
There should be two keys; kyverno probably can't create the CA bundle. Can you look at the webhook? You should see something like this:
Webhooks:
Admission Review Versions:
v1beta1
Client Config:
Ca Bundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURHekNDQWdPZ0F3SUJBZ0lRVGRucG15SUg2eVJxZFloTVozMWdGakFOQmdrcWhraUc5dzBCQVFzRkFEQVkKTVJZd0ZBWURWUVFEREEwcUxtdDVkbVZ5Ym04dWMzWmpNQjRYRFRJeU1Ea3lPVEV3TkRNd09Wb1hEVEkxTURjeApPVEV3TkRNd09Wb3dHREVXTUJRR0ExVUVBd3dOS2k1cmVYWmxjbTV2TG5OMll6Q0NBU0l3RFFZSktvWklodmNOCkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFNdURMemxhQ0NEWUozY2lBVnpBdWdxQXVvSXlaeHArbHRqRzg3VjcKeXcxRnZaN2wyUlZIMzlqUlNTbnVtY0cwbGlqYkozcXo1a01iTU10UkNOTTNabks0MzBTUnFJWmJFM0FVamtRRwpiV01PMmtFVHVjSU00SXhVeHBhcFBtcW9iYU9URnE1WkFqdmRKS284QkczSm5ZbFBPa241clk1SmlUTXcyYWl1CkFjY2pqYnpFQk9BQXArdlVQOVRMVkFwQ3FKU3h0Wk5XaFlXVU9MUktvQmpMVXVzcXNuRWhJMGlNdUYyaWUxY2oKcFJianphZy9jUWY1YityaWZ3UFlHWXBQS3pUdkE5Q2E5N3pLc0JJZzM3Nk9yTVJtR2lURTBtdFI1WnRTVURNRAp2c2kwTDZ6dU5SaTFETVV1MzBFaVh6QkErSkxQT09IYkZ4aER4VFc3UGdVbmFKOENBd0VBQWFOaE1GOHdEZ1lEClZSMFBBUUgvQkFRREFnS2tNQjBHQTFVZEpRUVdNQlFHQ0NzR0FRVUZCd01CQmdnckJnRUZCUWNEQWpBUEJnTlYKSFJNQkFmOEVCVEFEQVFIL01CMEdBMVVkRGdRV0JCUTF4NFlsNGd2aVl5cDRPWFJoUmppOVIwSTU4akFOQmdrcQpoa2lHOXcwQkFRc0ZBQU9DQVFFQXVzUGVhQURMZUJDYlladmFtUW5MMElUMWtyRnVoeXZpNWFWVU01QVJpVVl0CnNKNzFnN1JISzNuQXlhQ0UrUUo1d0pNamN3RVhraTFwdjhXZFFPdm50YTY1T3BKUXRkUkJEN2hZck5JbTVTTU8KZEtENElCeCttaDZFTkZLc0hzTW95VDg0Sm1IZlRXRTRKVkNzRktBVU8vWlhXT0YvN3B5Q3c0N3dLZW1EbldDbAp6bFpqYVVPMkN4ZDcrQWhIYzlBWStSOWt0dmRoSHlmNXhTUzQwVWZQZUZlQklyMHlFaGNnWk5tMjFPdlh4RkdECjh4YU5FRXMvWlVYcjFwdnRHQU1IUHVDeitoNFM3QWZuNU42bFlxUThpMG5PWXhzOXBLTHU2ZVZFUmY0QkZWZm4KTmdmSVFpajA1RUNLc0plS3RjUjRnd01JaHgwMVJCZ014VzU3RnRPQ1d3PT0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
kubectl describe validatingwebhookconfigurations kyverno-policy-validating-webhook-cfg
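The caBundle field in the webhook configuration is just base64-encoded PEM, so it can be decoded and inspected directly. A sketch; against a live cluster you would extract it with kubectl (shown as a comment), and the decode step is demonstrated here on a locally generated CA with the same CN:

```shell
# On a cluster you would first fetch the bundle, e.g.:
#   kubectl get validatingwebhookconfigurations \
#     kyverno-policy-validating-webhook-cfg \
#     -o jsonpath='{.webhooks[0].clientConfig.caBundle}' > caBundle.b64
# Decoding works the same for any such bundle; local stand-in below.
set -e
bd=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout "$bd/ca.key" -out "$bd/ca.crt" -subj "/CN=*.kyverno.svc" 2>/dev/null
base64 < "$bd/ca.crt" | tr -d '\n' > "$bd/caBundle.b64"   # what the API server stores
base64 -d < "$bd/caBundle.b64" | openssl x509 -noout -subject -enddate
```

Comparing the subject and expiry printed here against the CA in the kyverno-tls-ca secret is a quick way to spot a bundle/cert mismatch.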
True about the label.
I see the CA bundle is being set when I check both the webhooks.
Ok, you can probably fix this by deleting both secrets; kyverno should recreate them.
If you do so, please make sure they are both tls secrets.
Could the namespace label kubernetes.io/metadata.name=kyverno missing that day have caused this issue by preventing kyverno from creating it?
I am not sure the secret type can be changed after creation; maybe kyverno just fails continuously creating/updating the secret.