keda
Setting failurePolicy to Fail in the admission webhook does not work
Report
We've installed keda via the downloaded release yaml.
We want to use the admission webhook with "failurePolicy: Fail". As soon as we change that, we see the following issue when fluxcd tries to apply a namespace containing a ScaledObject.
{"level":"error","ts":"2023-11-15T13:11:16.956Z","msg":"Reconciliation failed after 2.426684853s, next try in 5m0s","controller":"kustomization","controllerGroup":"kustomize.toolkit.fluxcd.io","controllerKind":"Kustomization","Kustomization":{"name":"gatekeeper","namespace":"flux-system"},"namespace":"flux-system","name":"gatekeeper","reconcileID":"9d4d6fe1-da59-40e9-af8c-798651024822","revision":"dev@sha1:b7ced8d1a6d67107cedfb8bdd665d81de2877ba0","error":"ScaledObject/gatekeeper/httpcache dry-run failed, reason: InternalError: Internal error occurred: failed calling webhook \"vscaledobject.kb.io\": failed to call webhook: Post \"https://keda-admission-webhooks.keda.svc:443/validate-keda-sh-v1alpha1-scaledobject?timeout=10s\": EOF\n"}
The admission webhook deployment is running and the svc is reachable via port-forward. The certificates in use have the right names.
How can we further debug this?
I'm not sure if "internal error" is an error of the admission controller or if the admission controller can't be reached. The admission controller itself does not log any error.
For us it looks like the admission webhook does not work at all but the error is ignored with the default config?
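On the question of whether the webhook rejected the object or could not be reached: as far as I know, the API server words the two cases differently. A failed call is reported as `failed calling webhook "<name>": ...`, while a rejection reads `admission webhook "<name>" denied the request: ...`. With failurePolicy: Ignore a failed call is skipped and the request is admitted, which would explain why nothing shows up with the default config. A rough sketch of that distinction, using a trimmed copy of the log line above (the function name and the category labels are made up for illustration):

```python
import json


def classify_admission_error(message: str) -> str:
    """Roughly classify an admission error string from the API server.

    The labels returned here are hypothetical, chosen only for this example.
    """
    if "failed calling webhook" in message:
        # The API server could not complete the HTTP call to the webhook
        # (connection refused, TLS problem, connection closed, timeout, ...).
        return "call-failed"
    if "denied the request" in message:
        # The webhook answered and rejected the object.
        return "denied"
    return "unknown"


# Trimmed copy of the Flux log record from the report (only two fields kept).
LOG_LINE = r'{"level":"error","error":"ScaledObject/gatekeeper/httpcache dry-run failed, reason: InternalError: Internal error occurred: failed calling webhook \"vscaledobject.kb.io\": failed to call webhook: Post \"https://keda-admission-webhooks.keda.svc:443/validate-keda-sh-v1alpha1-scaledobject?timeout=10s\": EOF\n"}'

record = json.loads(LOG_LINE)
print(classify_admission_error(record["error"]))
```

The trailing EOF usually means the connection was closed before a response was read, so this points at the path between the API server and the webhook pod rather than at the validation logic. That would also fit with the webhook itself logging nothing.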
Expected Behavior
The admission webhook works with: failurePolicy: Fail
Actual Behavior
The admission webhook can't be used because of an "internal error".
Steps to Reproduce the Problem
- edit the admission controller and set failurePolicy: Fail
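For anyone trying to reproduce this, the step above can also be expressed as a JSON Patch instead of a manual edit. This is only a sketch: the `config` dict is a heavily trimmed, hypothetical stand-in for the real ValidatingWebhookConfiguration (the real object has more webhooks and fields), and `failure_policy_patch` is a helper name invented here.

```python
import json


def failure_policy_patch(webhook_config: dict, policy: str = "Fail") -> list:
    """Build JSON Patch operations that set failurePolicy on every webhook entry."""
    if policy not in ("Fail", "Ignore"):
        raise ValueError("failurePolicy must be 'Fail' or 'Ignore'")
    return [
        {"op": "replace", "path": f"/webhooks/{index}/failurePolicy", "value": policy}
        for index, _ in enumerate(webhook_config.get("webhooks", []))
    ]


# Hypothetical, trimmed stand-in for the installed object (assumption, not the real manifest).
config = {
    "apiVersion": "admissionregistration.k8s.io/v1",
    "kind": "ValidatingWebhookConfiguration",
    "metadata": {"name": "keda-admission"},
    "webhooks": [{"name": "vscaledobject.kb.io", "failurePolicy": "Ignore"}],
}

patch = failure_policy_patch(config)
print(json.dumps(patch))
```

The printed list could be passed to `kubectl patch validatingwebhookconfiguration keda-admission --type=json -p '<patch>'`. If a GitOps tool manages the object, the change would presumably have to go into the source manifests instead, or it gets reverted on the next reconciliation.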
Logs from KEDA operator
The only error I found in the operator logs was:
2023-11-15T13:42:32Z ERROR cert-rotation Webhook not found. Unable to update certificate. {"name": "keda-admission-webhooks", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "error": "ValidatingWebhookConfiguration.admissionregistration.k8s.io \"keda-admission-webhooks\" not found"}
So the name of the webhook seemed not to match.
Changing the name of the validating webhook from keda-admission to keda-admission-webhooks did not help.
Afterwards we saw errors like:
2023-11-15T14:07:30Z ERROR cert-rotation Error updating webhook with certificate {"name": "keda-admission-webhooks", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "error": "Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"keda-admission-webhooks\": the object has been modified; please apply your changes to the latest version and try again"}
KEDA Version
2.12.0
Kubernetes Version
1.26
Platform
Microsoft Azure
Scaler Details
No response
Anything else?
No response
Hello, which manifest are you using to deploy KEDA? There are 2 flavors, with and without webhooks, and maybe there is a failure in one of them.
We've installed keda via the downloaded release yaml which we split up in separate files and apply with fluxcd's kustomize controller.
There are 3 different YAML files inside the release:
- keda-2.12.0-core.yaml
- keda-2.12.0-crds.yaml
- keda-2.12.0.yaml
Which are you using?
- keda-2.12.0-crds.yaml
- keda-2.12.0.yaml
I'm reviewing the configuration and it looks fine in the YAMLs, but I've noticed a weird thing. The logs you sent say that the ValidatingWebhookConfiguration's name is keda-admission-webhooks, but that's not the default value; the default value is keda-admission. keda-admission-webhooks is the name of the service between the ValidatingWebhookConfiguration and the webhook deployment, but that's correctly configured in the YAML.
Are you modifying the naming? The ValidatingWebhookConfiguration name that the KEDA operator will use is provided by the arg validating-webhook-name. Can you check the KEDA operator pod's arguments to see if it's overridden?
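To make that check concrete, here is a small sketch of how the effective name could be read from the operator container's argument list. It assumes the flag is passed in one of the usual forms (`--validating-webhook-name=<value>` or `--validating-webhook-name <value>`) and falls back to the default keda-admission mentioned above; the helper name and the sample argument lists are hypothetical.

```python
DEFAULT_WEBHOOK_NAME = "keda-admission"  # default named in this thread
FLAG = "validating-webhook-name"         # operator arg named in this thread


def webhook_name_from_args(args: list) -> str:
    """Return the ValidatingWebhookConfiguration name implied by the container args."""
    for index, arg in enumerate(args):
        if not arg.startswith("-"):
            continue
        stripped = arg.lstrip("-")
        if stripped.startswith(FLAG + "="):
            # Form: --validating-webhook-name=value
            return stripped.split("=", 1)[1]
        if stripped == FLAG and index + 1 < len(args):
            # Form: --validating-webhook-name value
            return args[index + 1]
    return DEFAULT_WEBHOOK_NAME


# Hypothetical argument lists, only for illustration.
default_args = ["--some-other-flag=true"]
overridden_args = ["--some-other-flag=true", "--validating-webhook-name=keda-admission-webhooks"]

print(webhook_name_from_args(default_args))
print(webhook_name_from_args(overridden_args))
```

If the name this returns differs from the name of the ValidatingWebhookConfiguration that actually exists in the cluster, the cert-rotation "Webhook not found" error from the first report would be the expected symptom.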
As we were looking into the issue we thought the naming might not match at some point and tried to change it but it made no difference. We might have forgotten to set everything back to the default.
Could you try to set everything to default and post the logs?
Yes, thanks for your help :) Will do. Might need some days though.
Sure, just ping me back when you have more info :)
We updated all resources to the 2.12.1 release now, overwriting our changes. After setting "failurePolicy: Fail" in the validating webhook the namespaces with a scaledobject can't be applied any longer.
✗ Kustomization reconciliation failed: ScaledObject/istio-system/istio-ingressgateway dry-run failed, reason: InternalError: Internal error occurred: failed calling webhook "vscaledobject.kb.io": failed to call webhook: Post "https://keda-admission-webhooks.keda.svc:443/validate-keda-sh-v1alpha1-scaledobject?timeout=10s": EOF
We also still see the following errors in the operator log:
2023-11-30T07:39:46Z ERROR cert-rotation Error updating webhook with certificate {"name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "error": "Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"keda-admission\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/open-policy-agent/cert-controller/pkg/rotator.(*ReconcileWH).ensureCerts
/workspace/vendor/github.com/open-policy-agent/cert-controller/pkg/rotator/rotator.go:789
github.com/open-policy-agent/cert-controller/pkg/rotator.(*ReconcileWH).Reconcile
/workspace/vendor/github.com/open-policy-agent/cert-controller/pkg/rotator/rotator.go:739
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227
2023-11-30T07:39:46Z ERROR Reconciler error {"controller": "cert-rotator", "object": {"name":"kedaorg-certs","namespace":"keda"}, "namespace": "keda", "name": "kedaorg-certs", "reconcileID": "8e7e9cfa-5bb0-45ec-8526-2b36e307ecbd", "error": "Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"keda-admission\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227
Is the message transient, or do you see it permanently in your logs? Are you deploying KEDA with any kind of auto-sync? If yes, the auto-sync tool may be conflicting with cert-controller (the internal controller that KEDA uses to manage certificates). If this is your case, I'd suggest disabling the auto-sync to check whether the error disappears and, if it does, using cert-manager to manage the certificates externally instead of using the internal controller.
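For context on the "the object has been modified" message: Kubernetes rejects an update whose resourceVersion is older than the stored one, so two writers touching the same object can knock each other out. The following is a toy, in-memory simulation of that optimistic-concurrency behaviour, not the API server or cert-controller code; `Store`, `Conflict`, and `update_with_retry` are names invented for this sketch.

```python
class Conflict(Exception):
    """Stands in for the API server's conflict response."""


class Store:
    """Toy object store with a version counter, loosely mimicking resourceVersion."""

    def __init__(self, obj: dict):
        self._obj = dict(obj)
        self._version = 1

    def get(self):
        return dict(self._obj), self._version

    def update(self, obj: dict, version: int) -> int:
        if version != self._version:
            raise Conflict("the object has been modified")
        self._obj = dict(obj)
        self._version += 1
        return self._version


def update_with_retry(store: Store, mutate, attempts: int = 3) -> int:
    """Re-read the latest version and re-apply the change when a conflict occurs."""
    for _ in range(attempts):
        obj, version = store.get()
        mutate(obj)
        try:
            return store.update(obj, version)
        except Conflict:
            continue
    raise Conflict("still conflicting after retries")


store = Store({"caBundle": ""})

# Writer A (think: certificate rotator) reads the object.
rotator_obj, rotator_version = store.get()

# Writer B (think: GitOps sync) updates the object in the meantime.
sync_obj, sync_version = store.get()
store.update(sync_obj, sync_version)

# Writer A now writes with a stale version and is rejected.
rotator_obj["caBundle"] = "new-ca"
try:
    store.update(rotator_obj, rotator_version)
    stale_write_failed = False
except Conflict:
    stale_write_failed = True

# Re-reading and retrying succeeds.
final_version = update_with_retry(store, lambda o: o.update(caBundle="new-ca"))
```

A single conflict like this is normally harmless because the controller retries on the next reconcile. Seeing it on a fixed schedule may suggest that another writer updates the same ValidatingWebhookConfiguration on that same schedule.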
Yes, the error messages are popping up regularly (around every 5 min).
What do you mean by auto-sync? The resources created/applied by fluxcd are reconciled, but certs are not part of it.
I'll try to use cert-manager.
Using cert-manager to create and inject the cert makes no difference :(
In my experience with ArgoCD, that error is because flux is reconciling the configuration all the time, locking the resource.
"Using cert-manager to create and inject the cert makes no difference :("
What do you mean? That shouldn't be possible, because if you use cert-manager you have to disable this mechanism in the operator (the Helm chart does it automatically). https://keda.sh/docs/2.12/operate/security/#use-your-own-tls-certificates
After discussing this in the fluxcd slack channel we decided to go without the admission controller: https://cloud-native.slack.com/archives/CLAJ40HV3/p1701698388439549
I've posted on the channel too, let's see if there is something that we can do in the future to prevent this 🤞
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity.