Setting failurePolicy to Fail in the admission webhook does not work

Open monotek opened this issue 1 year ago • 16 comments

Report

We've installed keda via the downloaded release yaml.

We want to use the admission webhook with "failurePolicy: Fail". As soon as we change that, we see the following error when fluxcd tries to apply a namespace containing a ScaledObject.

{"level":"error","ts":"2023-11-15T13:11:16.956Z","msg":"Reconciliation failed after 2.426684853s, next try in 5m0s","controller":"kustomization","controllerGroup":"kustomize.toolkit.fluxcd.io","controllerKind":"Kustomization","Kustomization":{"name":"gatekeeper","namespace":"flux-system"},"namespace":"flux-system","name":"gatekeeper","reconcileID":"9d4d6fe1-da59-40e9-af8c-798651024822","revision":"dev@sha1:b7ced8d1a6d67107cedfb8bdd665d81de2877ba0","error":"ScaledObject/gatekeeper/httpcache dry-run failed, reason: InternalError: Internal error occurred: failed calling webhook \"vscaledobject.kb.io\": failed to call webhook: Post \"https://keda-admission-webhooks.keda.svc:443/validate-keda-sh-v1alpha1-scaledobject?timeout=10s\": EOF\n"}

The admission webhook deployment is running and the svc is reachable via port-forward. The certificates in use have the right names.

How can we further debug this?

I'm not sure if "internal error" is an error of the admission controller or if the admission controller can't be reached. The admission controller itself does not log any error.

For us it looks like the admission webhook does not work at all but the error is ignored with the default config?

Expected Behavior

The admission webhook works with: failurePolicy: Fail

Actual Behavior

The admission webhook can't be used because of an "internal error".

Steps to Reproduce the Problem

  1. Edit the admission webhook and set failurePolicy: Fail
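
For reference, the change in step 1 would look roughly like the fragment below. This is a sketch, not a copy of the release manifest: the webhook name, service, namespace, path and timeout are taken from the error message above, while the object name keda-admission, the rules block and the default value Ignore are assumptions about the 2.12 manifest.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: keda-admission            # assumed default name of the object
webhooks:
  - name: vscaledobject.kb.io     # from the error message
    failurePolicy: Fail           # the only intended change (assumed default: Ignore)
    timeoutSeconds: 10            # matches ?timeout=10s in the error message
    admissionReviewVersions: ["v1"]
    sideEffects: None
    clientConfig:
      # caBundle is omitted here; it is filled in by whatever manages the certs
      service:
        name: keda-admission-webhooks
        namespace: keda
        path: /validate-keda-sh-v1alpha1-scaledobject
        port: 443
    rules:                        # assumed; check against the release YAML
      - apiGroups: ["keda.sh"]
        apiVersions: ["v1alpha1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["scaledobjects"]
```

With failurePolicy: Fail, any failure to call the webhook (such as the EOF above) rejects the request instead of being ignored, which is why the problem only becomes visible after this change.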

Logs from KEDA operator

The only error I found in the operator logs was:

2023-11-15T13:42:32Z	ERROR	cert-rotation	Webhook not found. Unable to update certificate.	{"name": "keda-admission-webhooks", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "error": "ValidatingWebhookConfiguration.admissionregistration.k8s.io \"keda-admission-webhooks\" not found"}

So the name of the webhook seemed not to match.

Changing the name of the validating webhook from keda-admission to keda-admission-webhooks did not help.

Afterwards we saw errors like:

2023-11-15T14:07:30Z	ERROR	cert-rotation	Error updating webhook with certificate	{"name": "keda-admission-webhooks", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "error": "Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"keda-admission-webhooks\": the object has been modified; please apply your changes to the latest version and try again"}

KEDA Version

2.12.0

Kubernetes Version

1.26

Platform

Microsoft Azure

Scaler Details

No response

Anything else?

No response

monotek avatar Nov 15 '23 14:11 monotek

Hello. Which manifest are you using to deploy KEDA? There are 2 flavors, with and without webhooks, and maybe there is a problem in one of them.

JorTurFer avatar Nov 26 '23 16:11 JorTurFer

We've installed keda via the downloaded release yaml which we split up in separate files and apply with fluxcd's kustomize controller.

monotek avatar Nov 27 '23 06:11 monotek

There are 3 different YAML files in the release:

  • keda-2.12.0-core.yaml
  • keda-2.12.0-crds.yaml
  • keda-2.12.0.yaml

Which are you using?

JorTurFer avatar Nov 27 '23 11:11 JorTurFer

  • keda-2.12.0-crds.yaml
  • keda-2.12.0.yaml

monotek avatar Nov 27 '23 13:11 monotek

I'm reviewing the configuration and it looks fine in the YAML files, but I've noticed something odd. The logs you sent say that the ValidatingWebhookConfiguration's name is keda-admission-webhooks, but that's not the default value; the default is keda-admission. keda-admission-webhooks is the name of the service between the ValidatingWebhookConfiguration and the webhook deployment, and that's correctly configured in the YAML. Are you modifying the naming? The ValidatingWebhookConfiguration name that the KEDA operator uses is provided by the validating-webhook-name arg. Can you check the KEDA operator pod's arguments to see whether it's overridden?
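
A minimal sketch of where that argument would sit in the operator Deployment, assuming the Deployment and container are both named keda-operator in the keda namespace (those names are assumptions, and the other arguments from the release manifest are left out). It is a fragment for comparison, not something to apply as-is.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keda-operator             # assumed Deployment name
  namespace: keda
spec:
  template:
    spec:
      containers:
        - name: keda-operator     # assumed container name
          args:
            # other arguments from the release manifest omitted
            # must match metadata.name of the ValidatingWebhookConfiguration
            - --validating-webhook-name=keda-admission
```

If this value and the actual name of the ValidatingWebhookConfiguration differ, the "Webhook not found. Unable to update certificate." error from the first log would be the expected result.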

JorTurFer avatar Nov 27 '23 21:11 JorTurFer

As we were looking into the issue we thought the naming might not match at some point and tried to change it but it made no difference. We might have forgotten to set everything back to the default.

monotek avatar Nov 28 '23 06:11 monotek

Could you try to set everything to default and post the logs?

JorTurFer avatar Nov 28 '23 07:11 JorTurFer

Yes, thanks for your help :) Will do. Might need some days though.

monotek avatar Nov 28 '23 12:11 monotek

Sure, just ping me back when you have more info :)

JorTurFer avatar Nov 28 '23 12:11 JorTurFer

We updated all resources to the 2.12.1 release now, overwriting our changes. After setting "failurePolicy: Fail" in the validating webhook the namespaces with a scaledobject can't be applied any longer.

✗ Kustomization reconciliation failed: ScaledObject/istio-system/istio-ingressgateway dry-run failed, reason: InternalError: Internal error occurred: failed calling webhook "vscaledobject.kb.io": failed to call webhook: Post "https://keda-admission-webhooks.keda.svc:443/validate-keda-sh-v1alpha1-scaledobject?timeout=10s": EOF

We also still see the following errors in the operator log:

2023-11-30T07:39:46Z	ERROR	cert-rotation	Error updating webhook with certificate	{"name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "error": "Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"keda-admission\": the object has been modified; please apply your changes to the latest version and try again"}
github.com/open-policy-agent/cert-controller/pkg/rotator.(*ReconcileWH).ensureCerts
	/workspace/vendor/github.com/open-policy-agent/cert-controller/pkg/rotator/rotator.go:789
github.com/open-policy-agent/cert-controller/pkg/rotator.(*ReconcileWH).Reconcile
	/workspace/vendor/github.com/open-policy-agent/cert-controller/pkg/rotator/rotator.go:739
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227
2023-11-30T07:39:46Z	ERROR	Reconciler error	{"controller": "cert-rotator", "object": {"name":"kedaorg-certs","namespace":"keda"}, "namespace": "keda", "name": "kedaorg-certs", "reconcileID": "8e7e9cfa-5bb0-45ec-8526-2b36e307ecbd", "error": "Operation cannot be fulfilled on validatingwebhookconfigurations.admissionregistration.k8s.io \"keda-admission\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227

monotek avatar Nov 30 '23 07:11 monotek

Is the message transient, or do you see it permanently in your logs? Are you deploying KEDA with any kind of auto-sync? If so, the auto-sync tool may be conflicting with cert-controller (the internal controller that KEDA uses to manage the certificates). If this is your case, I'd suggest disabling the auto-sync to check whether the error disappears and, if it does, using cert-manager to manage the certificates externally instead of using the internal controller.
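
If Flux's kustomize-controller is the auto-sync tool in question, one way to test this without suspending the whole Kustomization might be to exclude only the webhook configuration from reconciliation. The fragment below is a sketch under that assumption; it uses the kustomize.toolkit.fluxcd.io/reconcile annotation and assumes the object still has its default name keda-admission.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: keda-admission
  annotations:
    # tells Flux's kustomize-controller to skip this object
    kustomize.toolkit.fluxcd.io/reconcile: disabled
```

While the annotation is present, Flux should also stop applying changes from Git to this object, so failurePolicy would have to be set before adding it. If the "object has been modified" errors stop, that would point to Flux and cert-controller both writing to the same object.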

JorTurFer avatar Nov 30 '23 08:11 JorTurFer

Yes, the error messages are popping up regularly (around every 5 min).

What do you mean by auto-sync? The resources created/applied by fluxcd are reconciled, but certs are not part of it.

I'll try to use cert-manager.

monotek avatar Nov 30 '23 08:11 monotek

Using cert-manager to create and inject the cert makes no difference :(

monotek avatar Nov 30 '23 10:11 monotek

In my experience with ArgoCD, that error is because flux is reconciling the configuration all the time, locking the resource.

Using cert-manager to create and inject the cert makes no difference :(

What do you mean? That shouldn't be possible, because if you use cert-manager, you have to disable this mechanism in the operator (the Helm chart does it automatically). https://keda.sh/docs/2.12/operate/security/#use-your-own-tls-certificates
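
For a deployment from the raw release YAML rather than the Helm chart, that would presumably mean two manual changes, sketched below. The --enable-cert-rotation argument is taken from the linked security docs as far as I recall them, so verify it against your version; the Deployment and container names and the Certificate name keda-tls are hypothetical placeholders.

```yaml
# Fragment 1: turn off the operator's internal certificate rotation
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keda-operator             # assumed Deployment name
  namespace: keda
spec:
  template:
    spec:
      containers:
        - name: keda-operator     # assumed container name
          args:
            # other arguments from the release manifest omitted
            - --enable-cert-rotation=false
---
# Fragment 2: let cert-manager's CA injector fill in caBundle
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: keda-admission
  annotations:
    # format: <namespace>/<Certificate name>; "keda-tls" is hypothetical
    cert-manager.io/inject-ca-from: keda/keda-tls
```

If the cert-rotation errors still appear in the operator log after this, the internal rotation is most likely still enabled.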

JorTurFer avatar Dec 02 '23 00:12 JorTurFer

After discussing this in the fluxcd slack channel we decided to go without the admission controller: https://cloud-native.slack.com/archives/CLAJ40HV3/p1701698388439549

monotek avatar Dec 07 '23 15:12 monotek

I've posted on the channel too, let's see if there is something that we can do in the future to prevent this 🤞

JorTurFer avatar Dec 07 '23 16:12 JorTurFer

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Feb 06 '24 05:02 stale[bot]

This issue has been automatically closed due to inactivity.

stale[bot] avatar Feb 13 '24 09:02 stale[bot]