flux2
Reconciliation doesn't progress if it encounters errors with the vscaledobject.kb.io admission webhook
Describe the bug
When deploying applications that have a KEDA ScaledObject, if a configuration problem causes the vscaledobject.kb.io admission webhook to reject the request, Flux fails to reconcile the Kustomization even after the problem is fixed.
Steps to reproduce
Create a simple Kustomization that consists of a Deployment and a ScaledObject. The Deployment manifest intentionally has the resources section commented out to induce the initial error.
kustomization.yaml:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: default
resources:
  - test.yaml
test.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
        - name: test
          image: busybox
          command: ["sleep", "infinity"]
          # resources:
          #   requests:
          #     cpu: 50m
          #     memory: 50Mi
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: test
spec:
  maxReplicaCount: 2
  minReplicaCount: 1
  scaleTargetRef:
    name: test
  triggers:
    - metadata:
        value: "50"
      type: cpu
      metricType: Utilization
Flux Kustomization:
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: test
  namespace: flux-system
spec:
  interval: 10m0s
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./kustomize/apps/test
  prune: true
When this Kustomization is applied, the reconciliation will fail with an error similar to this one:
ScaledObject/default/test dry-run failed, reason: Forbidden: admission webhook "vscaledobject.kb.io" denied the request: the scaledobject has a cpu trigger but the container test doesn't have the cpu request defined
After the configuration issue is addressed (i.e. the resources section is uncommented above) and the change is committed to git, Flux continues to report the error, even after manually reconciling both the git source and the Kustomization itself. Applying the configuration using kubectl apply -k . succeeds, and a subsequent flux reconcile kustomization test then works.
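For reference, the fix described in the steps above (uncommenting the resources section) would leave the container spec in test.yaml looking roughly like the fragment below. The values are taken from the commented-out lines in the reproduction; this is a sketch of the corrected fragment only, not the full manifest.

```yaml
# Fragment of the Deployment's pod template after the fix.
# The cpu request is what the vscaledobject.kb.io webhook checks for
# when the ScaledObject uses a cpu trigger.
containers:
  - name: test
    image: busybox
    command: ["sleep", "infinity"]
    resources:
      requests:
        cpu: 50m
        memory: 50Mi
```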
Expected behavior
Flux automatically picks up the newest changes that contain configuration fixes and applies them to the Kustomization.
Screenshots and recordings
No response
OS / Distro
N/A
Flux version
v2.1.1
Flux check
► checking prerequisites
✔ Kubernetes 1.27.4-eks-2d98532 >=1.25.0-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.36.1
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v0.36.1
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v0.30.0
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v1.1.0
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v1.1.0
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v1.1.1
► checking crds
✔ alerts.notification.toolkit.fluxcd.io/v1beta2
✔ buckets.source.toolkit.fluxcd.io/v1beta2
✔ gitrepositories.source.toolkit.fluxcd.io/v1
✔ helmcharts.source.toolkit.fluxcd.io/v1beta2
✔ helmreleases.helm.toolkit.fluxcd.io/v2beta1
✔ helmrepositories.source.toolkit.fluxcd.io/v1beta2
✔ imagepolicies.image.toolkit.fluxcd.io/v1beta2
✔ imagerepositories.image.toolkit.fluxcd.io/v1beta2
✔ imageupdateautomations.image.toolkit.fluxcd.io/v1beta1
✔ kustomizations.kustomize.toolkit.fluxcd.io/v1
✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2
✔ providers.notification.toolkit.fluxcd.io/v1beta2
✔ receivers.notification.toolkit.fluxcd.io/v1
✔ all checks passed
Git provider
GitHub
Container Registry provider
No response
Additional context
No response
I am having this issue as well. In my case the webhook can fail transiently when, depending on a race, the Deployments that the ScaledObjects reference don't exist yet. Even after the Deployments exist, the HelmRelease will not reconcile until I've suspended and resumed it.
Same here.
Same here.
Might be a KEDA error though; I'm on ArgoCD.
I've been able to get around this by manually applying the Deployment to the cluster, which then allows Flux to create the ScaledObject and continue reconciling normally. It's an annoying workaround, but at least you only have to do it once when setting up a new service.
Same here with ArgoCD 2.9.5 and Custom Metric Autoscaler (KEDA) 2.11.2 on OpenShift 4.12.30: even after fixing the Deployment that was missing CPU requests, the ScaledObject is still not created, and the keda-admission Pod logs:
2024-02-16T16:05:43Z ERROR scaledobject-validation-webhook validation error {"error": "the scaledobject has a cpu trigger but the container XXXXX doesn't have the cpu request defined"}
We discussed this issue with the KEDA devs in the Flux Slack some months ago, and they are aware of it. If it's not fixed, please open an issue in the KEDA repo; there is nothing we can do about it in Flux.