kustomize-controller
Cannot delete kustomization when it is running health checks
I have been trying to delete a kustomization that had a faulty pod created underneath it (the kustomization is in an unknown state), with no luck. Even removing finalizers from the kustomization doesn't help.
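(To be clear, by removing finalizers I mean patching them out, roughly like the following; the exact command is illustrative, not the literal one I ran.)

```sh
# Clear the finalizers on the stuck Kustomization object
kubectl -n sandbox patch kustomization migration-aaesbzw7le \
  --type=merge -p '{"metadata":{"finalizers":null}}'
```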
Here is the kustomization definition:
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: ***
    reconcile.fluxcd.io/requestedAt: "2021-11-11T12:06:49.593954-05:00"
  creationTimestamp: "2021-11-11T16:46:26Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2021-11-11T16:46:59Z"
  generation: 5
  labels:
    app: ****
    app.kubernetes.io/managed-by: pulumi
    workloadtype: task
  name: migration-aaesbzw7le
  namespace: sandbox
  resourceVersion: "59353062"
  uid: 4f6f966f-8e14-46e8-8935-f92f2db7916b
spec:
  force: true
  images:
  - name: *****
    newTag: *******
  interval: 30m0s
  patches:
  - patch: *****
    target:
      kind: ConfigMap
      name: app-config
      version: v1
  - patch: *****
    target:
      kind: Pod
      labelSelector: *****
      name: .*
    path: ****
  postBuild:
    substitute:
      ****
  prune: true
  sourceRef:
    kind: GitRepository
    name: *****
  suspend: true
status:
  conditions:
  - lastTransitionTime: "2021-11-11T16:46:26Z"
    message: running health checks with a timeout of 29m30s
    reason: Progressing
    status: Unknown
    type: Ready
  - lastTransitionTime: "2021-11-11T16:46:26Z"
    message: running health checks with a timeout of 29m30s
    reason: Progressing
    status: Unknown
    type: Healthy
  observedGeneration: -1
```
Update: It does get deleted after the 30-minute timeout.
You have to set `.spec.timeout` when using `.spec.wait` or `.spec.healthChecks`, otherwise the timeout defaults to the interval, which here is set to half an hour. Please see the docs: https://fluxcd.io/docs/components/kustomize/kustomization/#health-assessment
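For illustration, here is a minimal sketch of a spec with an explicit timeout; the names and values are placeholders, not taken from the manifest above:

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: migration
  namespace: sandbox
spec:
  interval: 30m0s
  # Explicit timeout: health checks give up after 2m instead of
  # blocking the reconciler for the full 30m interval.
  timeout: 2m0s
  wait: true
  prune: true
  sourceRef:
    kind: GitRepository
    name: app-repo
  path: ./migrations
```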
@stefanprodan Yeah, but shouldn't the Kustomization be deletable before the timeout? Or is this the intended behaviour?
This is how the Kubernetes controller-runtime queue works: once a request is being processed, any changes are queued until the current operation finishes. Setting high timeouts means that a request will block. Would you set a 30-minute timeout on a readiness or liveness probe?
@stefanprodan Makes sense. We have been using this mechanism to track some long-running pods (like a migration) and update another kustomization once it is done, but when it fails you have to wait all this time before you can delete it. Is there a smarter way to do this?
The only workaround is to suspend the Kustomization, then restart the controller by killing its pod. When the controller starts, it will no longer try to run the health check, since it receives the new object that's suspended.
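In practice that workaround boils down to something like the following, assuming a default flux-system install (the controller pod label may differ in your cluster):

```sh
# Suspend the stuck Kustomization so the controller won't retry it
flux suspend kustomization migration-aaesbzw7le -n sandbox

# Restart the kustomize-controller so it drops the in-flight health check
kubectl -n flux-system delete pod -l app=kustomize-controller
```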
Based on @stefanprodan's advice, we ended up lowering the values for both timeout and interval, which gives the kustomization a chance to become unblocked after each timeout and makes it deletable.
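Roughly, the relevant part of the spec now looks like this (the values here are placeholders rather than our exact settings):

```yaml
spec:
  # Short interval and timeout: a failed health check unblocks the
  # controller quickly, so the Kustomization can be deleted without
  # waiting half an hour.
  interval: 5m0s
  timeout: 2m0s
```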