
Cannot delete kustomization when it is running health checks

Open · miadabrin opened this issue on Nov 11, 2021 · 6 comments

I have been trying to delete a Kustomization that had a faulty pod created underneath it (the Kustomization is in an unknown state), with no luck. Even removing finalizers from the Kustomization doesn't help.

Here is the Kustomization definition:

apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: ***
    reconcile.fluxcd.io/requestedAt: "2021-11-11T12:06:49.593954-05:00"
  creationTimestamp: "2021-11-11T16:46:26Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2021-11-11T16:46:59Z"
  generation: 5
  labels:
    app: ****
    app.kubernetes.io/managed-by: pulumi
    workloadtype: task
  name: migration-aaesbzw7le
  namespace: sandbox
  resourceVersion: "59353062"
  uid: 4f6f966f-8e14-46e8-8935-f92f2db7916b
spec:
  force: true
  images:
  - name: *****
    newTag: *******
  interval: 30m0s
  patches:
  - patch: *****
    target:
      kind: ConfigMap
      name: app-config
      version: v1
  - patch: *****
    target:
      kind: Pod
      labelSelector: *****
      name: .*
  path: ****
  postBuild:
    substitute:
    ****
  prune: true
  sourceRef:
    kind: GitRepository
    name: *****
  suspend: true
status:
  conditions:
  - lastTransitionTime: "2021-11-11T16:46:26Z"
    message: running health checks with a timeout of 29m30s
    reason: Progressing
    status: Unknown
    type: Ready
  - lastTransitionTime: "2021-11-11T16:46:26Z"
    message: running health checks with a timeout of 29m30s
    reason: Progressing
    status: Unknown
    type: Healthy
  observedGeneration: -1


Update: it does get deleted after the 30-minute timeout.

miadabrin · Nov 11 '21 17:11

You have to set spec.timeout when using spec.wait or spec.healthChecks, otherwise the timeout defaults to the interval, which here is set to half an hour. Please see the docs: https://fluxcd.io/docs/components/kustomize/kustomization/#health-assessment
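
For illustration, a minimal sketch of a Kustomization with an explicit timeout; the name, namespace, path, and source ref are placeholders, and 2m0s is an arbitrary choice:

apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: migration
  namespace: sandbox
spec:
  interval: 30m0s
  # an explicit timeout keeps a failing health check from blocking
  # reconciliation for the full 30-minute interval
  timeout: 2m0s
  wait: true
  prune: true
  path: ./deploy
  sourceRef:
    kind: GitRepository
    name: app-repo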

stefanprodan · Nov 11 '21 17:11

@stefanprodan Yeah, but shouldn't the Kustomization be deletable before the timeout? Or is this the intended behaviour?

miadabrin · Nov 11 '21 17:11

This is how the Kubernetes controller-runtime work queue behaves: once a request is being processed, any further changes are queued until the current operation finishes. Setting high timeouts means that requests will block; would you set a 30-minute timeout on a readiness or liveness probe?

stefanprodan · Nov 11 '21 17:11

@stefanprodan Makes sense. We have been using this mechanism to track some long-running pods (like a migration) and update another Kustomization once it is done, but when it fails you have to wait all this time to delete it. Is there a smarter way to do this?

miadabrin · Nov 11 '21 17:11

The only workaround is to suspend the Kustomization, then restart the controller by killing its pod. When the controller starts, it will no longer try to run the health check, since it receives the new object in its suspended state.
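
Assuming the controller runs in the default flux-system namespace, that workaround would look something like this (the Kustomization name and namespace are taken from the manifest above):

# mark the Kustomization as suspended
flux suspend kustomization migration-aaesbzw7le -n sandbox
# restart the controller so it picks up the suspended object
kubectl -n flux-system rollout restart deployment kustomize-controller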

stefanprodan · Nov 11 '21 17:11

Based on @stefanprodan's advice, we ended up lowering both the timeout and the interval, which gives the Kustomization a chance to unblock after each timeout and makes it deletable.
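
For example, something along these lines (the exact values are illustrative, not what we settled on):

spec:
  # shorter interval and timeout so a failed health check
  # releases the reconciler quickly
  interval: 5m0s
  timeout: 2m0s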

miadabrin · Nov 12 '21 22:11