kstatus unknown appears to pass a healthCheck

Open jack1902 opened this issue 10 months ago • 0 comments

Describe the bug

When using the healthCheck feature on a Flux Kustomization, I was hoping to see dependsOn honoured, so that the initial Kustomization wasn't marked as "Applied" straight away.

This is not what I am seeing, and I believe it is because kstatus reports an unknown status and Flux does not treat that as a failure/progressing state. I believe that if someone configures a healthCheck, an unknown status reported by kstatus should mean "keep attempting until we time out". I struggle to see any case where I add a healthCheck to Flux for a CR and an unknown status from kstatus is a good thing; and if unknown really is being reported because of a bad CR, removing the healthCheck is always an option.

If I am mistaken on this, I'm happy to debug where applicable, as having observedGeneration: -1 on any and all CRDs in use seems excessive and is something I don't see on many CRs in the wild. For example, the Zalando Postgres CRD doesn't contain observedGeneration: -1, so kstatus wouldn't function on a first-time deployment of it.

Steps to reproduce

  1. Provision a cluster which makes use of CRDs
  2. Make use of those CRDs, with Flux mapping a healthCheck onto the CR in question
  3. Observe the Flux Kustomization being marked as applied within a few seconds (the backing controller for the CR won't have had enough time to update the CR with a status field yet, which forces kstatus to report an unknown state)

Expected behavior

Flux to wait for the healthCheck to either:

  • timeout - because no pass/fail condition occurred
  • pass - because kstatus reported a Complete state
  • fail - because kstatus reported a Failed state

Screenshots and recordings

No response

OS / Distro

N/A

Flux version

N/A

Flux check

N/A

Git provider

No response

Container Registry provider

No response

Additional context

I initially raised a bug against Strimzi, as I was observing my Kafka Kustomization being marked as applied even though I had a healthCheck configured for the first install of Kafka into my cluster (no status field is present when first installing the resource).

Relates to: https://github.com/strimzi/strimzi-kafka-operator/issues/9278 (Strimzi operator, but I don't see that as being at fault)
Relates to: https://github.com/kubernetes-sigs/cli-utils/issues/632 (question around what happens when no status field exists on the CR; I believe unknown, but wasn't 100% sure of that)

Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

jack1902 • Oct 20 '23 16:10