NodeHealthCheck status is not updated when remediation CR is deleted by remediator

Open aibarbetta opened this issue 7 months ago • 4 comments

Hi all, I'm using NHC with a custom remediator. In some cases my Kubernetes nodes are deleted and, as the documentation says here, my remediator then deletes the remediation Custom Resource. The issue is that the NHC resource's status still shows these old remediations in its phase, reason, and inFlightRemediations fields:

    inFlightRemediations:
      yul1-r11-u14: "2023-11-07T21:53:04Z"
      yul1-r11-u15: "2023-11-07T02:49:42Z"
    observedNodes: 131
    phase: Remediating
    reason: NHC is remediating 2 nodes

This blocks all updates to, and deletion of, the NHC resource, because the validating webhook thinks a remediation is still in progress and responds with:

admission webhook "vnodehealthcheck.kb.io" denied the request: selector update prohibited due to running remediation

Am I missing a configuration option to signal these deletions to NHC?
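
To make "old remediations" concrete, here is a minimal sketch of the check I have in mind: it compares the `inFlightRemediations` map from the status above against the set of nodes that still exist. The helper name `stale_remediations` and the `existing_nodes` set are hypothetical and only for illustration; this is not NHC code, and in practice the node list would come from the cluster rather than being hard-coded.

```python
def stale_remediations(status, existing_nodes):
    """Return inFlightRemediations entries whose node no longer exists.

    `status` is the NHC .status as a plain dict; `existing_nodes` is a set
    of node names currently present (hypothetical input for this sketch).
    """
    in_flight = status.get("inFlightRemediations") or {}
    return {name: ts for name, ts in in_flight.items()
            if name not in existing_nodes}


# Status copied from the NHC resource shown above.
status = {
    "inFlightRemediations": {
        "yul1-r11-u14": "2023-11-07T21:53:04Z",
        "yul1-r11-u15": "2023-11-07T02:49:42Z",
    },
    "observedNodes": 131,
    "phase": "Remediating",
    "reason": "NHC is remediating 2 nodes",
}

# Assumption: both remediated nodes were deleted; only this
# made-up node name is still present.
existing_nodes = {"example-node-1"}

stale = stale_remediations(status, existing_nodes)
for name, started in sorted(stale.items()):
    print(f"stale remediation: {name} (started {started})")
```

Both entries show up as stale here, which matches what I see: the nodes and their remediation CRs are gone, but the status still lists them.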

aibarbetta, Nov 08 '23 19:11