node-maintenance-operator

Do not reconcile NodeMaintenance if deletion requested

Open vaspahomov opened this issue 2 months ago • 7 comments

Why we need this PR

This fixes a bug: when a NodeMaintenance is deleted for an existing Node, node-maintenance-operator sometimes leaves the taints on the node. This happens because the resource can be reconciled multiple times during deletion, with consecutive "reconcile normal" and "reconcile delete" passes.

Changes made

Which issue(s) this PR fixes

Test plan

vaspahomov avatar Sep 26 '25 09:09 vaspahomov

Walkthrough

Updates reconciliation logic in nodemaintenance controller: adjusts finalizer handling based on DeletionTimestamp, adds finalizer when absent on non-deleting objects, and skips reconciliation with a log when deletion is in progress and finalizer is missing. Introduces an early return during deletion. Adds an informational log message.

Changes

Cohort / File(s): Controller reconciliation flow (controllers/nodemaintenance_controller.go)
Summary: Reworked finalizer check: add finalizer when not deleting; if deleting and finalizer missing, log and return early. Added informational log for deletion-in-progress without finalizer.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant K8s as Kubernetes API
  participant Ctrl as NodeMaintenance Controller

  K8s->>Ctrl: Reconcile(NodeMaintenance)
  alt Finalizer missing AND not deleting
    Ctrl->>K8s: Update: add finalizer
    Note right of Ctrl: Continue normal reconciliation
  else Finalizer missing AND deleting
    Ctrl-->>K8s: Log "deletion in progress, finalizer missing"
    Ctrl-->>K8s: Return (skip further reconciliation)
  else Finalizer present
    Ctrl->>Ctrl: Proceed with standard reconcile path
  end
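
The three branches in the diagram can be written out as a small decision function. This is only an illustrative sketch, not the controller's actual code: the `action` type, its constants, and `decide` are hypothetical names, and the two booleans stand in for "finalizer present" and "DeletionTimestamp set".

```go
package main

import "fmt"

// action names the path a reconcile pass takes.
// The type and its values are hypothetical, used only for illustration.
type action string

const (
	addFinalizerThenReconcile action = "add finalizer, then continue normal reconciliation"
	skipReconcile             action = "log and return: deletion in progress, finalizer missing"
	standardReconcile         action = "proceed with standard reconcile path"
)

// decide mirrors the three branches of the sequence diagram.
func decide(hasFinalizer, deleting bool) action {
	switch {
	case !hasFinalizer && !deleting:
		return addFinalizerThenReconcile
	case !hasFinalizer && deleting:
		return skipReconcile
	default: // finalizer present
		return standardReconcile
	}
}

func main() {
	cases := []struct{ hasFinalizer, deleting bool }{
		{false, false},
		{false, true},
		{true, false},
		{true, true},
	}
	for _, c := range cases {
		fmt.Printf("finalizer=%v deleting=%v -> %s\n",
			c.hasFinalizer, c.deleting, decide(c.hasFinalizer, c.deleting))
	}
}
```

The case that matters for the bug is "finalizer missing AND deleting": without the early return, such an object would be treated like a new one and fall into the normal path.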

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

Thump-thump goes my reviewer’s heart,
Finalizers set—now we’re smart.
If deletion knocks, we pause the dance,
Log a note, skip the chance.
Hop, hop—clean flow, tidy scene,
Carrots for code that stays serene. 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The title succinctly captures the core change by stating that NodeMaintenance resources will no longer be reconciled once deletion is requested, which directly reflects the added early return logic and finalizer handling in the controller code.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

coderabbitai[bot] avatar Sep 26 '25 09:09 coderabbitai[bot]

Hi @vaspahomov. Thanks for your PR.

I'm waiting for a medik8s member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

openshift-ci[bot] avatar Sep 26 '25 09:09 openshift-ci[bot]

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: vaspahomov
Once this PR has been reviewed and has the lgtm label, please assign razo7 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

openshift-ci[bot] avatar Sep 26 '25 09:09 openshift-ci[bot]

/ok-to-test

clobrano avatar Oct 06 '25 07:10 clobrano

@vaspahomov sorry for the late reply and thanks for the contribution!

clobrano avatar Oct 06 '25 07:10 clobrano

Thanks for fixing this bug :)

I think your change already addresses the issue where NodeMaintenance objects that are being deleted and no longer have the finalizer would fall through to the normal maintenance logic.

I believe the problem originated from the if/else branch being a bit hard to read, so I would suggest something that might make this logic more robust and easier to maintain: consider using DeletionTimestamp as the primary decision driver.

Currently we're checking finalizer presence first, but the deletion state is really what should determine the reconcile path.

A pattern like this could eliminate the complex compound conditions:

if !nm.ObjectMeta.DeletionTimestamp.IsZero() {
    // Handle all deletion cases here (with or without finalizer)
    INFO("NMO is being deleted, skipping reconcile")
    if finalizer {
        ...
    }
    return emptyResult, nil
}

if !finalizer {
    add finalizer
}
// Normal reconcile continues here

The current fix works correctly, but restructuring around deletion state might prevent similar issues in the future.

What do you think about this approach?

clobrano avatar Oct 06 '25 08:10 clobrano

@vaspahomov: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/4.17-openshift-e2e fac65521490a5cd37805543f2f4bea10d4fe67b8 link true /test 4.17-openshift-e2e
ci/prow/4.18-openshift-e2e fac65521490a5cd37805543f2f4bea10d4fe67b8 link true /test 4.18-openshift-e2e
ci/prow/4.16-openshift-e2e fac65521490a5cd37805543f2f4bea10d4fe67b8 link true /test 4.16-openshift-e2e
ci/prow/4.19-openshift-e2e fac65521490a5cd37805543f2f4bea10d4fe67b8 link true /test 4.19-openshift-e2e
ci/prow/4.20-openshift-e2e fac65521490a5cd37805543f2f4bea10d4fe67b8 link true /test 4.20-openshift-e2e

Full PR test history. Your PR dashboard.


openshift-ci[bot] avatar Oct 06 '25 08:10 openshift-ci[bot]