node-maintenance-operator
Do not reconcile NodeMaintenance if deletion requested
Why we need this PR
It fixes a bug: when a NodeMaintenance is deleted while its Node still exists, node-maintenance-operator sometimes leaves the taints on the Node. This happens because the resource can be reconciled multiple times during deletion, with consecutive "reconcile normal" and "reconcile delete" passes.
Changes made
Which issue(s) this PR fixes
Test plan
Walkthrough
Updates reconciliation logic in nodemaintenance controller: adjusts finalizer handling based on DeletionTimestamp, adds finalizer when absent on non-deleting objects, and skips reconciliation with a log when deletion is in progress and finalizer is missing. Introduces an early return during deletion. Adds an informational log message.
Changes
| Cohort / File(s) | Summary |
|---|---|
| Controller reconciliation flow: `controllers/nodemaintenance_controller.go` | Reworked finalizer check: add finalizer when not deleting; if deleting and finalizer missing, log and return early. Added informational log for deletion-in-progress without finalizer. |
Sequence Diagram(s)
sequenceDiagram
autonumber
participant K8s as Kubernetes API
participant Ctrl as NodeMaintenance Controller
K8s->>Ctrl: Reconcile(NodeMaintenance)
alt Finalizer missing AND not deleting
Ctrl->>K8s: Update: add finalizer
Note right of Ctrl: Continue normal reconciliation
else Finalizer missing AND deleting
Ctrl-->>K8s: Log "deletion in progress, finalizer missing"
Ctrl-->>K8s: Return (skip further reconciliation)
else Finalizer present
Ctrl->>Ctrl: Proceed with standard reconcile path
end
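The three branches of the diagram can be sketched as a small pure function. This is only an illustration of the decision, not the controller's actual code: the `action` type, the `decide` function, and the constant names are hypothetical, and the two booleans stand in for "finalizer present" and "DeletionTimestamp is set".

```go
package main

import "fmt"

// action is a hypothetical label for what the reconciler does next.
type action string

const (
	addFinalizer  action = "add finalizer, then continue normal reconciliation"
	skipReconcile action = "log and return early"
	proceed       action = "proceed with standard reconcile path"
)

// decide mirrors the three branches of the sequence diagram.
// hasFinalizer and deleting are assumptions standing in for the
// finalizer check and a non-zero DeletionTimestamp.
func decide(hasFinalizer, deleting bool) action {
	switch {
	case !hasFinalizer && !deleting:
		return addFinalizer
	case !hasFinalizer && deleting:
		return skipReconcile
	default:
		return proceed
	}
}

func main() {
	cases := []struct{ hasFinalizer, deleting bool }{
		{false, false}, {false, true}, {true, false}, {true, true},
	}
	for _, c := range cases {
		fmt.Printf("finalizer=%t deleting=%t -> %s\n",
			c.hasFinalizer, c.deleting, decide(c.hasFinalizer, c.deleting))
	}
}
```

The case that matters for the bug is `finalizer=false deleting=true`: it now ends the reconcile instead of reaching the normal path.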
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title Check | ✅ Passed | The title succinctly captures the core change by stating that NodeMaintenance resources will no longer be reconciled once deletion is requested, which directly reflects the added early return logic and finalizer handling in the controller code. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
Hi @vaspahomov. Thanks for your PR.
I'm waiting for a medik8s member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: vaspahomov
Once this PR has been reviewed and has the lgtm label, please assign razo7 for approval. For more information see the Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
/ok-to-test
@vaspahomov sorry for the late reply and thanks for the contribution!
Thanks for fixing this bug :)
I think your change already addresses the issue where NodeMaintenance objects that are being deleted and no longer have the finalizer would fall through to the normal maintenance logic.
I believe the problem originated from the if/else branch being a bit hard to read, hence I would suggest something that might make this logic more robust and easier to maintain: Consider using DeletionTimestamp as the primary decision driver.
Currently we're checking finalizer presence first, but the deletion state is really what should determine the reconcile path.
A pattern like this could eliminate the complex compound conditions:
if !nm.ObjectMeta.DeletionTimestamp.IsZero() {
	// Handle all deletion cases here (with or without finalizer)
	INFO("NMO is being deleted, skipping reconcile")
	if finalizer {
		...
	}
	return emptyResult, nil
}
if !finalizer {
	// add finalizer
}
// Normal reconcile continues here
The current fix works correctly, but restructuring around deletion state might prevent similar issues in the future.
What do you think about this approach?
@vaspahomov: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/4.17-openshift-e2e | fac65521490a5cd37805543f2f4bea10d4fe67b8 | link | true | /test 4.17-openshift-e2e |
| ci/prow/4.18-openshift-e2e | fac65521490a5cd37805543f2f4bea10d4fe67b8 | link | true | /test 4.18-openshift-e2e |
| ci/prow/4.16-openshift-e2e | fac65521490a5cd37805543f2f4bea10d4fe67b8 | link | true | /test 4.16-openshift-e2e |
| ci/prow/4.19-openshift-e2e | fac65521490a5cd37805543f2f4bea10d4fe67b8 | link | true | /test 4.19-openshift-e2e |
| ci/prow/4.20-openshift-e2e | fac65521490a5cd37805543f2f4bea10d4fe67b8 | link | true | /test 4.20-openshift-e2e |
Full PR test history. Your PR dashboard.