
Uncordon immediately after drain failure

Open pando85 opened this issue 7 months ago • 3 comments

We've been running Kured for some time and recently noticed a change in behavior starting with version 1.16.2. When a node fails to drain, it is now uncordoned only after the releaseDelay period.

Previously (e.g., in version 1.16.1 and earlier), the node was uncordoned immediately after the drain failure. This behavior was preferable for our use case, as it allows workloads to be rescheduled quickly if a reboot cannot proceed.

It appears this change was introduced in this commit.

We would like to request an option or a fix to restore the previous behavior—immediate uncordon on drain failure—to minimize disruption in scenarios where the reboot cannot be completed.

pando85, May 19 '25 12:05

I like the idea. This is indeed a bug, and it was not covered by a test. Can you add a test too?

evrardjp, May 26 '25 12:05

I will try to work on this if I have time.

pando85, May 26 '25 12:05

Hello @evrardjp. I finally have time to work on this issue. Is it still relevant?

Also, we are interested in updating to the latest version. I see that you are working on #1000. How could we coordinate to get this fixed?

pando85, Oct 08 '25 12:10