Deleting a plan when the plan is failing should return nodes to schedulable

Open · stellirin opened this issue on May 09 '20 • 4 comments

Is your feature request related to a problem? Please describe.

After my upgrade plan failed due to rancher/k3s-upgrade#22, I deleted the plan, hoping the nodes would be returned to status Ready. The controller continued to schedule jobs and the node remained in status Ready,SchedulingDisabled.

Describe the solution you'd like

The controller should see that attempts to upgrade are failing and return the node to status Ready. No further attempts to schedule the upgrade should be made until the upgrade plan is modified.

Describe alternatives you've considered

I manually ran kubectl uncordon my-node and all was fine.

Additional context

stellirin · May 09 '20 13:05

@stellirin thanks for reporting this and the underlying issue (that I moved to https://github.com/rancher/k3s-upgrade/issues/22).

Right now plan "recovery", such as it is, is an exercise left to the human operator. The SUC will only uncordon a node when:

  • the apply job has completed successfully
  • the plan still exists with non-nil drain or cordon is true
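
For concreteness, a minimal Plan sketch (field names are from the Plan CRD; all values here are illustrative, not taken from this issue) showing the fields involved in that second condition:

```yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: example-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  serviceAccountName: system-upgrade            # assumed service account
  nodeSelector:
    matchExpressions:
      - {key: example.com/upgrade, operator: Exists}   # hypothetical node label
  # Either of the following causes the node to be cordoned before the apply job
  # runs, and therefore to need an uncordon once the job completes successfully:
  cordon: true
  # drain:
  #   force: true
  upgrade:
    image: rancher/k3s-upgrade                  # illustrative upgrade image
  version: v1.18.2+k3s1                         # illustrative version
```

This also explains the report above: once the plan is deleted, the second condition can no longer be satisfied, so the node is never uncordoned automatically.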

I can change the SUC to know whether to uncordon at the time the job is created (most likely via a label) instead of at job completion via a plan lookup. This would cover the weird corner case of a plan being deleted right before its job completes, so that the node would still end up schedulable again without manual intervention. But, given the general-purpose nature of the SUC, I am not confident that we can know whether or not a node can be reverted to schedulable if the application of plans on it has failed.
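
A rough sketch of that idea, with an entirely hypothetical label key (the text above only commits to "via a label most likely"); the other labels are assumptions as well:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: apply-example-plan-on-my-node          # illustrative job name
  namespace: system-upgrade
  labels:
    upgrade.cattle.io/plan: example-plan       # assumed existing label
    upgrade.cattle.io/node: my-node            # assumed existing label
    upgrade.cattle.io/uncordon: "true"         # hypothetical: the decision is recorded
                                               # at creation time, so no plan lookup is
                                               # needed when the job completes
```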

Another way of stating my concern: unless an upgrade job has completed successfully for a node, the SUC has no reliable basis to assume that marking the node as schedulable again is warranted. This is because "failure" can happen anywhere in the upgrade job, which applies arbitrary, imperative-mode mutations.

dweomer · May 11 '20 20:05

Thanks for your explanation. After a second look at this, I probably should have requested something like 'Don't cordon nodes unless the plan phase succeeds'.

I'm now assuming here that the plan phase is doing what its name suggests: no (significant) changes are made to the node yet, it only prepares the node before applying any changes (I didn't notice any changes, but it failed early in my case).

If that assumption is correct, then it may make sense to do one or more of the following:

  • not cordon a node until plan is complete
    • plans can fail multiple times without blocking the node
    • probably easier to adapt existing code to this
  • cordon the node, count the number of failed plans, stop planning after x fails, uncordon the node
    • not failing forever, no resources wasted
    • requires a lot of additional code
    • cordon at the start probably has a use case I'm not aware of

Of course if I'm wrong about my assumptions then feel free to tell me to shut up, but in any case I really appreciate that you are attempting to solve automated updates like this!

stellirin · May 12 '20 06:05

> Thanks for your explanation. After a second look at this, I probably should have requested something like 'Don't cordon nodes unless the plan phase succeeds'.

Not sure what you mean here. The cordon / drain is encoded as an init container (one or the other, but never both) invoking kubectl, which comes after the prepare container (init containers are run serially). Please note that the first thing kubectl drain does is what kubectl cordon does: mark the node unschedulable.
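
A rough sketch of the apply job the controller generates, based only on the description above (container names, images, and arguments are assumptions for illustration, not the controller's exact output):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: apply-example-plan-on-my-node
  namespace: system-upgrade
spec:
  template:
    spec:
      hostPID: true                          # assumed; the upgrade acts on the host
      restartPolicy: Never
      initContainers:
        - name: prepare                      # from .spec.prepare, runs first
          image: rancher/k3s-upgrade
          args: ["prepare", "some-other-plan"]
        - name: drain                        # or "cordon": one or the other, never both;
          image: rancher/kubectl             # runs after prepare (init containers are serial)
          args: ["drain", "my-node", "--ignore-daemonsets", "--delete-local-data"]
      containers:
        - name: upgrade                      # from .spec.upgrade; the controller uncordons
          image: rancher/k3s-upgrade         # the node only after this succeeds (and only
                                             # if the plan still exists)
```

Because drain begins with a cordon, a failure anywhere after that first step leaves the node in Ready,SchedulingDisabled.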

> I'm now assuming here that the plan phase is doing what its name suggests: no (significant) changes are made to the node yet, it only prepares the node before applying any changes (I didn't notice any changes, but it failed early in my case).

> If that assumption is correct, then it may make sense to do one or more of the following:
>
>   • not cordon a node until plan is complete
>     • plans can fail multiple times without blocking the node
>     • probably easier to adapt existing code to this
>   • cordon the node, count the number of failed plans, stop planning after x fails, uncordon the node
>     • not failing forever, no resources wasted
>     • requires a lot of additional code
>     • cordon at the start probably has a use case I'm not aware of

Something to keep in mind is that the cordon / drain is optional. If your installation doesn't benefit from it, you can disable it by deleting .spec.drain in k3os-latest. If you do delete .spec.drain, I recommend keeping the cordon behavior by setting .spec.cordon: true to prevent something from being inadvertently scheduled onto the node right before a reboot.
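
A minimal sketch of that edit to the k3os-latest plan (field values other than cordon/drain are assumptions and may not match the shipped manifest):

```yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: k3os-latest
  namespace: k3os-system                                      # assumed namespace
spec:
  concurrency: 1
  channel: https://github.com/rancher/k3os/releases/latest    # assumed channel URL
  serviceAccountName: k3os-upgrade                            # assumed service account
  cordon: true    # keep cordoning so nothing new lands on the node right before the reboot
  # drain: ...    # removed: the node will no longer be drained before upgrading
  upgrade:
    image: rancher/k3os                                       # assumed upgrade image
  # (nodeSelector, prepare, etc. unchanged from the shipped plan and omitted here)
```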

> Of course if I'm wrong about my assumptions then feel free to tell me to shut up, but in any case I really appreciate that you are attempting to solve automated updates like this!

You've given me something to think about for a possibly smarter implementation in the future, and I appreciate that. In the meantime, assuming your primary concern is k3OS, keep an eye on https://github.com/rancher/k3os/issues/486, as it will entail a slightly smarter pair of plans working together to first upgrade nodes with node-role.kubernetes.io/master==true and then everything else. Part of this will be cordoning only the master nodes (the way rancher/rancher does via k3s-upgrade, see https://github.com/rancher/system-upgrade-controller/blob/v0.5.0/examples/k3s-upgrade.yaml#L22) while keeping the drain for workers, by default.
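
For reference, the two-plan pattern in the linked k3s-upgrade.yaml example looks roughly like this (names, namespace, and version are illustrative; see the linked file for the authoritative manifest):

```yaml
# Masters: cordon only, no drain.
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/master, operator: In, values: ["true"]}
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  version: v1.18.3+k3s1
---
# Workers: full drain (which itself begins with a cordon), plus a prepare step
# that waits for the server plan to finish before the agents are upgraded.
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: agent-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  drain:
    force: true
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/master, operator: DoesNotExist}
  prepare:
    image: rancher/k3s-upgrade
    args: ["prepare", "server-plan"]
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/k3s-upgrade
  version: v1.18.3+k3s1
```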

dweomer · May 26 '20 06:05

What should I do to recover the master node from Ready,SchedulingDisabled back to Ready when the upgrade plan is failing?

AnnatarHe · Jun 24 '20 10:06