upgrade-manager Feature: Post-Start hook to run custom check to determine if node is ready

Open uthark opened this issue 4 years ago • 3 comments

Is this a BUG REPORT or FEATURE REQUEST?: FEATURE

In some setups, a node's status can be Ready while the node still fails custom checks (e.g. those from the Kubernetes node problem detector).

As of now, upgrade-manager proceeds as soon as the node status is reported as Ready.

I'd like to implement support for custom checks. One option I see is to extend the isNodeReady check with a call to a custom script that performs additional checks; its exit status would indicate whether the node is ready.

The required changes would be:

Extend RollingUpgradeSpec to allow specifying a custom script (similar to PreDrain/PostDrain/PostTerminate).
Call this script in WaitForDesiredInstances.

What do you think?

Nov 02 '20 19:11 uthark

Does the node's status or some annotation/label indicate that the node was found to be in bad shape by the node problem detector? Having another script is okay, but if there is a more direct approach such as checking specific annotations/labels, it might be cleaner. WDYT?

Also, if the script fails, will only that one node be skipped, or will the rollingUpgrade object be marked as failed? If the node is just skipped, what will the end result of the rollingUpgrade be?

Also, this script will have the time-of-check-to-time-of-use problem; i.e. the node could get into a bad state after the script executes but before the node is actually drained and terminated (this is a problem right now as well with the isNodeReady check).

Nov 02 '20 19:11 shrinandj

Annotation / label would also work instead of running the custom check.

Nov 03 '20 02:11 uthark

I think custom readiness gates are a great idea. As mentioned in Slack, here is my proposal for the API:

spec:
  readinessGates:
    - matchLabels:
        MyLabelKey: MyLabelValue

We can start with this single matchLabels gate; in the future we can add more gates as use cases come up.

The implementation should make sure to apply this to all types of rollups (lazy / eager).
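
To make the matchLabels gate concrete, here is a minimal Go sketch of how it could be evaluated against a node's labels. The ReadinessGate type and passesGates function are hypothetical names for this example, not the actual upgrade-manager API. The sketch also assumes that all gates, and all labels within a gate, must match (AND semantics), which the proposal above does not specify.

```go
package main

import "fmt"

// ReadinessGate is a hypothetical type mirroring one entry of the proposed
// spec.readinessGates list.
type ReadinessGate struct {
	MatchLabels map[string]string
}

// passesGates reports whether nodeLabels satisfy every gate.
// Assumption: every key/value pair in every gate must be present on the node.
func passesGates(nodeLabels map[string]string, gates []ReadinessGate) bool {
	for _, gate := range gates {
		for key, want := range gate.MatchLabels {
			if got, ok := nodeLabels[key]; !ok || got != want {
				return false
			}
		}
	}
	return true
}

func main() {
	gates := []ReadinessGate{
		{MatchLabels: map[string]string{"MyLabelKey": "MyLabelValue"}},
	}

	labelled := map[string]string{"MyLabelKey": "MyLabelValue"}
	unlabelled := map[string]string{}

	fmt.Println(passesGates(labelled, gates))   // true
	fmt.Println(passesGates(unlabelled, gates)) // false
}
```

With these semantics an empty readinessGates list passes every node, so existing behaviour would be unchanged when no gates are configured.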

Nov 03 '20 18:11 eytan-avisror
