upgrade-manager
Feature: Post-Start hook to run custom check to determine if node is ready
Is this a BUG REPORT or FEATURE REQUEST?: FEATURE
In some setups, a node's status can be Ready while the node still fails custom checks (e.g. those from the Kubernetes node problem detector).
As of now, upgrade manager proceeds as soon as the node status is reported as Ready.
I'd like to implement support for custom checks.
One option that I see is to extend the isNodeReady check with a call to a custom script that performs additional checks; its exit status would indicate whether the node is ready.
The changes required would be:
- Extend RollingUpgradeSpec to allow specifying a custom script (similar to PreDrain/PostDrain/PostTerminate).
- Call this script in WaitForDesiredInstances.
What do you think?
Does the node's status or some annotation/label indicate that the node was found to be in bad shape by the node problem detector? Having another script is okay, but if there is a more direct approach such as checking specific annotations/labels, it might be cleaner. WDYT?
Also, if the script fails, will just that one node be skipped, or will the rollingUpgrade object be marked as failed? If the node is only skipped, what will the end result of the rollingUpgrade be?
Also, this script will have the time-of-check-to-time-of-use problem; i.e. the node could get into a bad state after the script executes but before the node is actually drained and terminated (this is a problem right now as well with the isNodeReady check).
An annotation or label would also work instead of running the custom check.
I think custom readiness gates are a great idea. As mentioned in Slack, here is my proposal for the API:
spec:
  readinessGates:
  - matchLabels:
      MyLabelKey: MyLabelValue
We can start with this single matchLabels gate; in the future we can add more gates as use cases come up.
The implementation should make sure to apply this for all types of rollups (lazy / eager).
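As a rough illustration of the proposed gate, the check could look something like the sketch below. The `ReadinessGate` type and `passesReadinessGates` function are hypothetical, and the semantics (every label in every gate must match exactly) are an assumption, since the proposal does not spell them out.

```go
package main

import "fmt"

// ReadinessGate is a hypothetical Go type mirroring one entry of the
// proposed spec.readinessGates list.
type ReadinessGate struct {
	MatchLabels map[string]string
}

// passesReadinessGates reports whether the node's labels satisfy all gates.
// Assumption: a node passes only if every key in every gate's MatchLabels
// is present on the node with exactly the same value.
func passesReadinessGates(nodeLabels map[string]string, gates []ReadinessGate) bool {
	for _, gate := range gates {
		for key, want := range gate.MatchLabels {
			if got, ok := nodeLabels[key]; !ok || got != want {
				return false
			}
		}
	}
	return true
}

func main() {
	gates := []ReadinessGate{
		{MatchLabels: map[string]string{"MyLabelKey": "MyLabelValue"}},
	}

	labeled := map[string]string{"MyLabelKey": "MyLabelValue"}
	unlabeled := map[string]string{}

	fmt.Println(passesReadinessGates(labeled, gates))   // true
	fmt.Println(passesReadinessGates(unlabeled, gates)) // false
}
```

With no gates configured the function returns true, so existing rollingUpgrade objects would behave as they do today.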