cluster-api: Add MachineNodeHealthyCondition to Machine’s Ready condition

User Story

As a user, I would like to get a summary of a machine's status by looking at the machine's Ready condition.

Detailed Description

https://github.com/kubernetes-sigs/cluster-api/pull/3670 and https://github.com/kubernetes-sigs/cluster-api/pull/3890 introduced the NodeHealthy condition on machines. However, as per the discussion in https://github.com/kubernetes-sigs/cluster-api/pull/3670#discussion_r497470878, we decided not to include this condition in the machine's Ready condition and to open an issue outlining the contract changes for v1alpha4.

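If implemented, this change means that a machine will not become Ready until a CNI is installed in the cluster, because the Node stays NotReady (and therefore NodeHealthy stays False) until the network plugin is available.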

Is there any objection to implementing this in v1alpha4? Should we consider backporting?

Anything else you would like to add:

From the conditions proposal:

A Ready condition SHOULD be provided at object level to represent the overall operational state of the component (and, IMO, for a machine this includes having a Node with a CNI installed)
The Ready condition MUST be based on the summary of more detailed conditions existing on the same object
An object SHOULD NEVER be in status Ready=True if one of the object's conditions is False or if one of the object's dependents is in status Ready=False.
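To make the summary rule concrete, here is a minimal, self-contained sketch of how a Ready status could be derived from a list of detailed conditions, and how adding NodeHealthy to that list changes the outcome for a machine whose Node has no CNI yet. The `Condition` type and `summarizeReady` function are hypothetical simplifications for illustration; they are not cluster-api's actual `util/conditions` helpers, which also handle reasons, severities, and messages.

```go
package main

import "fmt"

// ConditionStatus follows the Kubernetes True/False/Unknown convention.
type ConditionStatus string

const (
	ConditionTrue    ConditionStatus = "True"
	ConditionFalse   ConditionStatus = "False"
	ConditionUnknown ConditionStatus = "Unknown"
)

// Condition is a hypothetical, simplified stand-in for a Machine condition.
type Condition struct {
	Type   string
	Status ConditionStatus
}

// summarizeReady derives Ready from the condition types listed in include:
// False if any listed condition is False, Unknown if any listed condition is
// missing or Unknown, and True only when all listed conditions are True.
func summarizeReady(conds []Condition, include []string) ConditionStatus {
	byType := make(map[string]ConditionStatus, len(conds))
	for _, c := range conds {
		byType[c.Type] = c.Status
	}
	result := ConditionTrue
	for _, t := range include {
		s, ok := byType[t]
		switch {
		case ok && s == ConditionFalse:
			return ConditionFalse
		case !ok || s == ConditionUnknown:
			result = ConditionUnknown
		}
	}
	return result
}

func main() {
	// Bootstrap and infrastructure are ready, but the Node is NotReady
	// because no CNI has been installed in the cluster yet.
	conds := []Condition{
		{Type: "BootstrapReady", Status: ConditionTrue},
		{Type: "InfrastructureReady", Status: ConditionTrue},
		{Type: "NodeHealthy", Status: ConditionFalse},
	}

	withoutNode := []string{"BootstrapReady", "InfrastructureReady"}
	withNode := []string{"BootstrapReady", "InfrastructureReady", "NodeHealthy"}

	fmt.Println("Ready without NodeHealthy:", summarizeReady(conds, withoutNode))
	fmt.Println("Ready with NodeHealthy:", summarizeReady(conds, withNode))
}
```

Running it prints `True` for the first summary and `False` for the second, which is exactly the semantic change discussed in this issue: once NodeHealthy is part of the summary, the machine is not Ready until the Node is.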

/kind feature

Dec 02 '20 15:12 fabriziopandini

/milestone v0.4.0
/area machine

Dec 02 '20 15:12 fabriziopandini

IMO, adding a CNI to the cluster sounds like a necessary step to make the cluster usable. If we always want the cluster to have some CNI solution installed, then making sure that the Machine's Ready condition reflects that prerequisite only solidifies that contract.

Dec 02 '20 19:12 srm09

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

Mar 02 '21 19:03 fejta-bot

/remove-lifecycle stale

Mar 02 '21 19:03 fabriziopandini

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

May 31 '21 20:05 fejta-bot

@vincepri @CecileRobertMichon opinions about including this in v1alpha4?

Jun 01 '21 10:06 fabriziopandini

This shouldn't be a breaking change, correct? I'm +1 on getting this in as soon as we can, but I don't think it's blocking for v0.4.0.

Jun 01 '21 17:06 CecileRobertMichon

Given that we are going to change the semantics of the Ready condition for machines, from a certain point of view this could be considered breaking (this was at least what we assumed when we decided not to include the condition in the summary during v1alpha3).

Jun 03 '21 10:06 fabriziopandini

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Sep 01 '21 10:09 k8s-triage-robot

/lifecycle frozen

Sep 02 '21 08:09 fabriziopandini

/triage accepted

Oct 03 '22 17:10 fabriziopandini

/help

Oct 03 '22 20:10 fabriziopandini

@fabriziopandini: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

Why are we solving this issue?
To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
Does this issue have zero to low barrier of entry?
How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Oct 03 '22 20:10 k8s-ci-robot

Is anyone looking into this? I would love to help get this implemented.

Oct 13 '22 02:10 zawachte

@zawachte AFAIK no one is driving this discussion. Some time ago I tried to revive it by writing https://docs.google.com/document/d/1hBQnWWa5d16FOslNhDwYVOhcMjLIul4tMeUgh4maI3w/edit?usp=sharing, but without success 😢

Oct 17 '22 10:10 fabriziopandini

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

Confirm that this issue is still relevant with /triage accepted (org members only)
Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

Jan 19 '24 02:01 k8s-triage-robot

/priority important-longterm

Apr 12 '24 14:04 fabriziopandini

We should fix this.

/triage accepted

Apr 22 '24 12:04 fabriziopandini
