cluster-api
Add MachineNodeHealthyCondition to Machine’s Ready condition
User Story
As a user, I would like to get a summary of a Machine's status by looking at the Machine's Ready condition.
Detailed Description
https://github.com/kubernetes-sigs/cluster-api/pull/3670 & https://github.com/kubernetes-sigs/cluster-api/pull/3890 introduced the NodeHealthy condition on Machines; however, as per the discussion in https://github.com/kubernetes-sigs/cluster-api/pull/3670#discussion_r497470878, we decided not to include this condition in the Machine's Ready condition and to open an issue outlining the contract changes for v1alpha4.
The impact of this change, if implemented, is that a Machine will not become Ready until a CNI is installed in the cluster.
Is there any objection to implementing this in v1alpha4? Should we consider backporting?
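For reference, the Ready summary on a Machine is computed with the conditions util in sigs.k8s.io/cluster-api/util/conditions. Below is a minimal, hypothetical sketch of what the proposed change amounts to (the function name and the exact option set are illustrative, not the Machine controller's actual code):

```go
// Hypothetical sketch of the proposed change: include NodeHealthy when rolling
// detailed conditions up into the Machine's Ready condition. The function name
// and option set are illustrative, not the Machine controller's actual code.
package example

import (
	clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha4"
	"sigs.k8s.io/cluster-api/util/conditions"
)

func setMachineReadySummary(machine *clusterv1.Machine) {
	conditions.SetSummary(machine,
		conditions.WithConditions(
			clusterv1.BootstrapReadyCondition,
			clusterv1.InfrastructureReadyCondition,
			// The proposed addition: Ready stays False until the Node is
			// healthy, which in practice requires a CNI to be installed.
			clusterv1.MachineNodeHealthyCondition,
		),
	)
}
```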
Anything else you would like to add:
From the conditions proposal:
- A Ready condition SHOULD be provided at object level to represent the overall operational state of the component (and IMO for a machine, this includes having the node with a CNI)
- The Ready condition MUST be based on the summary of more detailed conditions existing on the same object
- An object SHOULD NEVER be in status Ready=True if one of the object's conditions is False or if one of the object's dependents is in status Ready=False (see the sketch after this list).
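To make that contract concrete, here is a hedged, consumer-side sketch (the helper name reportMachineHealth is hypothetical; conditions.IsTrue, IsFalse, and Get are helpers from sigs.k8s.io/cluster-api/util/conditions). Once NodeHealthy feeds the summary, Ready=True implies the node-level condition is also True, and a missing CNI surfaces through NodeHealthy while Ready stays False:

```go
// Hypothetical consumer-side check of the contract above; the helper name is
// illustrative and not part of any controller.
package example

import (
	"fmt"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha4"
	"sigs.k8s.io/cluster-api/util/conditions"
)

func reportMachineHealth(machine *clusterv1.Machine) {
	// With NodeHealthy included in the summary, Ready=True implies the
	// node-level condition is True as well.
	if conditions.IsTrue(machine, clusterv1.ReadyCondition) {
		fmt.Println("Machine is Ready: bootstrap, infrastructure, and node are all healthy")
		return
	}
	// With Ready=False, the detailed conditions explain why; e.g. a missing CNI
	// keeps the Node NotReady and surfaces through NodeHealthy.
	if conditions.IsFalse(machine, clusterv1.MachineNodeHealthyCondition) {
		c := conditions.Get(machine, clusterv1.MachineNodeHealthyCondition)
		fmt.Printf("node not healthy: %s: %s\n", c.Reason, c.Message)
	}
}
```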
/kind feature
/milestone v0.4.0
/area machine
IMO adding a CNI to the cluster sounds like a necessary step to make the cluster usable. And if we always want the cluster to have some CNI solution installed, then making sure that the Machine's Ready condition reflects that prerequisite only solidifies that contract.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
@vincepri @CecileRobertMichon opinions about including this in v1alpha4?
This shouldn't be a breaking change, correct? I'm +1 on getting this in as soon as we can, but I don't think it's blocking for v0.4.0.
Given that we are going to change the semantics of the Ready condition for Machines, from a certain PoV this could be considered breaking (this was at least what we assumed when we decided not to include the condition in the summary during v1alpha3).
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/lifecycle frozen
/triage accepted
/help
@fabriziopandini: This request has been marked as needing help from a contributor.
Guidelines
Please ensure that the issue body includes answers to the following questions:
- Why are we solving this issue?
- To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
- Does this issue have zero to low barrier of entry?
- How can the assignee reach out to you for help?
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.
In response to this:
/help
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Is anyone looking into this? I would love to help get this implemented.
@zawachte AFAIK no one is driving this discussion. Some time ago I tried to revive it by writing https://docs.google.com/document/d/1hBQnWWa5d16FOslNhDwYVOhcMjLIul4tMeUgh4maI3w/edit?usp=sharing, but without success 😢
This issue has not been updated in over 1 year, and should be re-triaged.
You can:
- Confirm that this issue is still relevant with /triage accepted (org members only)
- Close this issue with /close
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted
/priority important-longterm
We should fix this.
/triage accepted