cluster-api
KubeadmControlPlane should provide an explanation when it decides a Machine should be replaced
User Story
As a user I would like to understand why KubeadmControlPlane (KCP) decides that a Machine needs to be replaced (in KCP parlance, the Machine "needs rollout").
Today, I can see only that KCP has decided the Machine needs to be replaced, not the reason why:
I1029 19:36:57.752152 1 controller.go:241] controller/kubeadmcontrolplane "msg"="Reconcile KubeadmControlPlane" "cluster"="dlipovetsky-adopt-48fd" "name"="dlipovetsky-adopt-48fd-control-plane" "namespace"="default" "reconciler group"="controlplane.cluster.x-k8s.io" "reconciler kind"="KubeadmControlPlane"
I1029 19:36:58.673234 1 controller.go:328] controller/kubeadmcontrolplane "msg"="Rolling out Control Plane machines" "cluster"="dlipovetsky-adopt-48fd" "name"="dlipovetsky-adopt-48fd-control-plane" "namespace"="default" "reconciler group"="controlplane.cluster.x-k8s.io" "reconciler kind"="KubeadmControlPlane" "needRollout"=["dlipovetsky-adopt-48fd-control-plane-0"]
This is, of course, not enough information to understand the decision.
Detailed Description
Anything else you would like to add:
KCP makes the decision by applying filter functions. These functions return a boolean. I'm working on a proof-of-concept change to the filter API so that the functions return an explanation along with the boolean.
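To make the idea concrete, here is a minimal sketch in Go of a filter that returns an explanation along with its boolean. Everything in it is an assumption made for illustration: the `Machine` struct is a stand-in for the real Cluster API type, and `ExplainedFunc`, `VersionMismatch`, `HasAnnotation`, `Or`, and `NeedRollout` are hypothetical names, not the existing KCP filter API.

```go
package main

import (
	"fmt"
	"strings"
)

// Machine is a hypothetical stand-in for the Cluster API Machine type.
type Machine struct {
	Name        string
	Version     string
	Annotations map[string]string
}

// ExplainedFunc is a hypothetical filter shape: it reports whether the
// Machine matches and, if it does, a human-readable reason.
type ExplainedFunc func(m *Machine) (matched bool, reason string)

// VersionMismatch matches Machines whose version differs from the desired one.
func VersionMismatch(desired string) ExplainedFunc {
	return func(m *Machine) (bool, string) {
		if m.Version == desired {
			return false, ""
		}
		return true, fmt.Sprintf("version %s does not match desired version %s", m.Version, desired)
	}
}

// HasAnnotation matches Machines that carry the given annotation key.
func HasAnnotation(key string) ExplainedFunc {
	return func(m *Machine) (bool, string) {
		if _, ok := m.Annotations[key]; !ok {
			return false, ""
		}
		return true, fmt.Sprintf("annotation %q is set", key)
	}
}

// Or matches when any filter matches and joins the reasons of all
// filters that matched, so no explanation is lost.
func Or(filters ...ExplainedFunc) ExplainedFunc {
	return func(m *Machine) (bool, string) {
		var reasons []string
		for _, f := range filters {
			if ok, reason := f(m); ok {
				reasons = append(reasons, reason)
			}
		}
		return len(reasons) > 0, strings.Join(reasons, "; ")
	}
}

// NeedRollout returns, for every matching Machine, its name and the reason.
func NeedRollout(machines []*Machine, f ExplainedFunc) map[string]string {
	out := map[string]string{}
	for _, m := range machines {
		if ok, reason := f(m); ok {
			out[m.Name] = reason
		}
	}
	return out
}

func main() {
	machines := []*Machine{
		{Name: "control-plane-0", Version: "v1.21.0"},
		{Name: "control-plane-1", Version: "v1.22.0"},
	}
	// "example.com/force-rollout" is a made-up annotation key.
	filter := Or(VersionMismatch("v1.22.0"), HasAnnotation("example.com/force-rollout"))
	for name, reason := range NeedRollout(machines, filter) {
		fmt.Printf("machine %s needs rollout: %s\n", name, reason)
	}
}
```

With this shape, the controller could log the reason next to each Machine name in the "Rolling out Control Plane machines" message instead of listing only the names.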
/kind feature
/area control-plane
/help
/milestone v1.1
@vincepri: This request has been marked as needing help from a contributor.
Guidelines
Please ensure that the issue body includes answers to the following questions:
- Why are we solving this issue?
- To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
- Does this issue have zero to low barrier of entry?
- How can the assignee reach out to you for help?
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/lifecycle frozen
I found this interesting and was wondering whether I could take this issue up to start contributing to CAPI. I'm still ramping up on it, so if this isn't a beginner-friendly task or is time-critical, do let me know. Otherwise I'll go ahead and assign myself!
@RaghavRoy145 unpacking the KCP machine filters is not the easiest task to start with, but how complex a task feels is subjective.
Thanks @fabriziopandini, and you're right.
/assign
I started work on this in https://github.com/dlipovetsky/cluster-api/tree/dlipovetsky/verbose-filters-v1alpha4, but I didn't have time to stick with it. I'd be happy to collaborate with you, if you'd like, @RaghavRoy145.
I would love to collaborate, I'm pretty sure I'll need all the help I can get! 😄
/triage accepted
/unassign @RaghavRoy145
/assign
I have something locally that comes close. I'll create a PR when I get to it (which might not be very soon).
Note to myself: a first, very hacky version, with a bunch of other stuff mixed in, can be found here: https://github.com/sbueringer/cluster-api/commits/pr-improve-kcp-logging