KubeadmControlPlane should provide an explanation when it decides a Machine should be replaced

Open dlipovetsky opened this issue 3 years ago • 12 comments

User Story

As a user I would like to understand why KubeadmControlPlane (KCP) decides that a Machine needs to be replaced (in KCP parlance, the Machine "needs rollout").

Today, the logs show only that KCP has decided the Machine needs to be replaced, not why:

I1029 19:36:57.752152       1 controller.go:241] controller/kubeadmcontrolplane "msg"="Reconcile KubeadmControlPlane" "cluster"="dlipovetsky-adopt-48fd" "name"="dlipovetsky-adopt-48fd-control-plane" "namespace"="default" "reconciler group"="controlplane.cluster.x-k8s.io" "reconciler kind"="KubeadmControlPlane" 
I1029 19:36:58.673234       1 controller.go:328] controller/kubeadmcontrolplane "msg"="Rolling out Control Plane machines" "cluster"="dlipovetsky-adopt-48fd" "name"="dlipovetsky-adopt-48fd-control-plane" "namespace"="default" "reconciler group"="controlplane.cluster.x-k8s.io" "reconciler kind"="KubeadmControlPlane" "needRollout"=["dlipovetsky-adopt-48fd-control-plane-0"]

This is, of course, not enough information to understand the decision.

Detailed Description

Anything else you would like to add:

KCP makes the decision by applying filter functions. Each of these functions returns only a boolean. I'm working on a proof-of-concept change to the filter API so that the functions return an explanation along with the boolean.
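A minimal sketch of what such a filter API change could look like, using simplified stand-in types. The names `Machine`, `Filter`, `ExplainedFilter`, `VersionMismatch`, and `RolloutReasons` are hypothetical and chosen for illustration only; they are not taken from the Cluster API code base or from the proof of concept.

```go
package main

import "fmt"

// Machine is a hypothetical, simplified stand-in for a Cluster API Machine.
type Machine struct {
	Name    string
	Version string
}

// Filter models the shape described above: a predicate that returns
// only a boolean, so the caller cannot tell why a Machine matched.
type Filter func(m *Machine) bool

// ExplainedFilter is a hypothetical variant that also returns a
// human-readable reason whenever the Machine matches.
type ExplainedFilter func(m *Machine) (matched bool, reason string)

// VersionMismatch is an illustrative filter (an assumption, not the real
// KCP logic): it matches Machines whose version differs from the desired one.
func VersionMismatch(desired string) ExplainedFilter {
	return func(m *Machine) (bool, string) {
		if m.Version == desired {
			return false, ""
		}
		return true, fmt.Sprintf("version %q does not match desired version %q", m.Version, desired)
	}
}

// RolloutReasons applies every filter to every Machine and collects the
// reasons per Machine name. Machines that match no filter are omitted.
func RolloutReasons(machines []*Machine, filters ...ExplainedFilter) map[string][]string {
	reasons := make(map[string][]string)
	for _, m := range machines {
		for _, f := range filters {
			if matched, reason := f(m); matched {
				reasons[m.Name] = append(reasons[m.Name], reason)
			}
		}
	}
	return reasons
}

func main() {
	machines := []*Machine{
		{Name: "control-plane-0", Version: "v1.21.2"},
		{Name: "control-plane-1", Version: "v1.22.0"},
	}
	reasons := RolloutReasons(machines, VersionMismatch("v1.22.0"))
	// Iterate over the slice rather than the map to keep the output order stable.
	for _, m := range machines {
		if r, ok := reasons[m.Name]; ok {
			fmt.Printf("machine %s needs rollout: %v\n", m.Name, r)
		}
	}
}
```

With reasons collected per Machine, the controller could attach them to the "Rolling out Control Plane machines" log entry alongside the existing `needRollout` list.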

/kind feature

dlipovetsky avatar Nov 02 '21 00:11 dlipovetsky

/area control-plane

randomvariable avatar Nov 02 '21 14:11 randomvariable

/help
/milestone v1.1

vincepri avatar Nov 02 '21 14:11 vincepri

@vincepri: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to this:

/help
/milestone v1.1

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Nov 02 '21 14:11 k8s-ci-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jan 31 '22 14:01 k8s-triage-robot

/lifecycle frozen

vincepri avatar Jan 31 '22 16:01 vincepri

I found this interesting and was wondering whether I could take it up as a way to start contributing to CAPI. I'm still ramping up on it, so if this isn't a beginner-friendly task or is time-critical, do let me know. Otherwise I'll go ahead and assign myself!

RaghavRoy145 avatar Jun 24 '22 19:06 RaghavRoy145

@RaghavRoy145 unpacking KCP machine filters is not the easiest task to start with, but complexity is a matter of opinion.

fabriziopandini avatar Jun 25 '22 19:06 fabriziopandini

Thanks @fabriziopandini, and you're right.

RaghavRoy145 avatar Jun 26 '22 02:06 RaghavRoy145

/assign

RaghavRoy145 avatar Jun 26 '22 02:06 RaghavRoy145

I started work on this in https://github.com/dlipovetsky/cluster-api/tree/dlipovetsky/verbose-filters-v1alpha4. I didn't have time to stick with it. I'd be happy to collaborate with you, if you'd like, @RaghavRoy145.

dlipovetsky avatar Jun 27 '22 15:06 dlipovetsky

I would love to collaborate. I'm pretty sure I'll need all the help I can get! 😄

RaghavRoy145 avatar Jun 27 '22 17:06 RaghavRoy145

/triage accepted
/unassign @RaghavRoy145

fabriziopandini avatar Oct 03 '22 19:10 fabriziopandini

/assign

I have something locally that comes close. I'll create a PR when I get to it (might not be very soon).

Note to self: a first, very hacky version, with a bunch of other stuff mixed in, can be found here: https://github.com/sbueringer/cluster-api/commits/pr-improve-kcp-logging

sbueringer avatar Mar 15 '23 08:03 sbueringer