Version validation according to the Kubernetes Version Skew Policy

Open chrischdi opened this issue 2 years ago • 16 comments

User Story

As an operator, I would like to ensure that creating or updating a Cluster/KubeadmControlPlane/MachineDeployment/MachineSet/Machine does not violate the [Kubernetes Version Skew Policy], so that clusters stay on supported upgrade paths and running applications do not break.

Detailed Description

This issue proposes to add additional validation against the [Kubernetes Version Skew Policy].

Assuming version v1.X is the desired kubernetes version:

  • For a KubeadmControlPlane or a ControlPlane Machine:
    • [...] the newest and oldest kube-apiserver instances must be within one minor version.

  • For a MachineDeployment, MachineSet or workload Machine:
    • [...] must not be newer than kube-apiserver, and may be up to two minor versions older.
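The two rules above can be sketched as plain version comparisons. This is only an illustration, not Cluster API code: the function names `checkControlPlaneSkew` and `checkWorkerSkew` and the constant `maxWorkerSkew` are hypothetical, and the limit of two minor versions for workers is taken from the text above (the upstream policy may differ by release).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// maxWorkerSkew is the number of minor versions a worker may lag behind
// kube-apiserver. Assumption: value copied from the issue text.
const maxWorkerSkew = 2

// parseMajorMinor extracts major and minor from a version like "v1.24.3".
func parseMajorMinor(version string) (int, int, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("invalid version %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid major in %q: %w", version, err)
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, fmt.Errorf("invalid minor in %q: %w", version, err)
	}
	return major, minor, nil
}

// checkControlPlaneSkew verifies that the current and desired control plane
// versions are within one minor version, since both run during a rollout.
func checkControlPlaneSkew(current, desired string) error {
	cMajor, cMinor, err := parseMajorMinor(current)
	if err != nil {
		return err
	}
	dMajor, dMinor, err := parseMajorMinor(desired)
	if err != nil {
		return err
	}
	if cMajor != dMajor {
		return fmt.Errorf("major version change from %s to %s", current, desired)
	}
	if diff := dMinor - cMinor; diff < -1 || diff > 1 {
		return fmt.Errorf("%s and %s differ by more than one minor version", current, desired)
	}
	return nil
}

// checkWorkerSkew verifies that a worker is not newer than the control plane
// and at most maxWorkerSkew minor versions older.
func checkWorkerSkew(controlPlane, worker string) error {
	cpMajor, cpMinor, err := parseMajorMinor(controlPlane)
	if err != nil {
		return err
	}
	wMajor, wMinor, err := parseMajorMinor(worker)
	if err != nil {
		return err
	}
	if cpMajor != wMajor {
		return fmt.Errorf("major version mismatch between %s and %s", controlPlane, worker)
	}
	diff := cpMinor - wMinor
	if diff < 0 {
		return fmt.Errorf("worker %s is newer than control plane %s", worker, controlPlane)
	}
	if diff > maxWorkerSkew {
		return fmt.Errorf("worker %s is more than %d minor versions older than %s", worker, maxWorkerSkew, controlPlane)
	}
	return nil
}

func main() {
	fmt.Println(checkControlPlaneSkew("v1.23.5", "v1.24.0"))
	fmt.Println(checkControlPlaneSkew("v1.22.0", "v1.24.0"))
	fmt.Println(checkWorkerSkew("v1.24.0", "v1.22.9"))
	fmt.Println(checkWorkerSkew("v1.24.0", "v1.25.0"))
}
```

Only the minor version is compared, so a patch downgrade such as v1.24.3 to v1.24.1 passes both checks.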

To-be-discussed:

  • Should the Kubernetes Version Skew Policy validations only get implemented for ClusterClass based Clusters / Validation on Cluster.spec.topology.version?
    • An argument for that is: on ClusterClass based Clusters the topology and relations are well-defined with a top-down approach.

Anything else you would like to add:

  • There is already version validation for ClusterClass based clusters (via .spec.topology.version). xref
    • The current implementation does not allow downgrading patch versions, which the Kubernetes Version Skew Policy would permit.
  • Kubeadm also does validation against the Kubernetes Version Skew Policy, but this may not map to the current documentation at [Kubernetes Version Skew Policy].

/kind feature

chrischdi avatar Aug 04 '22 12:08 chrischdi

Kubeadm also does validation against the Kubernetes Version Skew Policy, but this may not map to the current documentation at [Kubernetes Version Skew Policy].

The supported skew in kubeadm is documented here: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#version-skew-policy

That said, kubeadm does not strictly follow the policy in some of the more controversial areas, such as the kube-proxy / kubelet skew.

neolit123 avatar Aug 04 '22 12:08 neolit123

I think there's a lot of overlap between this issue, #7010, and existing issues, e.g. https://github.com/kubernetes-sigs/cluster-api/issues/4321, https://github.com/kubernetes-sigs/cluster-api/issues/6614, https://github.com/kubernetes-sigs/cluster-api/issues/6040 - there might be more.

It would be good to ensure that the older issues are all covered by the newer ones, close the older ones and continue the conversation on these more holistic issues.

killianmuldoon avatar Aug 04 '22 12:08 killianmuldoon

I agree for #6040 regarding this issue.

#4321 and #6614 are more related to #7010

chrischdi avatar Aug 04 '22 12:08 chrischdi

Thanks for looking for duplicates; I have consolidated everything in https://github.com/kubernetes-sigs/cluster-api/issues/7010 and in this issue

fabriziopandini avatar Aug 04 '22 14:08 fabriziopandini

Q: Should the validation only consider the spec.version or should it also consider the current status?

E.g. if we only validate the spec it's possible to bump the version two minor versions before the rollout is finished.
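A rough sketch of the difference, to make the question concrete. Everything here is hypothetical: `validateSpecOnly` and `validateWithStatus` are made-up names, and the `observed` slice stands in for whatever versions would be read from status during a rollout.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minorOf extracts the minor number from a version such as "v1.24.3".
func minorOf(version string) (int, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 {
		return 0, fmt.Errorf("invalid version %q", version)
	}
	return strconv.Atoi(parts[1])
}

// withinOneMinor reports whether two versions differ by at most one minor.
func withinOneMinor(a, b string) (bool, error) {
	aMinor, err := minorOf(a)
	if err != nil {
		return false, err
	}
	bMinor, err := minorOf(b)
	if err != nil {
		return false, err
	}
	diff := aMinor - bMinor
	return diff >= -1 && diff <= 1, nil
}

// validateSpecOnly compares the desired version with the previous spec.version only.
func validateSpecOnly(oldSpec, desired string) error {
	ok, err := withinOneMinor(oldSpec, desired)
	if err != nil {
		return err
	}
	if !ok {
		return fmt.Errorf("spec change %s -> %s skips a minor version", oldSpec, desired)
	}
	return nil
}

// validateWithStatus also compares the desired version with every version
// still observed in the cluster (hypothetical input, e.g. from Machine status).
func validateWithStatus(oldSpec string, observed []string, desired string) error {
	if err := validateSpecOnly(oldSpec, desired); err != nil {
		return err
	}
	for _, v := range observed {
		ok, err := withinOneMinor(v, desired)
		if err != nil {
			return err
		}
		if !ok {
			return fmt.Errorf("%s is still running; %s would exceed the skew", v, desired)
		}
	}
	return nil
}

func main() {
	// A rollout from v1.22 to v1.23 is still in progress.
	observed := []string{"v1.22.4", "v1.23.1"}
	fmt.Println("spec only:  ", validateSpecOnly("v1.23.1", "v1.24.0"))
	fmt.Println("with status:", validateWithStatus("v1.23.1", observed, "v1.24.0"))
}
```

With these inputs the spec-only check accepts the bump to v1.24.0, while the status-aware check rejects it because a v1.22 instance is still running.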

sbueringer avatar Aug 16 '22 11:08 sbueringer

Q: Should the validation only consider the spec.version or should it also consider the current status?

E.g. if we only validate the spec it's possible to bump the version two minor versions before the rollout is finished.

Maybe yes, maybe no. This is also related to #6651.

chrischdi avatar Aug 16 '22 12:08 chrischdi

/triage accepted

fabriziopandini avatar Sep 30 '22 19:09 fabriziopandini

/help

fabriziopandini avatar Sep 30 '22 19:09 fabriziopandini

@fabriziopandini: This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Sep 30 '22 19:09 k8s-ci-robot

/assign

ramineni avatar Oct 19 '22 06:10 ramineni

Do we have a clear idea what exactly we want to implement and where? (just asking, because it would be good to have consensus on that before we go back and forth on an implementation)

sbueringer avatar Oct 19 '22 08:10 sbueringer

+1 to nailing down some details first. @ramineni, is it OK for you to add some comments here as soon as you have made up your mind on this topic, before starting the implementation?

fabriziopandini avatar Oct 19 '22 08:10 fabriziopandini

Do we have a clear idea what exactly we want to implement and where? (just asking, because it would be good to have consensus on that before we go back and forth on an implementation)

I have not gotten that far yet; I have just started to explore the details. I'll be sure to discuss it here or in Slack before starting the implementation.

ramineni avatar Oct 19 '22 09:10 ramineni

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Feb 08 '23 09:02 k8s-triage-robot

/remove-lifecycle stale

chrischdi avatar Feb 08 '23 11:02 chrischdi

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

k8s-triage-robot avatar Mar 20 '24 06:03 k8s-triage-robot

/priority important-longterm

fabriziopandini avatar Apr 12 '24 13:04 fabriziopandini

The Cluster API project currently lacks enough active contributors to adequately respond to all issues and PRs.

We mitigated the issue with MachineSet preflight checks, and we highly recommend that users turn them on. /close

fabriziopandini avatar Apr 24 '24 13:04 fabriziopandini

@fabriziopandini: Closing this issue.

In response to this:

The Cluster API project currently lacks enough active contributors to adequately respond to all issues and PRs.

We mitigated the issue with MachineSet preflight checks, and we highly recommend that users turn them on. /close

k8s-ci-robot avatar Apr 24 '24 13:04 k8s-ci-robot