cluster-api
Version validation according to the Kubernetes Version Skew Policy
User Story
As an operator, I would like to ensure that creating/updating a Cluster/KubeadmControlPlane/MachineDeployment/MachineSet/Machine does not violate the [Kubernetes Version Skew Policy], so that clusters stay on supported upgrade paths and running applications are not broken.
Detailed Description
This issue proposes to add additional validation against the [Kubernetes Version Skew Policy].
Assuming version v1.X is the desired Kubernetes version:
- For a KubeadmControlPlane or a ControlPlane Machine:
  > [...] the newest and oldest `kube-apiserver` instances must be within one minor version.
- For a MachineDeployment, MachineSet or workload Machine:
  > [...] must not be newer than `kube-apiserver`, and may be up to two minor versions older.
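A minimal sketch of how these two rules could be checked, assuming semver-style version strings. All helper names here (`minor`, `controlPlaneSkewOK`, `workerSkewOK`) are hypothetical and are not part of Cluster API, which has its own version utilities:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minor extracts the minor version from a "vX.Y.Z" string.
// (Hypothetical helper; real code should use a proper semver parser.)
func minor(v string) int {
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	m, _ := strconv.Atoi(parts[1])
	return m
}

// controlPlaneSkewOK checks that the desired control plane version stays
// within one minor version of every existing control plane machine.
func controlPlaneSkewOK(desired string, existing []string) bool {
	for _, v := range existing {
		if d := minor(desired) - minor(v); d < -1 || d > 1 {
			return false
		}
	}
	return true
}

// workerSkewOK checks that a worker Machine version is not newer than the
// control plane and is at most two minor versions older.
func workerSkewOK(controlPlane, worker string) bool {
	d := minor(controlPlane) - minor(worker)
	return d >= 0 && d <= 2
}

func main() {
	fmt.Println(controlPlaneSkewOK("v1.26.0", []string{"v1.25.3"})) // true: one minor apart
	fmt.Println(controlPlaneSkewOK("v1.27.0", []string{"v1.25.3"})) // false: two minors apart
	fmt.Println(workerSkewOK("v1.26.0", "v1.24.9"))                 // true: two minors older
	fmt.Println(workerSkewOK("v1.26.0", "v1.27.0"))                 // false: newer than control plane
}
```

Note that the skew policy only constrains minor versions, so a patch-level downgrade (e.g. v1.25.3 to v1.25.1) would pass these checks.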
To be discussed:
- Should the Kubernetes Version Skew Policy validations only be implemented for ClusterClass-based Clusters, i.e. as validation on `Cluster.spec.topology.version`? An argument for that is: on ClusterClass-based Clusters the topology and relations are well-defined with a top-down approach.
Anything else you would like to add:
- There is already version validation for ClusterClass-based clusters (via `.spec.topology.version`), xref. The current implementation does not allow downgrading patch versions, which would be allowed by the Kubernetes Version Skew Policy.
- Kubeadm also does validation against the Kubernetes Version Skew Policy, but this may not map to the current documentation at [Kubernetes Version Skew Policy].
/kind feature
> Kubeadm also does validation against the Kubernetes Version Skew Policy, but this may not map to the current documentation at [Kubernetes Version Skew Policy].
The supported skew in kubeadm is documented here: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#version-skew-policy
That said, it does not strictly follow some more controversial areas, such as the kube-proxy / kubelet skew.
I think there's a lot of overlap between this issue, #7010, and existing issues, e.g. https://github.com/kubernetes-sigs/cluster-api/issues/4321, https://github.com/kubernetes-sigs/cluster-api/issues/6614, https://github.com/kubernetes-sigs/cluster-api/issues/6040 - there might be more.
It would be good to ensure that the older issues are all covered by the newer ones, close the older ones and continue the conversation on these more holistic issues.
I agree that #6040 is related to this issue.
#4321 and #6614 are more related to #7010.
Thanks for looking for duplicates; I have consolidated everything in https://github.com/kubernetes-sigs/cluster-api/issues/7010 and in this issue
Q: Should the validation only consider the spec.version or should it also consider the current status?
E.g. if we only validate the spec it's possible to bump the version two minor versions before the rollout is finished.
> Q: Should the validation only consider the spec.version or should it also consider the current status?
> E.g. if we only validate the spec it's possible to bump the version two minor versions before the rollout is finished.
Maybe yes, maybe no; also related here: #6651.
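The concern can be illustrated with a minimal sketch (hypothetical helpers, not actual Cluster API code): validating a new desired version only against the previous spec version accepts a bump that is really two minors ahead of machines still reported in status mid-rollout:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// minor extracts the minor version from a "vX.Y.Z" string (hypothetical helper).
func minor(v string) int {
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	m, _ := strconv.Atoi(parts[1])
	return m
}

// specOnlySkewOK validates the new desired version only against the
// previously desired spec version.
func specOnlySkewOK(oldSpec, newSpec string) bool {
	return minor(newSpec)-minor(oldSpec) <= 1
}

// statusAwareSkewOK also considers the oldest version still reported in
// status, i.e. machines whose rollout has not finished yet.
func statusAwareSkewOK(oldestStatus, newSpec string) bool {
	return minor(newSpec)-minor(oldestStatus) <= 1
}

func main() {
	// A rollout from v1.24 to v1.25 is still in progress (status lags
	// behind spec) when the user bumps spec again to v1.26:
	fmt.Println(specOnlySkewOK("v1.25.0", "v1.26.0"))    // true: spec-only check passes, but...
	fmt.Println(statusAwareSkewOK("v1.24.5", "v1.26.0")) // false: the actual skew is two minors
}
```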
/triage accepted
/help
@fabriziopandini: This request has been marked as needing help from a contributor.
Guidelines
Please ensure that the issue body includes answers to the following questions:
- Why are we solving this issue?
- To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
- Does this issue have zero to low barrier of entry?
- How can the assignee reach out to you for help?
For more details on the requirements of such an issue, please see here and ensure that they are met.
If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.
In response to this:
/help
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign
Do we have a clear idea what exactly we want to implement and where? (just asking, because it would be good to have consensus on that before we go back and forth on an implementation)
+1 to nail down some details first. @ramineni, is it ok for you to add some comments here as soon as you make up your mind around this topic, before starting implementation?
> Do we have a clear idea what exactly we want to implement and where? (just asking, because it would be good to have consensus on that before we go back and forth on an implementation)
I have not gone that far yet, I just started to explore the details. I'll be sure to discuss here or on Slack before going for implementation.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.
This bot triages PRs according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the PR is closed
You can:
- Mark this PR as fresh with `/remove-lifecycle stale`
- Close this PR with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
This issue has not been updated in over 1 year, and should be re-triaged.
You can:
- Confirm that this issue is still relevant with `/triage accepted` (org members only)
- Close this issue with `/close`
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted
/priority important-longterm
The Cluster API project currently lacks enough active contributors to adequately respond to all issues and PRs.
We mitigated the issue with MachineSet preflight checks, and we highly recommend that users turn them on. /close
@fabriziopandini: Closing this issue.
In response to this:
> The Cluster API project currently lacks enough active contributors to adequately respond to all issues and PRs.
> We mitigated the issue with MachineSet preflight checks, and we highly recommend that users turn them on. /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.