cluster-api
Introduce distributionVersion field for improved Kubernetes Distribution version handling
What would you like to be added (User Story)?
As a user, I would like a distributionVersion field to better handle versioning of the Kubernetes distribution being installed.
Detailed Description
Currently, the handling of Kubernetes versions in the version field is problematic when installing or upgrading a Kubernetes distribution that uses its own version scheme and lifecycle. A distribution can include more software components than just Kubernetes, and/or follow a distinct support lifecycle that releases fixes and other changes on its own timeline. Those characteristics can require an independent version scheme. The version field, present in multiple resources (ControlPlane, MachineSpec, Cluster Topology), specifically represents the Kubernetes version. In order to control the cluster's distribution version, we need to specify its value on the ControlPlane and MachineSet/MachineDeployment objects involved. This is problematic: when a user wants to deploy a specific version of a Kubernetes distribution, they cannot specify that version directly; it has to be derived from the related Kubernetes version, which is not always possible.
To address this, we propose introducing a new field, spec.distributionVersion in the following resources:
- ControlPlane (ControlPlane.Spec.Version)
- MachineSpec (Machine.Spec.Version / MachineSet.Spec.Template.Spec.Version / MachineDeployment.Spec.Template.Spec.Version / MachinePool.Spec.Template.Spec.Version)
- Topology (Cluster.Spec.Topology.Version)
The new field and the current spec.version would be mutually exclusive.
This would be an optional field for both the ControlPlane contract and MachineSpec. Its value would be the distribution version (e.g. for OpenShift it should be something like v4.17.0). No version semantics will be imposed on the field.
If distributionVersion is present, status.version should be populated with the related Kubernetes version (e.g. for OpenShift, with distributionVersion set to 4.17.0, status.version should be 1.30.4).
All logic that makes a decision based on evaluating the kubernetes version should rely on the status field instead of the spec field.
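For illustration only, here is a minimal Go sketch of how the proposed pair of fields could look on a Machine-like resource; the type names and field layout are assumptions for this issue, not the actual Cluster API definitions:

```go
// Sketch only: hypothetical types illustrating the proposal, not actual CAPI API types.
package v1beta1

// MachineSpec carries the desired state. Exactly one of Version or
// DistributionVersion would be set, since the two fields are mutually exclusive.
type MachineSpec struct {
	// Version is the Kubernetes version, e.g. "v1.30.4".
	// +optional
	Version *string `json:"version,omitempty"`

	// DistributionVersion is the distribution's own version, e.g. "v4.17.0"
	// for OpenShift. No version semantics are imposed on this field.
	// +optional
	DistributionVersion *string `json:"distributionVersion,omitempty"`
}

// MachineStatus always exposes the resolved Kubernetes version, either
// mirrored from spec.version or derived from spec.distributionVersion,
// so that any version-dependent logic can rely on status alone.
type MachineStatus struct {
	// Version is the Kubernetes version the machine is running or targeting.
	// +optional
	Version *string `json:"version,omitempty"`
}
```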
Related:
https://github.com/kubernetes-sigs/cluster-api/pull/11564
/cc @fabriziopandini @enxebre @sbueringer
Anything else you would like to add?
No response
Label(s) to be applied
/kind feature
/area api
This issue is currently awaiting triage.
If CAPI contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Thanks for filing this issue as a follow-up to the discussion on the PR!
Somewhat related, we should also consider KEP-4330: Compatibility Versions in Kubernetes, which introduces emulated versions and a minimum compatibility version as key information influencing cluster behaviour (see the user stories in the KEP).
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
Discussed in the May 22nd office hours.
The new field and the current spec.version would be mutually exclusive.
Unfortunately, the idea of leaving the spec.version field unset under some circumstances is going to create many problems in CAPI.
We need a plan (possibly not invasive) to address the fact that CAPI is full of code paths that assume version is set and represents a K8s version, and this is at the core of foundational constructs like the entire upgrade process and all the related test machinery.
Hopefully the previous comments on the same topic can help in starting the work on this plan:
- https://github.com/kubernetes-sigs/cluster-api/pull/11564#issuecomment-2538767839
- https://github.com/kubernetes-sigs/cluster-api/pull/11564#issuecomment-2548223084
- https://github.com/kubernetes-sigs/cluster-api/pull/11564#issuecomment-2579184582
Also, it is worth recalling https://github.com/kubernetes-sigs/cluster-api/issues/11816#issuecomment-2643846693 from above, and bringing up possible impacts on the ongoing discussion in https://github.com/kubernetes-sigs/cluster-api/pull/12199.
Unfortunately, the idea of leaving the spec.version field unset under some circumstances is going to create many problems in CAPI.
Would moving it to status (mirroring it when spec is defined, and deriving it from distributionVersion when not) be too disruptive? I totally understand that the K8s version should always be available; that was one of the targets of this initial proposal.
In any case I'll take all this into account and get back with a possibly improved and more detailed proposal, thanks for the feedback - much appreciated 🙏
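As an illustration of the mirror/derive behaviour described above, here is a rough Go sketch; resolveKubernetesVersion and the distToK8s mapping are hypothetical placeholders, not existing CAPI code:

```go
package controllers

import "fmt"

// resolveKubernetesVersion returns the Kubernetes version to publish in
// status.version, given the mutually exclusive spec fields. The distToK8s
// map stands in for whatever component ends up owning the
// distributionVersion -> Kubernetes version mapping.
func resolveKubernetesVersion(specVersion, specDistributionVersion *string, distToK8s map[string]string) (string, error) {
	switch {
	case specVersion != nil:
		// spec.version is set: mirror it into status.version.
		return *specVersion, nil
	case specDistributionVersion != nil:
		// spec.distributionVersion is set: derive the Kubernetes version.
		if v, ok := distToK8s[*specDistributionVersion]; ok {
			return v, nil
		}
		return "", fmt.Errorf("no Kubernetes version known for distribution version %q", *specDistributionVersion)
	default:
		return "", fmt.Errorf("neither version nor distributionVersion is set")
	}
}
```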
If distributionVersion is present, status.version should be populated with the related Kubernetes version (e.g. for OpenShift, with distributionVersion set to 4.17.0, status.version should be 1.30.4).
What component would be responsible for this?
Would moving it to status (mirroring it when spec is defined, and deriving it from distributionVersion when not) be too disruptive? I totally understand that the K8s version should always be available; that was one of the targets of this initial proposal.
This could be beneficial for more than just the proposal here. Do we do any validation of upgrades at the moment? If we had the option of spec (I want this) and status (the controller has verified through some rules that the upgrade is permitted), then that could unlock more in the way of pre-flight checks, couldn't it?
I know that within OpenShift, for example, we have similar patterns in other places where a user requests "I want this", we validate that transition before accepting it, and then the controllers observe the status where we've said "yes, we verify this transition is acceptable".
What component would be responsible for this?
I think for each CR its owner should be responsible for updating this field. However, we need a component that would resolve the k8s version for each distribution version. I don't think this task fits particularly well into any currently defined component.
If we want to discuss implementation, what I had in mind is a thin controller (optional in general, but mandatory for distributionVersion usage) maintaining a new CRD, DistributionVersionSet, which would hold a list of distributionVersion/k8s version pairs; each controller could watch that resource and update its own status field - something like the following:
```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: DistributionVersionSet
metadata:
  name: ocp-distribution-versions
spec:
  versions:
  - distributionVersion: 4.18.0
    version: 1.31.1
  - distributionVersion: 4.19.4
    version: 1.33.4
```
This way the Cluster, MachineDeployment, ControlPlane, etc. controllers could just look up the values and fill in the status of their own resources.
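Sketched in Go for illustration (hypothetical type and method names mirroring the YAML above, not an agreed design), the CRD types and a lookup helper that a controller could use might look like:

```go
package v1beta1

// DistributionVersionPair maps one distribution version to the Kubernetes
// version it ships.
type DistributionVersionPair struct {
	DistributionVersion string `json:"distributionVersion"`
	Version             string `json:"version"`
}

// DistributionVersionSetSpec holds the full mapping table.
type DistributionVersionSetSpec struct {
	Versions []DistributionVersionPair `json:"versions"`
}

// KubernetesVersionFor returns the Kubernetes version for the given
// distribution version, or false if the set does not contain it.
func (s *DistributionVersionSetSpec) KubernetesVersionFor(distributionVersion string) (string, bool) {
	for _, p := range s.Versions {
		if p.DistributionVersion == distributionVersion {
			return p.Version, true
		}
	}
	return "", false
}
```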
I like the idea of abstracting this away into something that components can look up; that makes sense. I do wonder, though: how would the components know which distribution version set they need to look at? Would there be some configuration on the cluster that would reference the correct set, for example?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Reopen this issue with /reopen
- Mark this issue as fresh with /remove-lifecycle rotten
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".