Which standards does the VPA conform to regarding releases?
When developing on the VPA, we need to figure out how to handle the rollout of new features such that we maintain compatibility between components (VPA components and kube components). We also need to realise that a user may need to downgrade from time to time.
Kubernetes has this documented here: https://kubernetes.io/releases/version-skew-policy/
Additionally, the VPA has its own bundled CRDs. This makes it different from k/k.
We should decide what we want to aim for and document it, so that as we develop new features we ensure that we comply with these standards.
/area vertical-pod-autoscaler
@voelzmo's comment in the SIG meeting today was really interesting.
Should we split out the lifecycle of the CRD from the VPA binaries?
eg.
- VPA CRD changes must always be backwards compatible (unless there is a deprecation of an entire version for example).
- VPA binaries should be "downgradable" but may result in loss of functionality (for example alpha features). Downgrading should NOT cause unrecoverable issues (eg. crashloops).
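To make the first bullet concrete, here is a minimal Go sketch of the kind of change that stays backwards compatible. These are simplified stand-in types, not the real VPA definitions in the autoscaling.k8s.io API group, and the new field name is made up: the point is that the addition is an optional pointer with `omitempty`, so objects written without it remain valid and an older binary that doesn't know the field simply ignores it when decoding.

```go
// Illustrative only: simplified stand-in types, not the actual VPA
// definitions in the autoscaling.k8s.io API group.
package v1

// VerticalPodAutoscalerSpec shows a backwards compatible change: existing
// fields keep their names, types, and meanings, and the new field is an
// optional pointer so its absence stays valid.
type VerticalPodAutoscalerSpec struct {
	// UpdatePolicy stands in for a field that already existed.
	UpdatePolicy *PodUpdatePolicy `json:"updatePolicy,omitempty"`

	// HypotheticalNewKnob is the newly added, purely optional field.
	// Older binaries that decode this object drop the unknown key,
	// and objects created before the field existed still validate.
	HypotheticalNewKnob *string `json:"hypotheticalNewKnob,omitempty"`
}

// PodUpdatePolicy stands in for the existing update policy sub-struct.
type PodUpdatePolicy struct {
	UpdateMode *string `json:"updateMode,omitempty"`
}
```

Removing or retyping an existing field, by contrast, is exactly the kind of change that would force a new CRD version.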
> Should we split out the lifecycle of the CRD from the VPA binaries?
> eg.
> - VPA CRD changes must always be backwards compatible (unless there is a deprecation of an entire version for example).
> - VPA binaries should be "downgradable" but may result in loss of functionality (for example alpha features). Downgrading should NOT cause unrecoverable issues (eg. crashloops).
I like this.
How many versions back can a user downgrade? My guess is only a single version.
Regarding kube-apiserver support, I assume we officially support the 3 latest minor releases?
> How many versions back can a user downgrade? My guess is only a single version.
Right, I think we would only officially support the n-1'th version.
> Regarding kube-apiserver support, I assume we officially support the 3 latest minor releases?
This one is a tricky question... Once we graduate In-Place updates to Beta, we will have to change our minimum supported version to 1.33. Does that mean we need to continue supporting previous versions of VPA for the other minor versions? This one is tricky...
I would really prefer not having to split the release into one per minor version (like CA does today) as that adds a ton of overhead.
> - VPA CRD changes must always be backwards compatible (unless there is a deprecation of an entire version for example).
+1 on that.
IMHO, it's better to require newer versions of Kubernetes instead of keeping several different versions as branches. This might be a bit harder for some users, but it will help keep the project easier to manage in the long run.
> Once we graduate In-Place updates to Beta
Beta would still allow disabling a feature gate, right? Or do we not want to offer this option, such that Beta is actually GA? ;)
> VPA CRD changes must always be backwards compatible
I'm not sure what that means. Are you suggesting all CRD changes need to happen in a way such that you could switch to a previous version of the CRD and it magically still works without migrating anything that was stored with the new version in etcd?
> Once we graduate In-Place updates to Beta
> Beta would still allow disabling a feature gate, right? Or do we not want to offer this option, such that Beta is actually GA? ;)
If we conform to what k/k does, beta does allow a user to disable a feature gate.
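For reference, a minimal sketch of how the k/k feature gate stages behave, using k8s.io/component-base/featuregate. This is not the actual VPA wiring; the gate name and the flag shown in the comment are just the in-place feature used as an illustration, assuming we keep following the k/k conventions: at Beta the default flips to on, but a user can still switch it off.

```go
// A minimal sketch of k/k-style feature gate behaviour at Beta, not the
// actual VPA wiring.
package main

import (
	"fmt"

	"k8s.io/component-base/featuregate"
)

// InPlaceOrRecreate is used here purely as an example gate name.
const InPlaceOrRecreate featuregate.Feature = "InPlaceOrRecreate"

func main() {
	gate := featuregate.NewFeatureGate()
	if err := gate.Add(map[featuregate.Feature]featuregate.FeatureSpec{
		// Beta per k/k conventions: enabled by default, still switchable.
		InPlaceOrRecreate: {Default: true, PreRelease: featuregate.Beta},
	}); err != nil {
		panic(err)
	}

	// The equivalent of a user passing something like
	// --feature-gates=InPlaceOrRecreate=false on the command line.
	if err := gate.Set("InPlaceOrRecreate=false"); err != nil {
		panic(err)
	}

	fmt.Println("enabled:", gate.Enabled(InPlaceOrRecreate)) // prints: enabled: false
}
```

Only at GA, with `LockToDefault: true`, does attempting to flip the gate become an error, which is the point where Beta would effectively stop behaving "like GA".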
> This one is a tricky question... Once we graduate In-Place updates to Beta, we will have to change our minimum supported version to 1.33. Does that mean we need to continue supporting previous versions of VPA for the other minor versions? This one is tricky...
I guess what it means is that we can't graduate In-Place to beta, until the minimum supported Kubernetes version is 1.33?
| VPA | Kubernetes |
|---|---|
| 1.4.0 | 1.33 |
| 1.5.0 | 1.34 |
| 1.6.0 | 1.35 |
In-Place was added in VPA 1.4.0. I guess we can only enable in-place by default in 1.6.0, since that will support 1.33 -> 1.35?
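If we go that route, the minimum-version constraint could even be enforced at startup rather than only documented. Below is a hedged sketch, not existing VPA code: the helper name and the 1.33 threshold are assumptions for illustration. It asks the kube-apiserver for its version via the client-go discovery client and reports whether the cluster meets the assumed minimum for in-place.

```go
// Hedged sketch: a startup check against the kube-apiserver version.
// Not existing VPA code; helper name and threshold are illustrative.
package main

import (
	"fmt"
	"strconv"
	"strings"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/rest"
)

// minMinorForInPlace is an assumption for illustration: Kubernetes 1.33.
const minMinorForInPlace = 33

// clusterSupportsInPlace asks the kube-apiserver for its version and reports
// whether the cluster meets the assumed minimum minor for in-place updates.
func clusterSupportsInPlace() (bool, error) {
	cfg, err := rest.InClusterConfig() // assumes the component runs in-cluster
	if err != nil {
		return false, err
	}
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		return false, err
	}
	info, err := dc.ServerVersion()
	if err != nil {
		return false, err
	}
	// Some providers report minors like "33+", so strip a trailing "+".
	minor, err := strconv.Atoi(strings.TrimSuffix(info.Minor, "+"))
	if err != nil {
		return false, err
	}
	return minor >= minMinorForInPlace, nil
}

func main() {
	ok, err := clusterSupportsInPlace()
	if err != nil {
		panic(err)
	}
	fmt.Println("cluster supports in-place:", ok)
}
```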
Having a look at the k/k documentation, I feel like the VPA is closer to kube-controller-manager than all the other components, and it allows backwards compatibility by a single version (see https://kubernetes.io/releases/version-skew-policy/#kube-controller-manager-kube-scheduler-and-cloud-controller-manager). Maybe we should change the policy to be similar?
Proposal:
- We change the version numbers of the VPA to match that of Kubernetes (cc: https://github.com/kubernetes/autoscaler/issues/5759)
- Note: In the linked issue Marco describes some negatives to this approach, which are valid points.
- Each release supports the matching version of Kubernetes AND the previous minor version
- ie: VPA 1.35 will be supported on Kubernetes 1.35 and 1.34, but not 1.33.
- We maintain the last 3 VPA minor releases
- ie: if the latest is 1.35, we do security patches for 1.35, 1.34 and 1.33
The one advantage here is that when we add a new feature, such as in-place, we only need to wait 1 Kubernetes release before we enable features by default.
ie:
- Kubernetes 1.32 is released with in-place, off by default - VPA 1.32 does nothing
- Kubernetes 1.33 is released with in-place, on by default - VPA 1.33 adds in-place in alpha, off by default
- Kubernetes 1.34 is released with in-place, on by default - VPA 1.34 changes in-place to beta, on by default
I agree that changing the VPA version numbers to match Kubernetes makes things easier. But jumping from VPA 1.4 to 1.33 feels a bit strange and might confuse people.
I suggest we add a table to make things clearer, something like:

| VPA Version | Supported Kubernetes Versions | Feature Flags (Default) |
|---|---|---|
| 1.4 | 1.33, 1.32 | in-place: alpha (off) |
This way, everyone can quickly see which VPA version works with which Kubernetes versions, and what features are enabled by default.
> I agree that changing the VPA version numbers to match Kubernetes makes things easier. But jumping from VPA 1.4 to 1.33 feels a bit strange and might confuse people.
Yup, very valid. I was merging 2 thoughts. Maybe we should keep the two discussions separate. I agree with the table that you posted.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten