
Version Skew Policy

Open riconnon opened this issue 1 year ago • 7 comments

During an upgrade of the Kubernetes API server and/or the cluster autoscaler, it is inevitable that the autoscaler and the other Kubernetes components will differ in version for at least some period of time. Looking at the various relevant docs, the cluster autoscaler is not included in the overall Kubernetes Version Skew Policy here: https://kubernetes.io/releases/version-skew-policy/

Looking at the README in this repo, I see:

Some user reports indicate successful use of a newer version of Cluster Autoscaler with older clusters, however, there is always a chance that it won't work as expected.

Is there a supported process to upgrade both the cluster and the autoscaler such that it is expected to continue to work throughout? It seems like the CA needs to have an official version skew policy supporting at least one minor version in one direction from the API server.

riconnon avatar Jul 27 '22 11:07 riconnon

Hi, AFAIK it's strongly recommended to keep the minor version of the Cluster Autoscaler matching the minor version of the k8s cluster it's deployed to, given the strong coupling of the logic between the two due to the vendoring of the scheduler code into the CA. There is no 1:1 matching of CA patch releases to k8s patch versions.

Shubham82 avatar Jul 28 '22 07:07 Shubham82
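As a rough illustration of that recommendation, the sketch below checks the control plane's minor version and then pins the Cluster Autoscaler image to a release from the same minor line. The deployment name, namespace, and patch tag are assumptions; adjust them for your environment.

```bash
# A minimal sketch, assuming the CA runs as the "cluster-autoscaler"
# Deployment in kube-system and that a matching v1.21.x image exists.

# 1. Check the control plane's minor version (e.g. "1.21").
kubectl version --short

# 2. Pin the Cluster Autoscaler image to the same minor line.
#    v1.21.3 is a placeholder; use the latest 1.21.x patch release.
kubectl -n kube-system set image deployment/cluster-autoscaler \
  cluster-autoscaler=registry.k8s.io/autoscaling/cluster-autoscaler:v1.21.3
```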

/cc @gjtempleton

Shubham82 avatar Jul 28 '22 07:07 Shubham82

On a similar topic, do we need to pin the cluster-autoscaler image tag when installing cluster-autoscaler via the helm chart on EKS clusters? The cluster-autoscaler helm chart v9.19.2 has a default value of "v1.23.0" for "image.tag". Is that chart version compatible with an EKS v1.22 cluster, and are there any instructions?

kumarpmd avatar Jul 28 '22 14:07 kumarpmd
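For what it's worth, the chart's default image.tag can be overridden at install time, so a newer chart can still run an image that matches the cluster. A minimal sketch for an EKS 1.22 cluster is below; the release name, namespace, cluster name, and patch tag are placeholders, and the cloudProvider/autoDiscovery values assume the AWS auto-discovery setup described in the chart's own docs.

```bash
# A minimal sketch, assuming the kubernetes/autoscaler chart repo and an
# EKS cluster using the chart's AWS auto-discovery configuration.
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update

# Pin image.tag to a 1.22.x release to match an EKS 1.22 cluster.
# "v1.22.2" and "my-eks-cluster" are placeholders.
helm upgrade --install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set cloudProvider=aws \
  --set autoDiscovery.clusterName=my-eks-cluster \
  --set image.tag=v1.22.2
```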

@Shubham82 I understand that recommendation but if, for example, I am going to upgrade my cluster from 1.20 to 1.21 I need to do one of two things:

  1. I can upgrade my cluster (API server, kube controller manager, etc.) to 1.21 first and have a window during which I have a 1.21 cluster and a 1.20 autoscaler
  2. I can upgrade my autoscaler first and have a window during which I have a 1.21 autoscaler and a 1.20 cluster

In either case, as far as I can tell, the autoscaler project is saying it doesn't "support" this setup. So how can I ever upgrade from 1.20 to 1.21, since it's impossible to upgrade the two things at exactly the same time?

riconnon avatar Aug 01 '22 10:08 riconnon
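Whichever order is chosen, it may help to verify how large the skew actually is at each step. The sketch below compares the control-plane version with the image the autoscaler is currently running; the deployment name and namespace are assumptions, and jq is only used for readability.

```bash
# A rough sketch for inspecting the current skew, assuming the CA runs as
# the "cluster-autoscaler" Deployment in kube-system.

# Control-plane version (e.g. "v1.21.x"):
kubectl version --output=json | jq -r '.serverVersion.gitVersion'

# Image the autoscaler is currently running:
kubectl -n kube-system get deployment cluster-autoscaler \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```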

The cluster autoscaler chart for EKS 1.22 is covered in issue #4850. I will repost in that issue. Thanks.

kumarpmd avatar Aug 01 '22 12:08 kumarpmd

Hi @riconnon, in the FAQ I found this: How can I update CA dependencies (particularly k8s.io/kubernetes)? See if it answers your question. There is no upgrade guide mentioned for the Cluster Autoscaler.

Shubham82 avatar Aug 02 '22 05:08 Shubham82

Hi @gjtempleton, could you please take a look?

Shubham82 avatar Aug 02 '22 05:08 Shubham82

I'm also trying to reconcile this with the charts. It's not clear to me which approach is best to ensure compatibility. As best I can tell, there are two potential approaches:

  • Use the latest chart version to ensure we get any potential compatibility or security fixes, and pin the specific Cluster Autoscaler image version to match the cluster, as recommended.
  • Find the last chart version whose default autoscaler version matches the one needed and use that. This could result in fairly old chart versions being installed, though, which could in turn have issues of their own.

Would be great to have official guidance on this.

gygitlab avatar Sep 02 '22 09:09 gygitlab
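For the second approach, helm can list every published chart version together with its default app version (which for this chart tracks the bundled Cluster Autoscaler release), so you can look for the newest chart whose default matches your cluster. A minimal sketch, assuming the repo has already been added as "autoscaler":

```bash
# List all published chart versions with their default app versions, then
# pick the newest chart whose APP VERSION matches your cluster's minor line.
helm search repo autoscaler/cluster-autoscaler --versions
```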

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Dec 01 '22 10:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Dec 31 '22 11:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Jan 30 '23 11:01 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jan 30 '23 11:01 k8s-ci-robot