
Version Skew Policy

Open riconnon opened this issue 1 year ago • 7 comments

During an upgrade of the Kubernetes API server and/or the cluster autoscaler, it is inevitable that the autoscaler and the other Kubernetes components will differ in version for at least some period of time. Looking at the various relevant docs, the cluster autoscaler is not included in the overall Kubernetes Version Skew Policy here: https://kubernetes.io/releases/version-skew-policy/

Looking at the README in this repo, I see:

Some user reports indicate successful use of a newer version of Cluster Autoscaler with older clusters, however, there is always a chance that it won't work as expected.

Is there a supported process to upgrade both the cluster and the autoscaler such that it is expected to continue to work throughout? It seems like the CA needs to have an official version skew policy supporting at least one minor version in one direction from the API server.

riconnon avatar Jul 27 '22 11:07 riconnon

Hi, AFAIK it's strongly recommended to keep the minor version of the Cluster Autoscaler matching the minor version of the k8s cluster it's deployed to, given the strong coupling of the logic between the two due to the vendoring of the scheduler code into the CA. There is no 1:1 matching of CA patch releases to k8s patch versions.

Shubham82 avatar Jul 28 '22 07:07 Shubham82
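As a rough illustration of that recommendation, the sketch below checks the control plane's minor version and then pins the Cluster Autoscaler image to a release from the same minor line. The deployment name, namespace, and patch tag are assumptions; adjust them for your environment.

```bash
# A minimal sketch, assuming the CA runs as the "cluster-autoscaler"
# Deployment in kube-system and that a matching v1.21.x image exists.

# 1. Check the control plane's minor version (e.g. "1.21").
kubectl version --short

# 2. Pin the Cluster Autoscaler image to the same minor line.
#    v1.21.3 is a placeholder; use the latest 1.21.x patch release.
kubectl -n kube-system set image deployment/cluster-autoscaler \
  cluster-autoscaler=registry.k8s.io/autoscaling/cluster-autoscaler:v1.21.3
```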

/cc @gjtempleton

Shubham82 avatar Jul 28 '22 07:07 Shubham82

On a similar topic, do we need to pin the cluster-autoscaler image tag when installing cluster-autoscaler via the helm chart on EKS clusters? The cluster-autoscaler helm chart v9.19.2 has a default value of "v1.23.0" for "image.tag". Is that chart version compatible with an EKS v1.22 cluster, and are there any instructions?

kumarpmd avatar Jul 28 '22 14:07 kumarpmd
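For what it's worth, the chart's default image.tag can be overridden at install time, so a newer chart can still run an image that matches the cluster. A minimal sketch for an EKS 1.22 cluster is below; the release name, namespace, cluster name, and patch tag are placeholders, and the cloudProvider/autoDiscovery values assume the AWS auto-discovery setup described in the chart's own docs.

```bash
# A minimal sketch, assuming the kubernetes/autoscaler chart repo and an
# EKS cluster using the chart's AWS auto-discovery configuration.
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update

# Pin image.tag to a 1.22.x release to match an EKS 1.22 cluster.
# "v1.22.2" and "my-eks-cluster" are placeholders.
helm upgrade --install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set cloudProvider=aws \
  --set autoDiscovery.clusterName=my-eks-cluster \
  --set image.tag=v1.22.2
```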

@Shubham82 I understand that recommendation but if, for example, I am going to upgrade my cluster from 1.20 to 1.21 I need to do one of two things:

  1. I can upgrade my cluster (API server, kube controller manager, etc.) to 1.21 first and have a window during which I have a 1.21 cluster and a 1.20 autoscaler
  2. I can upgrade my autoscaler first and have a window during which I have a 1.21 autoscaler and a 1.20 cluster

In either case, as far as I can tell, the autoscaler project is saying it doesn't "support" this setup. So how can I ever upgrade from 1.20 to 1.21, since it's impossible to upgrade the two things at exactly the same time?

riconnon avatar Aug 01 '22 10:08 riconnon
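Whichever order is chosen, it may help to verify how large the skew actually is at each step. The sketch below compares the control-plane version with the image the autoscaler is currently running; the deployment name and namespace are assumptions, and jq is only used for readability.

```bash
# A rough sketch for inspecting the current skew, assuming the CA runs as
# the "cluster-autoscaler" Deployment in kube-system.

# Control-plane version (e.g. "v1.21.x"):
kubectl version --output=json | jq -r '.serverVersion.gitVersion'

# Image the autoscaler is currently running:
kubectl -n kube-system get deployment cluster-autoscaler \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```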

The cluster autoscaler chart for EKS 1.22 is covered in issue #4850. I will repost in that issue. Thanks.

kumarpmd avatar Aug 01 '22 12:08 kumarpmd

Hi @riconnon, in the FAQ I found this: How can I update CA dependencies (particularly k8s.io/kubernetes)? See if it answers your question. There is no upgrade guide mentioned for the Cluster Autoscaler.

Shubham82 avatar Aug 02 '22 05:08 Shubham82

Hi @gjtempleton, could you please take a look?

Shubham82 avatar Aug 02 '22 05:08 Shubham82

I'm also trying to reconcile this with the charts. It's not clear to me which approach is best to ensure compatibility. As best I can tell, there are two potential approaches:

  • Use the latest chart version to ensure we get any potential compatibility or security fixes, and pin the specific Cluster Autoscaler image version to match the cluster, as recommended.
  • Find the last chart version whose default autoscaler version matches the one needed and use that. This could result in fairly old chart versions being installed, though, which could in turn have issues of their own.

Would be great to have official guidance on this.

gygitlab avatar Sep 02 '22 09:09 gygitlab
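For the second approach, helm can list every published chart version together with its default app version (which for this chart tracks the bundled Cluster Autoscaler release), so you can look for the newest chart whose default matches your cluster. A minimal sketch, assuming the repo has already been added as "autoscaler":

```bash
# List all published chart versions with their default app versions, then
# pick the newest chart whose APP VERSION matches your cluster's minor line.
helm search repo autoscaler/cluster-autoscaler --versions
```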

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Dec 01 '22 10:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Dec 31 '22 11:12 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Jan 30 '23 11:01 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Jan 30 '23 11:01 k8s-ci-robot