
[Exoscale] Unable to resize node pools

mpalu opened this issue 3 years ago • 3 comments

Which component are you using?:

cluster-autoscaler

What version of the component are you using?:

Cluster Autoscaler v1.23.0 Kubernetes 1.22.8

What k8s version are you using (kubectl version)?:

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"clean", BuildDate:"2022-03-16T15:51:05Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.8", GitCommit:"7061dbbf75f9f82e8ab21f9be7e8ffcaae8e0d44", GitTreeState:"clean", BuildDate:"2022-03-16T14:04:34Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

What did you expect to happen?:

At the beginning of this week, CA was working as expected and was able to resize Exoscale SKS managed instance pools, but it has now stopped working. (It was already noticeably slower than the autoscaler running on EKS: CA on Exoscale takes ~5-15 min to scale up a new node, whereas it takes less than 1 min on AWS, but that is a separate topic.)

Now, CA on Exoscale is unable to resize the instance pools: the Exoscale API returns a 403 error with exception code 9999. We are receiving the following error, which suggests a problem with, or a change in, the Exoscale API:

Scale-up failed for group 18e037a4-1e08-4f44-ac72-1285f4cf973d: API error ErrorCode(403) 403 (ServerAPIException 9999): Operation scaleInstancePool on resource 18e037a4-1e08-4f44-ac72-1285f4cf973d is forbidden - reason: Locked by nodepool 00edf882-3199-459b-8d55-ea330776803e on cluster 9745a733-6d0e-4846-9522-9621caf49b65
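
For context, the error text suggests the underlying Compute instance pool is locked because it is managed by an SKS nodepool, so a direct scaleInstancePool call is rejected and the resize would have to go through the SKS nodepool API instead. Below is a minimal Go sketch of that branching logic; the exoClient interface, its method names, and the nodeGroup fields are hypothetical stand-ins for the corresponding Exoscale API operations, not the actual cluster-autoscaler provider code.

```go
// sketch.go: minimal sketch of the scale-up branching an Exoscale provider
// would need. exoClient and its methods are hypothetical stand-ins for the
// Exoscale API, NOT the real cluster-autoscaler implementation.
package main

import (
	"context"
	"fmt"
)

// exoClient is a hypothetical abstraction over the relevant Exoscale API calls.
type exoClient interface {
	ScaleInstancePool(ctx context.Context, zone, poolID string, size int64) error
	ScaleSKSNodepool(ctx context.Context, zone, clusterID, nodepoolID string, size int64) error
}

// nodeGroup mirrors the information an autoscaler node group would track.
type nodeGroup struct {
	zone           string
	instancePoolID string
	// Set when the instance pool is managed by an SKS nodepool. In that case
	// the pool is locked and scaleInstancePool is rejected with a 403, so the
	// resize has to go through the SKS nodepool endpoint instead.
	sksClusterID  string
	sksNodepoolID string
}

// scaleTo resizes the group to the target size via the appropriate endpoint.
func scaleTo(ctx context.Context, c exoClient, g nodeGroup, size int64) error {
	if g.sksNodepoolID != "" {
		return c.ScaleSKSNodepool(ctx, g.zone, g.sksClusterID, g.sksNodepoolID, size)
	}
	return c.ScaleInstancePool(ctx, g.zone, g.instancePoolID, size)
}

// fakeClient just logs calls so the sketch can be run standalone.
type fakeClient struct{}

func (fakeClient) ScaleInstancePool(_ context.Context, zone, poolID string, size int64) error {
	fmt.Printf("scaleInstancePool %s/%s -> %d\n", zone, poolID, size)
	return nil
}

func (fakeClient) ScaleSKSNodepool(_ context.Context, zone, clusterID, nodepoolID string, size int64) error {
	fmt.Printf("scaleSKSNodepool %s/%s/%s -> %d\n", zone, clusterID, nodepoolID, size)
	return nil
}

func main() {
	// IDs taken from the error above; the zone is an arbitrary example.
	g := nodeGroup{
		zone:           "ch-gva-2",
		instancePoolID: "18e037a4-1e08-4f44-ac72-1285f4cf973d",
		sksClusterID:   "9745a733-6d0e-4846-9522-9621caf49b65",
		sksNodepoolID:  "00edf882-3199-459b-8d55-ea330776803e",
	}
	_ = scaleTo(context.Background(), fakeClient{}, g, 3)
}
```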

Has the Exoscale API changed, and does the Cluster Autoscaler Exoscale implementation need a fix from its code owners to get it working again?

I'm happy to provide more details on request; at the moment I'm not sure what additional information would help with this issue.

@pierre-emmanuelJ @7fELF @PhilippeChepy

mpalu avatar Apr 06 '22 21:04 mpalu

It looks like there is a forked project that fixes this issue, and a PR has been submitted:

https://github.com/exoscale/autoscaler-1/tree/sks/cluster-autoscaler/cloudprovider/exoscale

https://github.com/kubernetes/autoscaler/pull/4247

mpalu avatar Apr 07 '22 14:04 mpalu

The PR regarding SKS Nodepools has been merged: https://github.com/kubernetes/autoscaler/pull/4247, so this can be closed.

Sapd avatar Apr 29 '22 12:04 Sapd

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Jul 28 '22 13:07 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Aug 27 '22 14:08 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Sep 26 '22 14:09 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Sep 26 '22 14:09 k8s-ci-robot