
Hetzner Autoscaler does not work with v1.29.0: problem with ephemeral storage information

schlichtanders opened this issue on Feb 26 '24.

Which component are you using?: cluster-autoscaler

What version of the component are you using?: 1.29.0

Component version:

What k8s version are you using (kubectl version)?:

kubectl version output:

$ kubectl version
Client Version: v1.28.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.1+k3s2

What environment is this in?: Hetzner Cloud

What did you expect to happen?:

The autoscaler should scale up the node group.

What happened instead?:

It does not scale up.

How to reproduce it (as minimally and precisely as possible):

I am using kube-hetzner, which can create a Hetzner Kubernetes cluster with a configuration like this:

module "kube-hetzner" {
  # ...
  source = "kube-hetzner/kube-hetzner/hcloud"
  version = "2.12.2"
  cluster_autoscaler_image = "registry.k8s.io/autoscaling/cluster-autoscaler"
  cluster_autoscaler_version = "v1.29.0"
  initial_k3s_channel = "v1.29"

  autoscaler_nodepools = [
    {
      name        = "ca-group"
      server_type = "cax41"
      location    = "nbg1"
      min_nodes   = 0 
      max_nodes   = 6
    }
  ]
  # ...
}

Then deploy a workload that does not fit on the existing nodes and inspect the autoscaler logs: the autoscaler does not scale up.

Anything else we need to know?:

The relevant log lines are:

I0226 11:32:54.763746       1 klogx.go:87] Pod mypod is unschedulable
I0226 11:32:54.763783       1 orchestrator.go:108] Upcoming 0 nodes
I0226 11:32:54.763806       1 orchestrator.go:440] Skipping node group draining-node-pool - max size reached
E0226 11:32:54.763812       1 orchestrator.go:446] Couldn't get autoscaling options for ng: NAME_OF_AUTOSCALING_GROUP
I0226 11:32:54.763861       1 orchestrator.go:542] Pod mypod/mynamespace can't be scheduled on NAME_OF_AUTOSCALING_GROUP, predicate checking error: Insufficient ephemeral-storage; predicateName=NodeResourcesFit; reasons: Insufficient ephemeral-storage; debugInfo=
I0226 11:32:54.763903       1 orchestrator.go:150] No pod can fit to NAME_OF_AUTOSCALING_GROUP
I0226 11:32:54.763913       1 orchestrator.go:164] No expansion options
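In a long autoscaler log, the per-group rejection message is easier to spot with a small filter. The Go sketch below reads log lines on stdin and prints the node group and the first failure reason. The regular expression is derived only from the v1.29.0 excerpt above; it is an assumption that other versions use the same wording.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// Matches the orchestrator message from the v1.29.0 excerpt, e.g.
// "... can't be scheduled on NAME_OF_AUTOSCALING_GROUP, predicate checking error: Insufficient ephemeral-storage; ..."
// The wording may differ in other releases (assumption).
var rejection = regexp.MustCompile(`can't be scheduled on ([^,]+), predicate checking error: ([^;]+)`)

// parseRejection extracts the node group and the first failure reason from one log line.
func parseRejection(line string) (group, reason string, ok bool) {
	m := rejection.FindStringSubmatch(line)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	sc := bufio.NewScanner(os.Stdin)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow long log lines
	for sc.Scan() {
		if group, reason, ok := parseRejection(sc.Text()); ok {
			fmt.Printf("%s: %s\n", group, reason)
		}
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

For example: `kubectl -n kube-system logs deployment/cluster-autoscaler | go run main.go`. The namespace and deployment name are assumptions and depend on how kube-hetzner installed the autoscaler. Fed the excerpt above, it prints `NAME_OF_AUTOSCALING_GROUP: Insufficient ephemeral-storage`.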

This occurs when the autoscaling group has size 0 and the autoscaler tries to scale it up. Apparently, the group is regarded as having insufficient ephemeral storage. This is wrong: I was requesting 120GB of ephemeral storage, and cax41 nodes have 320GB of ephemeral storage. So I guess there is a bug in how ephemeral storage is compared.
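One plausible explanation, which this thread does not confirm, is that the template node the autoscaler simulates for an empty group carries no `ephemeral-storage` in its allocatable resources. If a resource missing from a node is treated as 0, any positive request is rejected regardless of the real disk size. The Go sketch below mimics that comparison with simplified, hypothetical types (`templateNode`, `insufficient`); it is not the scheduler's or the Hetzner provider's code, and it ignores resources already requested on the node (a simulated empty node has none).

```go
package main

import "fmt"

// Decimal gigabyte, matching the Kubernetes "G" quantity suffix.
const gb = int64(1000 * 1000 * 1000)

// templateNode is a hypothetical, simplified stand-in for the node the
// autoscaler simulates when a node group has 0 nodes.
type templateNode struct {
	Name        string
	Allocatable map[string]int64 // resource name -> amount (bytes for storage)
}

// insufficient mimics the shape of a resource fit check: a resource that is
// missing from Allocatable reads as 0, so any positive request fails.
func insufficient(n templateNode, requests map[string]int64) []string {
	var reasons []string
	for res, want := range requests {
		if want > n.Allocatable[res] {
			reasons = append(reasons, "Insufficient "+res)
		}
	}
	return reasons
}

func main() {
	req := map[string]int64{"ephemeral-storage": 120 * gb}

	// Hypothetical template without disk information: reproduces the log message.
	broken := templateNode{Name: "ca-group (no disk info)", Allocatable: map[string]int64{}}
	// Hypothetical template that reports the 320 GB disk of a cax41 server.
	withDisk := templateNode{Name: "ca-group (with disk info)", Allocatable: map[string]int64{"ephemeral-storage": 320 * gb}}

	for _, n := range []templateNode{broken, withDisk} {
		fmt.Printf("%s: %v\n", n.Name, insufficient(n, req))
	}
}
```

Running it prints `[Insufficient ephemeral-storage]` for the template without disk information and `[]` for the one that reports 320 GB, which matches the symptom described above under this assumption.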

schlichtanders, Feb 26 '24 12:02

/area provider/hetzner

Shubham82, Mar 05 '24 07:03

This was fixed by https://github.com/kubernetes/autoscaler/pull/6574. I will open backport PRs for the active releases.

apricote, Apr 02 '24 05:04

Backported to all current branches:

  • #6673
  • #6674
  • #6675

apricote, Apr 02 '24 05:04

All backports are merged and should be included in the next monthly patch releases.

/close

apricote, Apr 02 '24 06:04

@apricote: Closing this issue.


k8s-ci-robot, Apr 02 '24 06:04