Hetzner: cluster-autoscaler v1.29.0 does not scale up because of incorrect ephemeral storage information
Which component are you using?: cluster-autoscaler
What version of the component are you using?: 1.29.0
Component version:
What k8s version are you using (kubectl version)?:
$ kubectl version
Client Version: v1.28.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.1+k3s2
What environment is this in?: Hetzner Cloud
What did you expect to happen?:
The cluster autoscaler should scale up the node group.
What happened instead?:
It does not scale up.
How to reproduce it (as minimally and precisely as possible):
I am using kube-hetzner, which can create a Kubernetes cluster on Hetzner Cloud. The relevant configuration:
module "kube-hetzner" {
# ...
source = "kube-hetzner/kube-hetzner/hcloud"
version = "2.12.2"
cluster_autoscaler_image = "registry.k8s.io/autoscaling/cluster-autoscaler"
cluster_autoscaler_version = "v1.29.0"
initial_k3s_channel = "v1.29"
autoscaler_nodepools = [
{
name = "ca-group"
server_type = "cax41"
location = "nbg1"
min_nodes = 0
max_nodes = 6
}
]
# ...
}
Then deploy a workload whose pods do not fit on the existing nodes (in my case, a pod requesting 120 GB of ephemeral storage) and inspect the autoscaler logs: the cluster is not scaled up.
Anything else we need to know?:
The important logs are
I0226 11:32:54.763746 1 klogx.go:87] Pod mypod is unschedulable
I0226 11:32:54.763783 1 orchestrator.go:108] Upcoming 0 nodes
I0226 11:32:54.763806 1 orchestrator.go:440] Skipping node group draining-node-pool - max size reached
E0226 11:32:54.763812 1 orchestrator.go:446] Couldn't get autoscaling options for ng: NAME_OF_AUTOSCALING_GROUP
I0226 11:32:54.763861 1 orchestrator.go:542] Pod mypod/mynamespace can't be scheduled on NAME_OF_AUTOSCALING_GROUP, predicate checking error: Insufficient ephemeral-storage; predicateName=NodeResourcesFit; reasons: Insufficient ephemeral-storage; debugInfo=
I0226 11:32:54.763903 1 orchestrator.go:150] No pod can fit to NAME_OF_AUTOSCALING_GROUP
I0226 11:32:54.763913 1 orchestrator.go:164] No expansion options
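The interesting line is the predicate-checking error, which is easy to miss in a long autoscaler log. A minimal sketch of a filter that extracts the pod, node group, and first failure reason from such lines; the regular expression is written against the lines quoted above and is an assumption that may need adjusting for other autoscaler versions.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// predicateRe matches the orchestrator's "can't be scheduled" lines and
// captures the pod, the node group, and the first failure reason.
var predicateRe = regexp.MustCompile(`Pod (\S+) can't be scheduled on ([^,\s]+), predicate checking error: ([^;]+);`)

type failure struct {
	Pod, NodeGroup, Reason string
}

// predicateFailures scans log text line by line and returns every
// predicate-checking failure it finds.
func predicateFailures(logs string) []failure {
	var out []failure
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		if m := predicateRe.FindStringSubmatch(sc.Text()); m != nil {
			out = append(out, failure{Pod: m[1], NodeGroup: m[2], Reason: m[3]})
		}
	}
	return out
}

func main() {
	logs := `I0226 11:32:54.763783 1 orchestrator.go:108] Upcoming 0 nodes
I0226 11:32:54.763861 1 orchestrator.go:542] Pod mypod/mynamespace can't be scheduled on NAME_OF_AUTOSCALING_GROUP, predicate checking error: Insufficient ephemeral-storage; predicateName=NodeResourcesFit; reasons: Insufficient ephemeral-storage; debugInfo=`
	for _, f := range predicateFailures(logs) {
		fmt.Printf("%s -> %s: %s\n", f.Pod, f.NodeGroup, f.Reason)
	}
}
```

For the excerpt above this prints `mypod/mynamespace -> NAME_OF_AUTOSCALING_GROUP: Insufficient ephemeral-storage`.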
This occurs when the autoscaling node group has 0 nodes and the autoscaler tries to scale it up. The autoscaler apparently considers the group to have insufficient ephemeral storage, which is wrong: I was requesting 120 GB of ephemeral storage, and cax41 nodes have 320 GB. So I suspect a bug in how ephemeral storage is compared.
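The failing check boils down to simple arithmetic. A simplified sketch of the resource-fit idea behind the NodeResourcesFit predicate: a pod fits if its request does not exceed the node's allocatable amount minus what is already requested there. `fitsEphemeralStorage` is a hypothetical helper, not cluster-autoscaler code, and the zero-allocatable case is only one plausible explanation for the log above, not something confirmed in this thread. Note also that 320 GB is the disk size; a real node's allocatable ephemeral storage is somewhat lower.

```go
package main

import "fmt"

// Decimal gigabytes, matching the "120GB" and "320GB" figures in this report.
const gb int64 = 1000 * 1000 * 1000

// fitsEphemeralStorage reports whether a pod's ephemeral-storage request
// fits into what is left of a node's allocatable ephemeral storage.
// Hypothetical simplification of the scheduler's resource-fit check.
func fitsEphemeralStorage(request, allocatable, alreadyRequested int64) bool {
	return request <= allocatable-alreadyRequested
}

func main() {
	request := 120 * gb
	// Expected case: the template node for the empty cax41 group advertises 320 GB.
	fmt.Println(fitsEphemeralStorage(request, 320*gb, 0)) // true
	// Hypothetical case: the template node advertises no ephemeral storage.
	fmt.Println(fitsEphemeralStorage(request, 0, 0)) // false
}
```

With any realistic allocatable value for a 320 GB disk, a 120 GB request should fit, so an "Insufficient ephemeral-storage" result points at the node group's advertised capacity rather than at the pod.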
/area provider/hetzner
This was fixed by https://github.com/kubernetes/autoscaler/pull/6574. I will open backport PRs for the active releases.
Backported to all current branches:
- #6673
- #6674
- #6675
All backports are merged and should be included in the next monthly patch releases.
/close
@apricote: Closing this issue.