autoscaler
Node is terminated too early when scale-down-unneeded-time is set to 10m
Which component are you using?: cluster-autoscaler
What version of the component are you using?: 1.27.1 / Chart 9.29.0
Component version:
What k8s version are you using (kubectl version)?: v1.24.14-eks-c12679a
What environment is this in?:
AWS EKS
What did you expect to happen?: The node is terminated only once it has been marked as unneeded for 10 minutes
What happened instead?: The node is terminated earlier than expected, e.g. after 2 minutes
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
In our cluster we run Jenkins with K8s agents. Some jobs consume almost no resources because they are waiting for other jobs or doing only lightweight work. We monitored this for a long time and found that a value of 0.06 for scale-down-utilization-threshold works well for us, since a node that has nothing to do shows a utilization of about 0.053. When a pod is scheduled that is "just running", the node shows the same utilization, so it gets marked as unneeded. In some cases these nodes are terminated after less than 10 minutes, although a waiting time of 10 minutes is configured.
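To make the comparison described above concrete, here is a minimal sketch of the threshold check. It is not the cluster-autoscaler implementation: the function name is hypothetical, and the strict "less than" comparison is an assumption. The threshold and the first utilization value are taken from this report; the second value is made up for contrast.

```python
def is_scale_down_candidate(utilization: float, threshold: float) -> bool:
    """Return True when the node's utilization is below the threshold.

    Hypothetical helper for illustration only; the strict '<' is an
    assumption, not taken from the cluster-autoscaler source.
    """
    return utilization < threshold


THRESHOLD = 0.06  # scale-down-utilization-threshold from this report

# 0.053729 is the CPU utilization logged for the node in this report;
# 0.25 is a made-up value standing in for a busier node.
for utilization in (0.053729, 0.25):
    candidate = is_scale_down_candidate(utilization, THRESHOLD)
    print(f"utilization={utilization} candidate={candidate}")
```

With these numbers, a node running only a "just running" pod falls below the threshold and becomes a candidate, which matches the behaviour described above.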
One example:
I0712 06:30:07.697717 1 klogx.go:87] Node ip-172-25-12-34.eu-central-1.compute.internal - cpu utilization 0.053729
I0712 06:30:07.697837 1 cluster.go:155] ip-172-25-16-25.eu-central-1.compute.internal for removal
I0712 06:31:49.738129 1 nodes.go:126] ip-172-25-12-34.eu-central-1.compute.internal was unneeded for 1m42.382742246s
After the last line there is no further log output such as "node terminated". The node is just gone.
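To compare the logged unneeded time with the configured scale-down-unneeded-time, a small script along these lines can help. This is only a sketch: the helper names are hypothetical, the duration parser covers just the Go-style format seen in the log above (for example 1m42.382742246s), and the regular expression assumes the "was unneeded for" wording from the log line in this report.

```python
import re

# Seconds per unit for Go-style duration strings such as "1m42.382742246s".
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0,
                 "ms": 1e-3, "us": 1e-6, "ns": 1e-9}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|ms|h|m|s)")
_UNNEEDED = re.compile(r"(\S+) was unneeded for (\S+)")


def parse_duration(text: str) -> float:
    """Convert a Go-style duration string to seconds (hypothetical helper)."""
    pos = 0
    total = 0.0
    for match in _PART.finditer(text):
        if match.start() != pos:  # unparsed characters between two parts
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):  # nothing parsed, or trailing garbage
        raise ValueError(f"invalid duration: {text!r}")
    return total


def unneeded_for(log_line: str):
    """Return (node_name, seconds) for an 'unneeded' log line, else None."""
    match = _UNNEEDED.search(log_line)
    if match is None:
        return None
    return match.group(1), parse_duration(match.group(2))


# Log line and configured value copied from this report.
LINE = ("I0712 06:31:49.738129 1 nodes.go:126] "
        "ip-172-25-12-34.eu-central-1.compute.internal "
        "was unneeded for 1m42.382742246s")
CONFIGURED = parse_duration("10m")  # scale-down-unneeded-time

node, elapsed = unneeded_for(LINE)
print(f"{node}: unneeded for {elapsed:.1f}s of {CONFIGURED:.0f}s configured")
print("below configured unneeded time:", elapsed < CONFIGURED)
```

Running this over the full cluster-autoscaler log shows, for each node, the last unneeded duration that was reported before the node disappeared.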
CA is configured as follows:
skip-nodes-with-local-storage: true
skip-nodes-with-custom-controller-pods: true
cordon-node-before-terminating: true
scale-down-utilization-threshold: 0.06
scan-interval: 10s
scale-down-unneeded-time: 10m
skip-nodes-with-system-pods: true
max-empty-bulk-delete: 2