Autoscaler 1.25 or later: If a node fails to be deleted, lastScaleDownFailTime is not refreshed.

Open yaohuatj opened this issue 1 year ago • 7 comments

Which component are you using?:

cluster-autoscaler

What version of the component are you using?: autoscaler 1.25

cluster-autoscaler-1.25.0

Component version:

What k8s version are you using (kubectl version)?:

kubectl version Output
$ kubectl version

What environment is this in?: hws

What did you expect to happen?: If a node fails to be deleted, lastScaleDownFailTime should be refreshed.

What happened instead?: image

If the goroutine fails to delete a node, the error is not detected and the function still returns nil. As a result, lastScaleDownFailTime is not refreshed.

This means that the scale-down-delay-after-failure parameter does not take effect after a failed deletion, while the scale-down-delay-after-delete parameter still does.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

yaohuatj avatar Nov 23 '23 08:11 yaohuatj

/assign

tarishij17 avatar Nov 26 '23 16:11 tarishij17

I don't see the attached function here: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/core/scaledown/actuation/actuator.go#L96 Can you please help me locate which goroutine is being referred to?

tarishij17 avatar Nov 26 '23 17:11 tarishij17

image

These two functions never collect errors, even if the node fails to be deleted.
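The pattern described above can be sketched as follows. This is a hedged illustration, not the actuator code from the screenshot: `deleteNode`, `deleteDroppingErrors`, and `deleteCollectingErrors` are hypothetical names, and the sketch waits for the goroutines synchronously, which the real code may not do. It contrasts a caller that discards goroutine errors with one that collects them over a buffered channel.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// deleteNode is a hypothetical stand-in for a cloud-provider delete call.
// It fails for "node-b" so that the two callers below can be compared.
func deleteNode(name string) error {
	if name == "node-b" {
		return errors.New("delete failed: " + name)
	}
	return nil
}

// deleteDroppingErrors mirrors the reported pattern: errors raised inside
// the goroutines are discarded, so the caller always sees nil.
func deleteDroppingErrors(nodes []string) error {
	var wg sync.WaitGroup
	for _, n := range nodes {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			_ = deleteNode(name) // the error is lost here
		}(n)
	}
	wg.Wait()
	return nil
}

// deleteCollectingErrors sends every goroutine result over a buffered
// channel and reports a failure if any deletion returned an error.
func deleteCollectingErrors(nodes []string) error {
	results := make(chan error, len(nodes))
	for _, n := range nodes {
		go func(name string) { results <- deleteNode(name) }(n)
	}
	var first error
	failed := 0
	for range nodes {
		if err := <-results; err != nil {
			failed++
			if first == nil {
				first = err
			}
		}
	}
	if failed == 0 {
		return nil
	}
	return fmt.Errorf("%d of %d deletions failed, first error: %w", failed, len(nodes), first)
}

func main() {
	nodes := []string{"node-a", "node-b"}
	fmt.Println(deleteDroppingErrors(nodes))   // <nil>
	fmt.Println(deleteCollectingErrors(nodes)) // 1 of 2 deletions failed, first error: delete failed: node-b
}
```

With the second variant the caller receives a non-nil error and has a chance to refresh a failure timestamp. How the actual fix should surface errors from asynchronous deletion is a separate design question.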

yaohuatj avatar Nov 27 '23 06:11 yaohuatj

/triage accepted

Shubham82 avatar Nov 28 '23 10:11 Shubham82

Hey @gjtempleton, can I work on this issue?

Bharadwajshivam28 avatar Feb 06 '24 22:02 Bharadwajshivam28