Autoscaler not scaling down pods when the `kind` of the resource is the same between controllers

Open shamil opened this issue 1 year ago • 4 comments

  • Which component are you using?: cluster-autoscaler
  • What version of the component are you using?: 1.25.0
  • What k8s version are you using?: v1.25.10
  • What environment is this in?: kops on AWS

We are using OpenKruise and its Advanced DaemonSet. The autoscaler seems to detect it as a regular DaemonSet, tries to find the corresponding DaemonSet for the workload, and fails. This prevents scale-down from occurring.

I'm not sure whether this is the root cause, but I suspect that the autoscaler doesn't respect the API group. The pods created by the Advanced DaemonSet have the following ownerReferences:

  ownerReferences:
  - apiVersion: apps.kruise.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: example-advanced-daemonset
    uid: 6f0f4fa8-2694-44df-9e68-8f00630f19c1

The kind is DaemonSet, but the apiVersion is apps.kruise.io/v1alpha1. So it might be that the apiVersion is being ignored, and the autoscaler just looks for a regular DaemonSet named example-advanced-daemonset, which obviously doesn't exist.
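To make the suspicion concrete, here is a minimal Go sketch of the difference between matching on kind alone and matching on kind plus API group. It is not the autoscaler's actual code: `OwnerRef`, `groupOf`, `matchesKindOnly`, and `isBuiltInDaemonSet` are hypothetical names made up for illustration, and the sketch assumes the built-in DaemonSet lives only in the `apps` group.

```go
package main

import "strings"

// OwnerRef is a hypothetical, trimmed-down stand-in for the owner
// reference fields shown above (apiVersion, kind, name).
type OwnerRef struct {
	APIVersion string
	Kind       string
	Name       string
}

// groupOf returns the API group part of an apiVersion string:
// "apps/v1" -> "apps", "apps.kruise.io/v1alpha1" -> "apps.kruise.io".
// A bare version such as "v1" has no slash and yields "" (the core group).
func groupOf(apiVersion string) string {
	if i := strings.Index(apiVersion, "/"); i >= 0 {
		return apiVersion[:i]
	}
	return ""
}

// matchesKindOnly mimics the suspected behavior: only the kind is
// compared, so any controller whose kind is "DaemonSet" matches.
func matchesKindOnly(ref OwnerRef) bool {
	return ref.Kind == "DaemonSet"
}

// isBuiltInDaemonSet additionally requires the "apps" group (an
// assumption of this sketch), so a DaemonSet from another group
// is not mistaken for the built-in one.
func isBuiltInDaemonSet(ref OwnerRef) bool {
	return ref.Kind == "DaemonSet" && groupOf(ref.APIVersion) == "apps"
}
```

With the owner reference from this issue (`apps.kruise.io/v1alpha1`, kind `DaemonSet`), `matchesKindOnly` returns true while `isBuiltInDaemonSet` returns false. A group-aware check like the second one would let the pod be handled as owned by a custom controller instead of triggering a lookup for a `daemonset.apps` object that does not exist.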

Here are relevant logs from autoscaler:

aws-cluster-autoscaler I0724 08:09:58.813598       1 cluster.go:170] node i-0bb670250f677c0c0 cannot be removed: daemonset for devops/example-advanced-daemonset-x44c5 is not present, err: daemonset.apps "example-advanced-daemonset" not found

Can someone advise whether it is expected that the apiVersion is ignored? Or am I missing something?

shamil avatar Jul 24 '23 09:07 shamil

Your suspicion seems correct to me (not an expert, I just stumbled over this for other reasons):

https://github.com/kubernetes/autoscaler/blob/f9a7c7f73facc4baba7189bc6ab5c4e0e77cfee1/cluster-autoscaler/utils/drain/drain.go#L187-L197

pohly avatar Sep 20 '23 09:09 pohly

Haha, you need to write custom code to avoid this situation.

songminglong avatar Oct 07 '23 12:10 songminglong

Maybe the `skipNodesWithCustomControllerPods` option in a higher version is useful.

daimaxiaxie avatar Nov 28 '23 07:11 daimaxiaxie

This won't help. The problem isn't with a custom controller; it's with a controller that has the same kind but a different group. Since the autoscaler only looks at the kind and fails to find the DaemonSet, it blocks scale-down.

hagaibarel avatar Nov 30 '23 05:11 hagaibarel