descheduler Cluster Size check shouldn't exit process when running as Deployment

Open markdingram opened this issue 1 year ago • 7 comments

Is your feature request related to a problem? Please describe.

When the cluster has 0 or 1 nodes, the descheduler loop returns the error "the cluster size is 0 or 1" and the process exits.

This doesn't play nicely when the descheduler is running as a Deployment - the pod goes into CrashLoopBackOff due to the repeated early exits.

Describe the solution you'd like

When the descheduler is running as a Deployment, the "cluster size is 0 or 1" check shouldn't exit the process. Instead, the process should keep running and try again on the next iteration.

Something like (in runDeschedulerLoop):

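	// If there are still fewer than two nodes, skip this cycle.
	if len(nodes) <= 1 {
		klog.V(1).InfoS("The cluster size is 0 or 1 meaning eviction causes service disruption or degradation. So aborting..")
		if d.rs.DeschedulingInterval.Seconds() == 0 {
			// No interval set (one-shot run): keep returning the error.
			return fmt.Errorf("the cluster size is 0 or 1")
		}
		// Interval set (e.g. running as a Deployment): skip this iteration
		// and let the loop try again on the next one.
		return nil
	}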

Describe alternatives you've considered

What version of descheduler are you using?

descheduler version:

0.28.0

Additional context

Example logs


I1124 16:00:06.155189       1 node.go:50] "Node lister returned empty list, now fetch directly"
I1124 16:00:06.160342       1 descheduler.go:121] "The cluster size is 0 or 1 meaning eviction causes service disruption or degradation. So aborting.."
E1124 16:00:06.160453       1 descheduler.go:431] the cluster size is 0 or 1
I1124 16:00:06.160876       1 reflector.go:295] Stopping reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:150
I1124 16:00:06.160913       1 reflector.go:295] Stopping reflector *v1.PriorityClass (0s) from k8s.io/client-go/informers/factory.go:150
I1124 16:00:06.161268       1 secure_serving.go:255] Stopped listening on [::]:10258
I1124 16:00:06.161288       1 tlsconfig.go:255] "Shutting down DynamicServingCertificateController"
I1124 16:00:06.161487       1 reflector.go:295] Stopping reflector *v1.Node (0s) from k8s.io/client-go/informers/factory.go:150
I1124 16:00:06.161529       1 reflector.go:295] Stopping reflector *v1.Namespace (0s) from k8s.io/client-go/informers/factory.go:150

Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
...
  Normal   Pulled     50m (x4 over 51m)     kubelet            Container image "registry.k8s.io/descheduler/descheduler:v0.28.0" already present on machine
  Warning  BackOff    100s (x238 over 51m)  kubelet            Back-off restarting failed container

Nov 24 '23 16:11 markdingram
