descheduler
Cluster Size check shouldn't exit process when running as Deployment
Is your feature request related to a problem? Please describe.
When there are 0 or 1 nodes, the descheduler loop returns the error "the cluster size is 0 or 1" and the process exits.
This doesn't play nicely when the descheduler is running as a Deployment: the pod goes into CrashLoopBackOff due to the repeated early exits.
Describe the solution you'd like
When the descheduler is running as a Deployment, the "cluster size is 0 or 1" check shouldn't exit the process. The process should remain running until the next iteration.
Something like this (in runDeschedulerLoop):
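// if len is still <= 1, skip this run
if len(nodes) <= 1 {
	klog.V(1).InfoS("The cluster size is 0 or 1 meaning eviction causes service disruption or degradation. So aborting..")
	// No descheduling interval means a one-shot run: keep returning the error.
	if d.rs.DeschedulingInterval.Seconds() == 0 {
		return fmt.Errorf("the cluster size is 0 or 1")
	}
	// Running on an interval (e.g. as a Deployment): skip this iteration
	// without an error so the process stays alive for the next one.
	return nil
}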
Describe alternatives you've considered
What version of descheduler are you using?
descheduler version: 0.28.0
Additional context
Example logs
I1124 16:00:06.155189 1 node.go:50] "Node lister returned empty list, now fetch directly"
I1124 16:00:06.160342 1 descheduler.go:121] "The cluster size is 0 or 1 meaning eviction causes service disruption or degradation. So aborting.."
E1124 16:00:06.160453 1 descheduler.go:431] the cluster size is 0 or 1
I1124 16:00:06.160876 1 reflector.go:295] Stopping reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:150
I1124 16:00:06.160913 1 reflector.go:295] Stopping reflector *v1.PriorityClass (0s) from k8s.io/client-go/informers/factory.go:150
I1124 16:00:06.161268 1 secure_serving.go:255] Stopped listening on [::]:10258
I1124 16:00:06.161288 1 tlsconfig.go:255] "Shutting down DynamicServingCertificateController"
I1124 16:00:06.161487 1 reflector.go:295] Stopping reflector *v1.Node (0s) from k8s.io/client-go/informers/factory.go:150
I1124 16:00:06.161529 1 reflector.go:295] Stopping reflector *v1.Namespace (0s) from k8s.io/client-go/informers/factory.go:150
Events:
  Type     Reason   Age                   From     Message
  ----     ------   ----                  ----     -------
  ...
  Normal   Pulled   50m (x4 over 51m)     kubelet  Container image "registry.k8s.io/descheduler/descheduler:v0.28.0" already present on machine
  Warning  BackOff  100s (x238 over 51m)  kubelet  Back-off restarting failed container