eraser
Run jobs in waves instead of starting on all nodes at once
Describe the solution you'd like:
Consider the following situation:
- A cluster has 600 nodes, each running about 30 pods
- The operators scale based on the number of pods per node, because this particular CNI is aggressive about reserving IP addresses
- If Eraser is scheduled on a node and has a sufficiently high priority class, it will evict another pod
- If Eraser runs on all nodes (600 in this case) at the same time, it causes a large disruption to the system: every node evicts a pod at once
A potential solution is to disrupt only part of the system at a time by running the job in phases. For example, run the job on ceil(nodes / 5.0) nodes at a time, and stop once every node has completed a run.
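To make the batching arithmetic concrete, here is a minimal sketch in Python. It is not Eraser code: the function name `plan_waves`, the `num_waves` parameter, and the node names are all hypothetical, and it only shows how nodes could be split into batches of ceil(nodes / 5) each.

```python
import math
from typing import Iterator, List, Sequence


def plan_waves(nodes: Sequence[str], num_waves: int = 5) -> Iterator[List[str]]:
    """Yield batches of at most ceil(len(nodes) / num_waves) nodes each.

    Hypothetical helper for illustration; not part of Eraser.
    """
    if num_waves < 1:
        raise ValueError("num_waves must be at least 1")
    if not nodes:
        return
    wave_size = math.ceil(len(nodes) / num_waves)
    for start in range(0, len(nodes), wave_size):
        yield list(nodes[start:start + wave_size])


# Hypothetical node names standing in for the 600-node cluster described above.
nodes = [f"node-{i}" for i in range(600)]
waves = list(plan_waves(nodes))

for index, wave in enumerate(waves, start=1):
    # A real controller would start the job on this batch and wait for
    # every node in it to finish before moving on to the next batch.
    print(f"wave {index}: {len(wave)} nodes ({wave[0]} .. {wave[-1]})")
```

With this scheme at most one fifth of the nodes evict a pod at any one time. The sketch leaves out the harder part, which is tracking completion per node and deciding what to do when a node in a wave fails or times out.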
Anything else you would like to add: This definitely complicates the logic.
Environment:
- Eraser version: v1.0.0-beta.3
- Kubernetes version: (use kubectl version):