Dynamic Job Parallelism and Resource Scaling Based on Backlog Metrics
What would you like to be added:
We would like to propose a new feature in Kueue that enables dynamic scaling of job parallelism and resource allocation (CPU, RAM, and pods) based on job backlog metrics and predefined formulas.
Idea: This feature would introduce a custom resource definition (CRD) that lets users define scaling formulas and thresholds, which dynamically adjust the maximum parallelism and resource limits, similar to KEDA or HPA. A generic approach could be to expose the /scale subresource, providing a uniform scaling interface.
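To make the idea concrete, below is a minimal sketch of what the Go API types for such a CRD could look like. All names here (BacklogScalingPolicy, its fields, and the rule shape) are illustrative assumptions for discussion, not an existing or proposed Kueue API.

```go
// Hypothetical API types for a backlog-driven scaling policy (illustrative only).
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// BacklogScalingPolicy binds a backlog metric and scaling rules to a ClusterQueue.
type BacklogScalingPolicy struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec BacklogScalingPolicySpec `json:"spec"`
}

// BacklogScalingPolicySpec selects the queue to watch and how to scale its quota.
type BacklogScalingPolicySpec struct {
	// TargetClusterQueue is the ClusterQueue whose flavor quotas are adjusted.
	TargetClusterQueue string `json:"targetClusterQueue"`

	// Metric selects the backlog signal, e.g. the number of pending workloads.
	Metric string `json:"metric"`

	// Rules map backlog thresholds to parallelism and resource ceilings;
	// the highest matching threshold wins.
	Rules []ScalingRule `json:"rules"`
}

// ScalingRule raises the ceilings once the backlog crosses a threshold.
type ScalingRule struct {
	BacklogThreshold int32             `json:"backlogThreshold"`
	MaxParallelism   int32             `json:"maxParallelism"`
	ResourceLimits   map[string]string `json:"resourceLimits,omitempty"` // e.g. cpu: "200", memory: "800Gi"
}
```

A controller watching these objects could then reconcile the target ClusterQueue (or a generic /scale subresource) whenever the backlog metric crosses a rule boundary.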
Why is this needed:
We currently process around 4.5 million jobs per day, so managing resource usage and costs is critical. We need a mechanism that can dynamically limit or expand the maximum parallelism of jobs based on real-time backlog conditions, so that jobs are processed efficiently without overcommitting resources or incurring unnecessary cost.
By introducing a formula-based approach to flavor resources, we can achieve a more granular and responsive system. For example, the system could increase the maximum CPU or RAM allocation as the admission backlog grows, minimizing delays during high-load periods while conserving resources during low-demand times. This functionality is crucial for maintaining both performance and cost-effectiveness in large-scale Kubernetes environments; a small sketch of such a formula follows below.
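As a rough illustration of the formula-based behavior described above, the snippet below evaluates a simple linear rule: the CPU quota grows with the number of pending workloads up to a ceiling. The function name, parameters, and constants are assumptions made up for this example, not part of Kueue.

```go
// Hypothetical evaluation of a linear backlog formula (illustrative only).
package main

import "fmt"

// scaledCPUQuota returns a CPU quota (in whole cores) that grows with the
// admission backlog: a base quota plus extra cores per pending workload,
// clamped at a configured ceiling.
func scaledCPUQuota(baseCores, coresPerPending, maxCores, pendingWorkloads int64) int64 {
	quota := baseCores + coresPerPending*pendingWorkloads
	if quota > maxCores {
		quota = maxCores
	}
	return quota
}

func main() {
	// With 120 pending workloads, the quota grows from 100 to 220 cores,
	// still below the 400-core ceiling.
	fmt.Println(scaledCPUQuota(100, 1, 400, 120))
}
```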
This enhancement requires the following artifacts:
- [ ] Design doc
- [ ] API change
- [ ] Docs update
The artifacts should be linked in subsequent comments.