[Feature][ResourceOversubscriptionManager] Improving resource oversubscription handling in Apache DolphinScheduler
Search before asking
- [x] I had searched in the issues and found no similar feature requirement.
Description
As part of an educational research project at ITMO University, we aim to investigate how open-source schedulers, specifically Apache DolphinScheduler (DS), handle resource oversubscription. Oversubscription (allocating more resources to tasks than are physically or logically available) can increase utilization and reduce costs, but it often leads to performance degradation, instability, or SLA violations for critical workloads. The project will focus on identifying technical gaps in DS and proposing mechanisms to manage oversubscription safely, including metrics, scheduling policies, prioritization, and throttling strategies.
Use case
A DS cluster runs multiple concurrent workflows, temporarily exceeding available CPU, memory, or I/O resources. Without proper control, worker nodes may become overloaded, task queues grow, and critical tasks may fail or be delayed. The research project will explore potential solutions such as:
- Prioritizing critical workflows under oversubscription.
- Implementing back-pressure or throttling mechanisms.
- Adding observability and metrics for oversubscription states.
- Testing and simulating scenarios to evaluate improvements in throughput, latency, and stability.
Related issues
No response
Are you willing to submit a PR?
- [x] Yes I am willing to submit a PR!
Code of Conduct
- [x] I agree to follow this project's Code of Conduct
Please provide the actual production problems you want to solve and the detailed design scheme. I don't understand what this issue wants to do.
In production environments, multiple workflows often reserve more resources (CPU, RAM) than they actually use. For example, several tasks may each declare 8 GB of RAM but only consume 1–2 GB on average. As a result, cluster utilization stays low, yet no new workflows can be scheduled because the declared resources already exceed physical capacity. To improve efficiency, we can apply controlled resource oversubscription: temporarily allocating more logical resources than are physically available, based on real usage metrics. However, DolphinScheduler currently lacks mechanisms to monitor real-time utilization or to manage safe oversubscription without risking node overload or instability.
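For illustration (hypothetical numbers, not measurements): on a worker with 64 GB of physical memory, eight tasks each declaring 8 GB fill 100% of the declared capacity, so no further task can be placed; if each task actually uses about 1.5 GB, the node consumes roughly 12 GB, i.e. under 20% real utilization.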
Key components:
- ResourceMonitorAnalyzer: processes real-time CPU and memory data already reported by worker heartbeats and monitoring controllers.
- OversubscriptionController: calculates the oversubscription ratio and decides whether to allow or delay task dispatch.
- PolicyEngine: defines prioritization and throttling rules under oversubscription.
- MetricsReporter: exports oversubscription metrics to the existing metrics framework (Prometheus, REST API).
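A minimal sketch of how these components could be wired, assuming the heartbeat data is already available on the master side; all class, interface, and method names below are hypothetical and do not exist in the current DolphinScheduler codebase:

```java
// Hypothetical component outline for this proposal; names are illustrative only.
public class OversubscriptionComponentsSketch {

    /** Snapshot of one worker, built from data already carried by heartbeats. */
    record WorkerSnapshot(double physicalMemoryGb, double allocatedMemoryGb, double usedMemoryGb) {}

    /** Dispatch decision returned to the scheduler for a single task. */
    enum DispatchDecision { ACCEPT, DELAY }

    /** ResourceMonitorAnalyzer: processes real-time usage reported by worker heartbeats. */
    interface ResourceMonitorAnalyzer {
        WorkerSnapshot latestSnapshot(String workerAddress);
    }

    /** OversubscriptionController: computes ratios and decides whether dispatch is currently safe. */
    interface OversubscriptionController {
        DispatchDecision evaluate(WorkerSnapshot snapshot, double requestedMemoryGb);
    }

    /** PolicyEngine: orders pending tasks under pressure (CRITICAL > NORMAL > BEST_EFFORT). */
    interface PolicyEngine {
        int priorityOf(String workflowClass);
    }

    /** MetricsReporter: exports oversubscription gauges to the existing metrics framework. */
    interface MetricsReporter {
        void report(String workerAddress, double oversubscriptionRatio, double utilizationRate);
    }
}
```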
Workflow:
1. Each worker periodically reports actual resource usage (usedCPU, usedMemory).
2. The OversubscriptionController calculates (see the sketch below):
   - `oversubscription_ratio = allocated_resources / physical_resources`
   - `utilization_rate = used_resources / physical_resources`
3. If `utilization_rate` is below a threshold (e.g., 60%), new tasks can be accepted even if allocated resources exceed 100% of physical capacity.
4. If `utilization_rate` is above a safety limit (e.g., 90%), the controller triggers back-pressure and suspends new task dispatch.
5. Tasks can be prioritized by workflow class (CRITICAL > NORMAL > BEST_EFFORT).

Configuration parameters:

| Parameter | Description | Default |
| --- | --- | --- |
| maxOversubscriptionFactor | Maximum allowed ratio of allocated to physical resources | 1.5 |
| lowUtilizationThreshold | CPU/memory usage below which oversubscription is safe | 60% |
| highUtilizationThreshold | Utilization above which task submission is throttled | 90% |
| priorityMode | Workflow scheduling priority mode | NORMAL |
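A minimal sketch of the decision logic above, using the defaults from the configuration table; the class and its fields are hypothetical, not existing DolphinScheduler APIs:

```java
// Hypothetical sketch of the OversubscriptionController decision described above.
// Threshold values mirror the configuration table; none of these names exist in DolphinScheduler today.
public class OversubscriptionControllerSketch {

    private final double maxOversubscriptionFactor = 1.5; // allocated / physical upper bound
    private final double lowUtilizationThreshold = 0.60;  // below this, oversubscription is considered safe
    private final double highUtilizationThreshold = 0.90; // above this, dispatch is throttled

    enum Decision { ACCEPT, DELAY }

    /**
     * @param physical  physically available resource on the worker (e.g. memory in GB)
     * @param allocated resources already declared by tasks running on the worker
     * @param used      actually used resources reported by the worker heartbeat
     * @param requested resources declared by the task waiting to be dispatched
     */
    Decision evaluate(double physical, double allocated, double used, double requested) {
        double oversubscriptionRatio = (allocated + requested) / physical;
        double utilizationRate = used / physical;

        // Hard limits: never exceed the configured factor, and back off when the node is genuinely busy.
        if (oversubscriptionRatio > maxOversubscriptionFactor || utilizationRate > highUtilizationThreshold) {
            return Decision.DELAY;
        }
        // Low real utilization: accept even if the declared allocation already exceeds 100%.
        if (utilizationRate < lowUtilizationThreshold) {
            return Decision.ACCEPT;
        }
        // In between: only accept while the declared allocation still fits physical capacity.
        return oversubscriptionRatio <= 1.0 ? Decision.ACCEPT : Decision.DELAY;
    }
}
```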
Server load protection was implemented a long time ago, and Prometheus metrics have also been available since version 3.x. Which version are you using?
Can I write a load balancer with CPU and thread-pool oversubscription? In DynamicWeightedRoundRobinWorkerLoadBalancer we have a weight for each worker, and it never exceeds 100. What if we made it possible for the weight to exceed that value? Tasks may take longer individually, but the overall throughput for a group of tasks could improve, especially when some resources are underutilized. Memory oversubscription may be risky; this feature is intended for CPU and thread pool only.
private double calculateWeight(WorkerServerMetadata server) {
    // Average the configured usage components (CPU, memory, task thread pool) into one load score.
    double load =
            dynamicWeightConfigProperties.getCpuUsageWeight() * server.getCpuUsage()
                    + dynamicWeightConfigProperties.getMemoryUsageWeight() * server.getMemoryUsage()
                    + dynamicWeightConfigProperties.getTaskThreadPoolUsageWeight()
                            * server.getTaskThreadPoolUsage();
    load = load / 3.0;
    // Raise the weight ceiling above 100 by the proposed oversubscription factor,
    // so lightly loaded workers can receive proportionally more tasks.
    double osFactor = dynamicWeightConfigProperties.getOversubscriptionFactor();
    double maxWeight = 100 * osFactor;
    double weight = maxWeight - load;
    return Math.max(weight, 0.0);
}
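For example (illustrative numbers): with an oversubscription factor of 1.5 and an average load score of 20, the weight becomes 150 - 20 = 130 instead of 80 under the current ceiling of 100, so a lightly loaded worker draws a proportionally larger share in the weighted round-robin. Note that getOversubscriptionFactor() would be a new property on DynamicWeightConfigProperties; it does not exist today, and a factor of 1.0 would preserve the current behaviour.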
@kito4 Do you mean setting a parameter to prevent workers from reducing their potential load? In my view, this may not necessarily be genuinely useful and could lead to more complex configurations. It would be difficult for us to set an effective parameter.
+1