
[Proposal] Add queue-level-scheduling-policy

Open ElectricFish7 opened this issue 9 months ago • 9 comments

What type of PR is this?

/kind documentation /area scheduling /area controllers

What this PR does / why we need it:

Queue-level scheduling policy from an LFX'25 issue: https://github.com/volcano-sh/volcano/issues/3992, which asks Volcano to support setting and using different scheduling policies at the queue level instead of a single, globally unified scheduling policy.

Which issue(s) this PR fixes:

Fixes https://github.com/volcano-sh/volcano/issues/3992

Special notes for your reviewer:

@Monokaix @JesseStutler

Does this PR introduce a user-facing change?


ElectricFish7 avatar Mar 31 '25 07:03 ElectricFish7

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign hwdef. You can assign the PR to them by writing /assign @hwdef in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment. Approvers can cancel approval by writing /approve cancel in a comment.

volcano-sh-bot avatar Mar 31 '25 07:03 volcano-sh-bot

I’ve updated the proposal, and I think it should support both queue-level and job-level scheduling. cc @Monokaix @JesseStutler

ElectricFish7 avatar Apr 15 '25 00:04 ElectricFish7

@hwdef @kingeasternsun Job-level scheduling is now supported as well; please take a look.

Monokaix avatar Apr 15 '25 01:04 Monokaix

Overall it's OK, but I prefer a CR to a ConfigMap.

I also think we should fix a somewhat annoying legacy behavior: if parsing the configuration fails, the default configuration is silently used. This is implicit. I would rather have the scheduler panic directly when reading the configuration fails.

hwdef avatar Apr 16 '25 03:04 hwdef

Overall it's OK, but I prefer a CR to a ConfigMap.

I also think we should fix a somewhat annoying legacy behavior: if parsing the configuration fails, the default configuration is silently used. This is implicit. I would rather have the scheduler panic directly when reading the configuration fails.

That's a good point.

Monokaix avatar Apr 16 '25 08:04 Monokaix

Overall it's OK, but I prefer a CR to a ConfigMap.

I also think we should fix a somewhat annoying legacy behavior: if parsing the configuration fails, the default configuration is silently used. This is implicit. I would rather have the scheduler panic directly when reading the configuration fails.

The problem with a CR is that it's not consistent with the current global config, which lives in a ConfigMap.

Monokaix avatar Apr 16 '25 08:04 Monokaix

Overall, we have three approaches to mount scheduling policies:

  • Using a single ConfigMap — This is similar to the implementation in the current proposal. The advantage of this method is its simplicity and consistency with the default scheduler configuration file. However, it poses a risk where users may unintentionally (or intentionally) modify scheduling policies that do not belong to them.

  • Using one ConfigMap per scheduling policy — This approach prevents users from modifying other users’ scheduling policies, offering better isolation. However, it could lead to a large number of ConfigMaps, making file mounting and scheduler access more complex.

  • Using a Custom Resource Definition (CRD) — This method provides better structure and flexibility, but its usage differs from the default scheduler configuration file, potentially increasing the learning curve or setup complexity.
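As a concrete illustration of the second option, a per-queue policy ConfigMap might look like the sketch below. The ConfigMap name, namespace, and `policy.conf` key are hypothetical and not part of the proposal; the `actions`/`tiers` layout mirrors the existing global scheduler configuration format.

```yaml
# Hypothetical example: one ConfigMap holding the scheduling policy for a
# single queue. All names here are illustrative, not part of the proposal.
apiVersion: v1
kind: ConfigMap
metadata:
  name: queue-a-scheduling-policy
  namespace: volcano-system
data:
  policy.conf: |
    actions: "enqueue, allocate, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
      - name: binpack
```

One ConfigMap per queue keeps each policy isolated under its own RBAC rules, at the cost of the scheduler having to watch and mount many objects.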

ElectricFish7 avatar Apr 16 '25 08:04 ElectricFish7

Overall, we have three approaches to mount scheduling policies:

  • Using a single ConfigMap — This is similar to the implementation in the current proposal. The advantage of this method is its simplicity and consistency with the default scheduler configuration file. However, it poses a risk where users may unintentionally (or intentionally) modify scheduling policies that do not belong to them.
  • Using one ConfigMap per scheduling policy — This approach prevents users from modifying other users’ scheduling policies, offering better isolation. However, it could lead to a large number of ConfigMaps, making file mounting and scheduler access more complex.
  • Using a Custom Resource Definition (CRD) — This method provides better structure and flexibility, but its usage differs from the default scheduler configuration file, potentially increasing the learning curve or setup complexity.

If a CRD is not accepted, the second option is also good.

hwdef avatar Apr 16 '25 10:04 hwdef

I have updated the design document and user examples. PTAL @Monokaix @JesseStutler

ElectricFish7 avatar May 21 '25 08:05 ElectricFish7

I suggest we add some explanation for the queue-level schedule policy. For example, with the previous cluster-level mode (i.e., the old schedule policy mode), configuring binpack means tasks across the cluster preferentially fill nodes that already run tasks, which reduces resource fragmentation. In queue-level mode, however, if queue A is configured with binpack and queue B with nodeorder, then from the cluster perspective fragmentation is not actually reduced much. From the queue perspective, tasks in queue A are preferentially assigned to nodes that already have tasks and end up more compact, while tasks in queue B are distributed evenly across the cluster according to node conditions.

We'd better give users a correct expectation of the impact of queue configuration (in queue-level mode), help them configure the queue-level schedule policy more reasonably, and improve their experience of using Volcano.

In addition, in queue-level mode, if queue A uses proportion but queue B uses capacity, will there be a conflict in resource management?

XbaoWu avatar Jun 21 '25 06:06 XbaoWu

I suggest we add some explanation for the queue-level schedule policy. For example, with the previous cluster-level mode (i.e., the old schedule policy mode), configuring binpack means tasks across the cluster preferentially fill nodes that already run tasks, which reduces resource fragmentation. In queue-level mode, however, if queue A is configured with binpack and queue B with spread, then from the cluster perspective fragmentation is not actually reduced much. From the queue perspective, tasks in queue A are preferentially assigned to nodes that already have tasks and end up more compact, while tasks in queue B are distributed evenly across the cluster according to node conditions.

We'd better give users a correct expectation of the impact of queue configuration (in queue-level mode), help them configure the queue-level schedule policy more reasonably, and improve their experience of using Volcano.

In addition, in queue-level mode, if queue A uses proportion but queue B uses capacity, will there be a conflict in resource management?

I think queue capacity management plugins such as proportion and capacity should be configured at the global level?

ElectricFish7 avatar Jun 21 '25 07:06 ElectricFish7

I think queue capacity management plugins such as proportion and capacity should be configured at the global level?

I think this is a solution. We may need to classify the existing plugins: which can only be applied globally, and which can be configured at the queue level. To stay compatible with configurations from previous versions, a queue-level schedule policy should fall back to the default schedule policy when it encounters a global-only plugin.

XbaoWu avatar Jun 21 '25 12:06 XbaoWu