NodeResourceFitPlus plugin

LY-today opened this issue 6 months ago · 12 comments

What is the problem you're trying to solve

Hi Volcano community, I previously proposed a NodeResourceFit enhancement to the Koordinator community and found that Volcano has the same problem. Could you evaluate whether NodeResourceFitPlus could be implemented in Volcano? If you also think this strategy makes sense, I can contribute a PR.

Koordinator

Describe the solution you'd like

Same as above

Additional context

Same as above

LY-today · Jun 18 '25 02:06

LY-today · Jun 18 '25 02:06

/cc

JesseStutler · Jun 18 '25 02:06

@LY-today Would ScarceResourceAvoidance help in your case? https://github.com/volcano-sh/volcano/issues/4244

JesseStutler · Jun 18 '25 06:06

There is already a PR for it, "Add sra plugin for scheduling" #4248, which you can review.

JesseStutler · Jun 18 '25 06:06

@JesseStutler It seems that Kingsoft has also recognized the need for this ScarceResourceAvoidance strategy. Are other community members aware of NodeResourceFitPlus? Or do you think it is necessary to build this plugin?

LY-today · Jun 18 '25 06:06

Which aspects can't https://github.com/volcano-sh/volcano/pull/4248 solve, so that they need to be enhanced?

JesseStutler · Jun 18 '25 06:06

It doesn't seem to support specifying different aggregation (binpack) or dispersion (spread) strategies for different resource types.

LY-today · Jun 18 '25 07:06

/cc @XbaoWu What do you think? Do you also have this requirement?

JesseStutler · Jun 18 '25 09:06

In AI scenarios, training frameworks often have a single master and multiple workers. The master does not request GPUs, but the workers do, so we usually schedule the master to the CPU resource pool and spread the masters out to prevent CPU hot spots. GPU tasks are scheduled to the GPU resource pool. Because GPUs are expensive, we want to pack GPU tasks together so that as many whole machines' GPUs as possible stay idle, leaving resources available for large-model tasks.

LY-today · Jun 18 '25 09:06

I'd prefer labeling the CPU pool nodes and using soft anti-affinity for the master tasks in the CPU pool. Others, please share your opinions on whether this is a general need. @hwdef @lowang-bh @Monokaix @kingeasternsun @archlitchi @googs1025

JesseStutler · Jun 18 '25 09:06

Others are welcome to share more ideas. I'd add that affinity can indeed solve the problem, but it is hard for users to understand and raises the underlying operations and troubleshooting cost, so handling this inside the scheduler seems like a better solution.

LY-today · Jun 18 '25 09:06

I am not currently involved in this scenario, but I think it is a good one. Could we combine topologyKey with the SRA policy to cover it?

XbaoWu · Jun 19 '25 02:06

If I enable the NodeResourceFitPlus plugin, can I disable the Binpack plugin, since NodeResourceFitPlus's capabilities already cover those of Binpack?

kingeasternsun · Jun 25 '25 02:06

From a user experience perspective, this plugin delivers significant value.

kingeasternsun · Jul 08 '25 09:07

I think Volcano already covers GPU resource reservation through the proportion plugin, or through the ongoing SRA work.

lowang-bh · Jul 13 '25 13:07

/close https://github.com/volcano-sh/volcano/pull/4391

Monokaix · Sep 19 '25 06:09

@Monokaix: Closing this issue.

volcano-sh-bot · Sep 19 '25 06:09