fix calculations of podgroup min resource

Open lowang-bh opened this issue 2 years ago • 6 comments

commit 1: refactor jobinfo's calculation into a function.
commit 2: fix the calculation of podgroup min resource and add a test case. Related design docs: docs about job's min resource #2945
commit 3: when jobMinAvailable < totalTask's, keep the original logic for calculating podgroup minResource: sum up first jobMinAvailable

Also fixes https://github.com/volcano-sh/volcano/issues/2921.

lowang-bh avatar Aug 15 '23 14:08 lowang-bh

/assign @wangyang0616 @hwdef @Yikun @Thor-wl @william-wang

Hi all, would you please have a look at this PR so we can discuss the remaining case (jobMinAvailable < sum(taskMinAvailable))? For example, with jobMinAvailable=2 and totalTaskMinAvailable=3, should we sum up the resources of at most 2 or 3 members as the job's min resource?
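To make the two options concrete, here is a minimal sketch, not the actual Volcano code. The `task` type, `sumFirstN`, and `sumTaskMin` are hypothetical names, and the resource numbers (one master at 1000m CPU, two workers at 500m each) are assumed purely for illustration. It compares summing the first jobMinAvailable pods with summing taskMinAvailable pods of every task.

```go
package main

import "fmt"

// task is a hypothetical, simplified stand-in for a job's task spec.
type task struct {
	name         string
	replicas     int
	minAvailable int
	milliCPU     int64 // CPU request of one pod, in millicores
}

// sumFirstN walks the tasks in the given order and adds up the requests
// of the first n pods (option 1: at most jobMinAvailable members).
func sumFirstN(tasks []task, n int) int64 {
	var total int64
	for _, t := range tasks {
		for i := 0; i < t.replicas && n > 0; i++ {
			total += t.milliCPU
			n--
		}
	}
	return total
}

// sumTaskMin adds up minAvailable pods of every task
// (option 2: sum(taskMinAvailable) members).
func sumTaskMin(tasks []task) int64 {
	var total int64
	for _, t := range tasks {
		total += int64(t.minAvailable) * t.milliCPU
	}
	return total
}

func main() {
	// Assumed example: jobMinAvailable=2, totalTaskMinAvailable=1+2=3.
	tasks := []task{
		{name: "master", replicas: 1, minAvailable: 1, milliCPU: 1000},
		{name: "worker", replicas: 2, minAvailable: 2, milliCPU: 500},
	}
	jobMinAvailable := 2

	fmt.Println("first jobMinAvailable pods:", sumFirstN(tasks, jobMinAvailable))
	fmt.Println("sum of taskMinAvailable pods:", sumTaskMin(tasks))
}
```

With these assumed numbers the two rules give different min resources (1 master + 1 worker versus 1 master + 2 workers), which is exactly the ambiguity in question.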

One idea is to validate jobMinAvailable against taskTotalMinAvailable. Currently the only validation compares jobMinAvailable with totalReplicas.
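A rough sketch of what such a validation could look like. This is an assumption, not the existing admission code: `taskSpec` and `validateMinAvailable` are hypothetical names, and rejecting jobMinAvailable < sum(taskMinAvailable) is only one possible policy.

```go
package main

import "fmt"

// taskSpec is a hypothetical, simplified task definition.
type taskSpec struct {
	replicas     int
	minAvailable int
}

// validateMinAvailable checks jobMinAvailable against both totals.
// The totalReplicas check mirrors the validation described above;
// the taskMinAvailable check is the proposed (assumed) addition.
func validateMinAvailable(jobMinAvailable int, tasks []taskSpec) error {
	totalReplicas, totalTaskMin := 0, 0
	for _, t := range tasks {
		totalReplicas += t.replicas
		totalTaskMin += t.minAvailable
	}
	if jobMinAvailable > totalReplicas {
		return fmt.Errorf("jobMinAvailable %d exceeds total replicas %d",
			jobMinAvailable, totalReplicas)
	}
	if jobMinAvailable < totalTaskMin {
		return fmt.Errorf("jobMinAvailable %d is less than the sum of taskMinAvailable %d",
			jobMinAvailable, totalTaskMin)
	}
	return nil
}

func main() {
	tasks := []taskSpec{
		{replicas: 2, minAvailable: 1}, // masters
		{replicas: 2, minAvailable: 2}, // workers
	}
	fmt.Println(validateMinAvailable(2, tasks)) // rejected under the assumed policy
	fmt.Println(validateMinAvailable(3, tasks)) // accepted
}
```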

There is another thing to note: in the allocate action, tasks are allocated from high priority to low priority until jobMinAvailable tasks have been allocated, and only then is the job ready to commit. Consider this scenario: a queue with capacity 2C and a job with jobMinAvailable=2 that has two masters, each requesting 1C, with taskMinAvailable=1, and two workers, each requesting 0.5C, with taskMinAvailable=2. The ideal allocation process is 1 master and then 2 workers. But the current allocation process allocates the 2 masters (high priority) first, which uses up the queue capacity, so the job cannot become ready. (Maybe we need to enqueue the tasks needed for min available into the tasks queue.)

https://github.com/volcano-sh/volcano/blob/8d8b6912f2922ae40e4594262d38d1c2f68c40e9/pkg/scheduler/api/job_info.go#L758-L763
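The scenario above can be reproduced with a toy simulation. This is a simplified sketch, not the scheduler's real allocate action: `task`, `pod`, `allocate`, `ready`, `priorityOrder`, and `minFirstOrder` are all hypothetical names, and readiness here only checks the per-task minAvailable.

```go
package main

import "fmt"

// task is a hypothetical, simplified task spec; slice order = priority order.
type task struct {
	name         string
	replicas     int
	minAvailable int
	milliCPU     int64
}

// pod is one replica waiting to be placed.
type pod struct {
	task     string
	milliCPU int64
}

// allocate places pods in the given order while they fit in the queue
// capacity and returns how many pods of each task were placed.
func allocate(order []pod, capacity int64) map[string]int {
	placed := map[string]int{}
	for _, p := range order {
		if p.milliCPU <= capacity {
			capacity -= p.milliCPU
			placed[p.task]++
		}
	}
	return placed
}

// ready reports whether every task reached its own minAvailable.
func ready(tasks []task, placed map[string]int) bool {
	for _, t := range tasks {
		if placed[t.name] < t.minAvailable {
			return false
		}
	}
	return true
}

// priorityOrder queues all replicas of a task before the next task.
func priorityOrder(tasks []task) []pod {
	var order []pod
	for _, t := range tasks {
		for i := 0; i < t.replicas; i++ {
			order = append(order, pod{t.name, t.milliCPU})
		}
	}
	return order
}

// minFirstOrder queues minAvailable pods of every task first,
// then the remaining replicas.
func minFirstOrder(tasks []task) []pod {
	var first, rest []pod
	for _, t := range tasks {
		for i := 0; i < t.replicas; i++ {
			if i < t.minAvailable {
				first = append(first, pod{t.name, t.milliCPU})
			} else {
				rest = append(rest, pod{t.name, t.milliCPU})
			}
		}
	}
	return append(first, rest...)
}

func main() {
	tasks := []task{
		{name: "master", replicas: 2, minAvailable: 1, milliCPU: 1000},
		{name: "worker", replicas: 2, minAvailable: 2, milliCPU: 500},
	}
	const queueCapacity = 2000 // 2C

	fmt.Println("priority order ready:", ready(tasks, allocate(priorityOrder(tasks), queueCapacity)))
	fmt.Println("min-first order ready:", ready(tasks, allocate(minFirstOrder(tasks), queueCapacity)))
}
```

In this toy model the priority order spends the whole 2C on the two masters and leaves the workers unplaced, while queuing the min-available pods first places 1 master and 2 workers within the same capacity.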

lowang-bh avatar Aug 15 '23 14:08 lowang-bh

trigger CI

lowang-bh avatar Aug 16 '23 00:08 lowang-bh

/assign @k82cn @kevin-wangzefeng

lowang-bh avatar Aug 27 '23 05:08 lowang-bh

/assign @Monokaix

lowang-bh avatar Dec 14 '23 14:12 lowang-bh

Hi, please resolve the code conflicts.

Monokaix avatar Feb 01 '24 06:02 Monokaix

/lgtm

Monokaix avatar Jun 06 '24 01:06 Monokaix

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: william-wang


volcano-sh-bot avatar Jun 13 '24 12:06 volcano-sh-bot