Need to refactor the reclaim action

Open JesseStutler opened this issue 1 year ago • 8 comments

Please describe your problem in detail

I'm about to add preemptionPolicy-related logic to the reclaim action, but I'm unsure about one thing: when a task's preemptionPolicy is Never, do I need to push the job and queue back so that other tasks or jobs can continue reclaiming resources?

In the reclaim action, https://github.com/volcano-sh/volcano/blob/0843c0d33fccdb85439dc7086dd7cea061070901/pkg/scheduler/actions/reclaim/reclaim.go#L86-L220, reclaim first pops a queue and a job, but look at lines 110-112, 116-118, 121-124, and 126-129: https://github.com/volcano-sh/volcano/blob/0843c0d33fccdb85439dc7086dd7cea061070901/pkg/scheduler/actions/reclaim/reclaim.go#L110-L112 https://github.com/volcano-sh/volcano/blob/0843c0d33fccdb85439dc7086dd7cea061070901/pkg/scheduler/actions/reclaim/reclaim.go#L116-L119 https://github.com/volcano-sh/volcano/blob/0843c0d33fccdb85439dc7086dd7cea061070901/pkg/scheduler/actions/reclaim/reclaim.go#L121-L124 https://github.com/volcano-sh/volcano/blob/0843c0d33fccdb85439dc7086dd7cea061070901/pkg/scheduler/actions/reclaim/reclaim.go#L126-L129 If the task is filtered out by the allocatable, Preemptive, or PrePredicateFn checks, the queue is never pushed back. It is not clear to me whether other tasks in the same job, or other jobs in the same queue, can still reclaim resources in that case. I think that when I implement preemptionPolicy, the job and queue need to be pushed back so that others can continue reclaiming.

You can compare this with the logic in allocate: https://github.com/volcano-sh/volcano/blob/0843c0d33fccdb85439dc7086dd7cea061070901/pkg/scheduler/actions/allocate/allocate.go#L192-L199. At line 192 the body is wrapped in a !tasks.Empty() loop, so if a task is filtered out by allocatable, it is reasonable to continue there and let the other tasks go on to allocate.


9.20 update: After discussing with @Monokaix @hwdef @lowang-bh, we think there are some problems in the reclaim action and that it needs to be refactored.

Any other relevant information

No response

JesseStutler avatar Sep 19 '24 12:09 JesseStutler

/cc

googs1025 avatar Sep 19 '24 15:09 googs1025

It's a good catch!

Monokaix avatar Sep 20 '24 09:09 Monokaix

/area scheduling /good-first-issue

JesseStutler avatar Jan 06 '25 08:01 JesseStutler

@JesseStutler: This request has been marked as suitable for new contributors.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-good-first-issue command.

In response to this:

/area scheduling /good-first-issue

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

volcano-sh-bot avatar Jan 06 '25 08:01 volcano-sh-bot

/assign

elysium-w avatar Jan 23 '25 11:01 elysium-w

/priority high

Since many users have used the reclaim action and run into problems, we need to refactor it in this version.

JesseStutler avatar Oct 25 '25 09:10 JesseStutler

/priority high

Since many users have used the reclaim action and run into problems, we need to refactor it in this version.

Strong support

hwdef avatar Oct 27 '25 02:10 hwdef

/assign @guoqinwill

JesseStutler avatar Nov 13 '25 07:11 JesseStutler