Support reclaim/preempt gang job which allocated resources equal or lesser than MinAvailable

Open nkflash opened this issue 3 years ago • 6 comments

@Thor-wl please check this design and implementation

nkflash avatar Jul 05 '22 07:07 nkflash

Welcome @nkflash!

It looks like this is your first PR to volcano-sh/volcano 🎉.

Thank you, and welcome to Volcano. :smiley:

volcano-sh-bot avatar Jul 05 '22 07:07 volcano-sh-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign william-wang.
You can assign the PR to them by writing /assign @william-wang in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

volcano-sh-bot avatar Jul 05 '22 07:07 volcano-sh-bot

fix https://github.com/volcano-sh/volcano/issues/2321

nkflash avatar Jul 05 '22 07:07 nkflash

Thanks for your optimization! Here are some suggestions.

  • Please split the design doc and the implementation into 2 PRs. In the usual contribution process, the community reviews the design first; once it is settled, the implementation follows.
  • Perhaps we can add this design to the existing preemption and reclaim design doc instead of writing an individual doc. In general, a new design doc comes with a new feature.
  • It seems a little rough to simply enable preemption and reclaim to evict pods from other jobs. A victim job left with incomplete roles may not work well, for example a victim Spark job with only some of its executors left. I think we should give more details about the eviction strategy. For example, once a job is selected as a victim, either all of its pods or all of its elastic pods must be evicted.
  • Perhaps we can use the elastic scheduler and the TDM plugin as references. This optimization can be implemented with those 2 features in mind.
  • Please execute git commit --signoff xxx to add your sign-off. It is required by CI.
  • Please execute go fmt xxx so that the code passes verification.
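
To make the eviction-strategy suggestion above more concrete, here is a minimal sketch in Go of one possible reading of it. It is not Volcano's scheduler code: the Pod and Job types and the SelectVictims function are hypothetical names invented for illustration, and the rule "evict the elastic pods if there are any, otherwise evict the whole job" is an assumption about how "all of the pods or elastic pods" might be applied.

```go
package main

// Pod is a hypothetical, simplified stand-in for a running task of a job.
type Pod struct {
	Name string
}

// Job is a hypothetical, simplified gang job. MinAvailable is the minimum
// number of pods the job needs in order to run at all.
type Job struct {
	Name         string
	MinAvailable int
	Running      []Pod
}

// SelectVictims returns the pods to evict once job has been chosen as a victim.
//
// Assumed policy (an illustration, not the project's decided behavior):
//   - If the job runs more pods than MinAvailable, the surplus pods are
//     treated as elastic and only those are evicted, so the gang survives.
//   - If the job runs MinAvailable pods or fewer, evicting a subset would
//     leave an incomplete gang, so all of its pods are evicted together.
func SelectVictims(job Job) []Pod {
	if len(job.Running) > job.MinAvailable {
		// Copy the surplus so callers cannot mutate job.Running through it.
		return append([]Pod(nil), job.Running[job.MinAvailable:]...)
	}
	return append([]Pod(nil), job.Running...)
}
```

With this sketch, a job with MinAvailable of 2 and three running pods would lose only its third pod, while a job with MinAvailable of 3 and three running pods would be evicted as a whole rather than left with a partial set of roles.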

Got it, I will reply to each comment later.

nkflash avatar Jul 07 '22 01:07 nkflash

@nkflash Please follow the code of conduct and format the code to pass the CI. Thanks.

Thor-wl avatar Aug 12 '22 06:08 Thor-wl

@nkflash Please follow the code of conduct and format the code to pass the CI. Thanks.

Sorry, I am busy with some other workload. I will finish this later.

nkflash avatar Aug 16 '22 02:08 nkflash

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] avatar Jan 05 '23 02:01 stale[bot]

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale[bot] avatar Mar 14 '23 13:03 stale[bot]