volcano
Support reclaiming/preempting gang jobs whose allocated resources are equal to or less than MinAvailable
@Thor-wl please check this design and implementation
Welcome @nkflash!
It looks like this is your first PR to volcano-sh/volcano 🎉.
Thank you, and welcome to Volcano. :smiley:
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
To complete the pull request process, please assign william-wang
You can assign the PR to them by writing /assign @william-wang in a comment when ready.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
fix https://github.com/volcano-sh/volcano/issues/2321
Thanks for your optimization! Here are some suggestions:
- Please separate the design doc from the implementation into 2 PRs. In the usual contribution process, the community reviews the design first. Once it is settled, the implementation follows.
- Perhaps we can add this design as part of the `preemption` and `reclaim` design doc instead of an individual doc. In general, a new feature comes with a new design doc.
- It seems a little rough to just enable preemption and reclaim to evict pods from other jobs. A victim job with incomplete roles may not work well, for example a victim spark job with only part of its `executor` pods left. I think we should give more details about the eviction strategy. For example, once a job is selected as a victim, either all of its pods or its elastic pods must be evicted.
- Perhaps we can use the elastic scheduler and the TDM plugin as references. This optimization can be implemented with those 2 features in mind.
- Please execute `git commit --signoff xxx` to add your personal signature. It is required by CI.
- Please execute `go fmt xxx` to make code verification happy.
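To make the eviction strategy suggested above more concrete, here is a minimal sketch of one possible rule. It is not Volcano's implementation: `Task`, `Job`, and `SelectVictims` are hypothetical, simplified types and names invented for illustration, and the sketch assumes "elastic pods" means the running pods beyond `MinAvailable`. Under that assumption, a victim job above `MinAvailable` gives up only its surplus pods, while a job at or below `MinAvailable` is evicted as a whole so that no partial gang is left behind.

```go
package main

// Task is a hypothetical, simplified stand-in for one pod of a gang job.
type Task struct {
	Name    string
	Running bool
}

// Job is a hypothetical, simplified gang job; it is not Volcano's JobInfo.
type Job struct {
	Name         string
	MinAvailable int
	Tasks        []Task
}

// runningTasks returns the tasks that currently hold resources,
// in their original order.
func runningTasks(j Job) []Task {
	var out []Task
	for _, t := range j.Tasks {
		if t.Running {
			out = append(out, t)
		}
	}
	return out
}

// SelectVictims returns the tasks to evict once j has been chosen as a victim.
// Assumption: tasks beyond the first MinAvailable running tasks are "elastic".
func SelectVictims(j Job) []Task {
	running := runningTasks(j)
	if len(running) > j.MinAvailable {
		// The job stays a valid gang after eviction: take only the surplus.
		return running[j.MinAvailable:]
	}
	// At or below MinAvailable: evicting any pod breaks the gang,
	// so evict every running pod rather than leave a partial job.
	return running
}
```

For a job with three running pods, `MinAvailable: 2` yields only the third pod as a victim, while `MinAvailable: 3` yields all three. A real design would also need to decide which pods count as surplus (for example by role or priority), which this sketch does not address.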
Got it, I will reply to each comment later.
@nkflash Please follow the code of conduct and format the code to pass the CI. Thanks.
Sorry, I am busy with other work. I will finish this later.
Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.