Support elastic training with PyTorch

Open william-wang opened this issue 4 years ago • 4 comments

What would you like to be added:

Support elastic training with PyTorch.

Why is this needed:

Use elastic training to improve GPU utilization.

Tasks

  • [x] Allow elastic jobs to be enqueued in case of a resource shortage. Implementation: #2173
  • [x] Support the elastic annotation in the preempt/reclaim plugins. Implementation: #2105

william-wang avatar Dec 03 '21 08:12 william-wang

@qiankunli, you can add detailed information about this subject here. Thanks.

william-wang avatar Dec 03 '21 08:12 william-wang

xref: https://github.com/volcano-sh/volcano/pull/1887

k82cn avatar Dec 07 '21 01:12 k82cn

Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

stale[bot] avatar Mar 17 '22 09:03 stale[bot]

Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

stale[bot] avatar Sep 08 '22 22:09 stale[bot]

Closing for now as there was no activity for last 60 days after marked as stale, let us know if you need this to be reopened! 🤗

stale[bot] avatar Nov 12 '22 09:11 stale[bot]