volcano support checkpoint

Open chloe6888 opened this issue 3 years ago • 1 comments

What would you like to be added:

I hope Volcano can support checkpointing, so that when a job is interrupted for external reasons it can resume from the last checkpoint.

Why is this needed:

Some lengthy calculations can run for days. If the compute node dies, we have to restart from the beginning. With a checkpoint in place, we could at least restart from that point in the calculation.

chloe6888 avatar Aug 22 '22 09:08 chloe6888

May I know which computing engine (e.g. Spark) you are using? Usually the checkpoint function is implemented in the upper-layer framework, e.g. PyTorch.
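
For illustration, here is a minimal sketch of what application-level checkpointing can look like in plain Python. It does not use any Volcano or PyTorch API. The function names (`save_checkpoint`, `load_checkpoint`, `run`), the `fail_at` parameter, and the JSON state layout are all hypothetical and exist only for this example. It assumes the checkpoint path sits on storage that survives a restart of the job, such as a mounted persistent volume.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically so a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # Rename over the old checkpoint; atomic when both are on one filesystem.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def run(path, num_steps, checkpoint_every=10, fail_at=None):
    """Sum the step indices, resuming from the last checkpoint if present.

    fail_at is a hypothetical knob that simulates a node failure.
    """
    state = load_checkpoint(path)
    for step in range(state["step"], num_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated node failure")
        state["total"] += step
        state["step"] = step + 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

When the job is restarted after a failure, calling `run` again with the same path picks up from the last saved step and only repeats the work done since that checkpoint.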

william-wang avatar Aug 26 '22 04:08 william-wang

Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

stale[bot] avatar Nov 26 '22 23:11 stale[bot]

Closing for now as there was no activity for last 60 days after marked as stale, let us know if you need this to be reopened! 🤗

stale[bot] avatar Mar 23 '23 04:03 stale[bot]