
[Spot] An option for keeping a failed spot job's cluster alive for a while before termination

Open Michaelvll opened this issue 3 years ago • 0 comments

When something goes wrong with a spot job, it would be nice to be able to log into the spot cluster and take a look at the problem. As proposed by @lhqing, an option like --keep-minutes-after-error 60 for the spot launch would be useful for debugging: the failed cluster would stay up for 60 minutes before being terminated.
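To make the intended semantics concrete, here is a rough sketch of the grace-period check such an option could imply. It is only an illustration under assumptions: `teardown_deadline`, `should_terminate`, and the `keep_minutes_after_error` parameter are hypothetical names made up for this example and are not part of SkyPilot's existing code or CLI.

```python
from datetime import datetime, timedelta
from typing import Optional


def teardown_deadline(failed_at: datetime, keep_minutes_after_error: int) -> datetime:
    """Return the earliest time a failed cluster may be terminated (hypothetical helper)."""
    if keep_minutes_after_error < 0:
        raise ValueError("keep_minutes_after_error must be non-negative")
    return failed_at + timedelta(minutes=keep_minutes_after_error)


def should_terminate(
    failed_at: Optional[datetime],
    keep_minutes_after_error: int,
    now: datetime,
) -> bool:
    """Return True once the grace period after a failure has elapsed (hypothetical helper)."""
    if failed_at is None:
        # The job has not failed, so the keep-after-error option does not apply.
        return False
    return now >= teardown_deadline(failed_at, keep_minutes_after_error)
```

With a value of 60, a job that failed at 00:09 would keep its cluster until 01:09, which leaves time to log in and inspect the problem. A value of 0 would terminate the cluster as soon as the failure is detected.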

Michaelvll, Sep 11 '22 00:09