workflow restart logic not needed for non-spot
...unless your workflow runs take more than 35 days[^1] to finish?
[^1]: On GitHub, as per https://github.com/iterative/cml/pull/1067.
I think we should strongly consider dropping that.
Reasons I would use cml runner
- short-lived instance life-cycle management because I don't want to pay for that cloud GPU any longer than is required.
- Super Simple setup of the CI agent on a system
I think that something that runs longer than 35 days doesn't really fall into one of the above.
/opinions?
I wholeheartedly agree: the 35 day limit is enough for all the considered use cases unless proven otherwise, and the maintenance overhead of this feature probably outweighs the dubious edge cases where it becomes useful.
>35 days: use LEO
short-lived instance life-cycle management because I don't want to pay for that cloud GPU any longer than is required.
Fine tune Stable difussion, GPT-2 or GPT-3 alternatives might take more than 35 days. Just mentioning 😁
>35days: useLEO
Or anything use LEO