
Auto push experiments at end of each stage

Open • dberenbaum opened this issue 2 years ago • 3 comments

Auto-pushing checkpoints was introduced to make it easier to recover long-running model training jobs in CI. For long-running processing jobs that span multiple pipeline stages, the same behavior should be available at the end of each stage.
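Until something like this exists natively, a CI job could approximate it by reproducing one stage at a time and pushing after each one. The Python sketch below only builds and runs ordinary CLI calls (dvc repro, dvc push). The stage names and the remote name are hypothetical, running stages in dvc.yaml order is an assumption, and pushing with --run-cache is our own choice, intended to let a later job find the results even if the updated dvc.lock was never committed.

```python
import subprocess
from typing import Iterable, List, Optional


def per_stage_commands(stages: Iterable[str], remote: Optional[str] = None) -> List[List[str]]:
    """Build one `dvc repro` + `dvc push` pair per stage, in the given order."""
    commands: List[List[str]] = []
    for stage in stages:
        # Reproduce this stage; DVC also reruns any upstream stage whose inputs changed.
        commands.append(["dvc", "repro", stage])
        # Push the stage's outputs plus the run-cache, so a later job can restore
        # them even if the updated dvc.lock was never committed.
        push = ["dvc", "push", "--run-cache"]
        if remote is not None:
            push += ["--remote", remote]
        push.append(stage)
        commands.append(push)
    return commands


def run_commands(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        # check=True raises CalledProcessError on a non-zero exit, so the CI job
        # fails immediately while everything pushed so far stays on the remote.
        subprocess.run(argv, check=True)
```

For example, a CI step might call run_commands(per_stage_commands(["prepare", "featurize", "train"], remote="storage")). The trade-off is that each separate dvc repro call re-checks the upstream graph, which adds some overhead compared with a single dvc repro, though that should be small next to genuinely long-running stages.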

dberenbaum • Jan 19 '23 15:01

We are very interested in this feature. We run long on-commit DVC pipelines in CI using dvc repro, and when they fail we currently have to rerun everything from scratch. It would be great if intermediate results could be downloaded from the remote DVC cache.
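If earlier attempts had pushed their run-cache after each stage, a retried job could try to pick up where the failed one stopped. This is a minimal sketch under that assumption: dvc pull --run-cache downloads the recorded stage runs (as we understand it, together with the outputs they refer to), and dvc repro, which consults the run-cache by default, can then restore matching stages instead of executing them again. The remote name is hypothetical; check the pull and repro documentation for your DVC version before relying on this behavior.

```python
import subprocess
from typing import List, Optional


def resume_commands(remote: Optional[str] = None) -> List[List[str]]:
    """Commands a retried CI job could run to reuse results from earlier attempts."""
    # Download the run-cache (and, as we understand it, the outputs it refers to)
    # that earlier attempts pushed to the remote.
    pull = ["dvc", "pull", "--run-cache"]
    if remote is not None:
        pull += ["--remote", remote]
    # dvc repro can then restore stages whose command, dependencies and params
    # match a cached run instead of executing them again.
    return [pull, ["dvc", "repro"]]


def resume(remote: Optional[str] = None) -> None:
    """Run the resume commands in order, stopping at the first failure."""
    for argv in resume_commands(remote):
        subprocess.run(argv, check=True)
```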

sukhovvl • Jan 26 '23 16:01

Furthermore, we have experimented a bit with parallelising pipeline stages in the cloud, i.e. a stage that looks like a normal stage to DVC actually launches various cloud jobs. It would be great if those jobs could call dvc pull to get the intermediate results of the previous stages. Leaving aside for the moment how to transfer the dvc.lock file to the remote workers and how to funnel the stages' results back, intermediate pushes would seem to enable many workarounds for such cases. This may seem like a far-fetched scenario, but perhaps it is another point in favour of this feature.
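To make the worker side concrete, here is a sketch of what such a cloud job might run once intermediate results exist on the remote. It assumes, as the comment sets aside, that the worker already has a checkout of the repository with a dvc.lock recording the upstream outputs; the stage name, remote name and repo_dir are hypothetical. dvc pull with a stage target fetches that stage's outputs, and --with-deps extends this to the stages it depends on.

```python
import subprocess
from typing import List, Optional


def worker_pull_command(stage: str, with_upstream: bool = True,
                        remote: Optional[str] = None) -> List[str]:
    """Command a cloud worker could run to fetch a stage's recorded outputs."""
    argv = ["dvc", "pull"]
    if with_upstream:
        # --with-deps also pulls the outputs of every stage the target depends on.
        argv.append("--with-deps")
    if remote is not None:
        argv += ["--remote", remote]
    # The target is resolved against dvc.yaml/dvc.lock in the worker's checkout,
    # so the worker needs a dvc.lock that records those outputs.
    argv.append(stage)
    return argv


def worker_pull(stage: str, repo_dir: str, with_upstream: bool = True,
                remote: Optional[str] = None) -> None:
    """Run the pull inside the worker's repository checkout."""
    argv = worker_pull_command(stage, with_upstream=with_upstream, remote=remote)
    subprocess.run(argv, cwd=repo_dir, check=True)
```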

sukhovvl • Jan 26 '23 16:01

+1 for this feature

cateseale • Jan 11 '24 17:01