
auto push on a repository basis

Open igordertigor opened this issue 11 months ago • 2 comments

I run a lot of my ML workloads in short-lived containers in a dedicated ML cluster. The typical workflow is like this:

  1. Prepare experiment locally, run a single, smaller epoch for testing
  2. git push && dvc push to repository
  3. Start container, git pull && dvc pull in the container
  4. Run either dvc repro or dvc exp run.
  5. git push && dvc push in the container

More often than desirable, I forget step 5 here or I just run the git push part of it. As a result, I end up being left with a corrupted cache and I can't access the experiment's results using dvc metrics and similar.
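Steps 4 and 5 of the workflow above could be chained so that the push cannot be forgotten. The following is only a minimal sketch of that idea, not an existing DVC feature: `run_and_push` and `PIPELINE` are hypothetical names, and the sketch assumes that any needed `git commit` (for example of `dvc.lock`) is handled separately, since the workflow above does not list it.

```python
import subprocess
from typing import Callable, List, Sequence

# Commands mirroring steps 4 and 5 of the workflow (hypothetical wrapper,
# not part of DVC). Committing changes is assumed to happen elsewhere.
PIPELINE: List[List[str]] = [
    ["dvc", "repro"],
    ["git", "push"],
    ["dvc", "push"],
]


def run_and_push(
    commands: Sequence[Sequence[str]] = PIPELINE,
    runner: Callable = subprocess.run,
) -> List[List[str]]:
    """Run each command in order and stop at the first failure."""
    completed: List[List[str]] = []
    for cmd in commands:
        # With check=True, subprocess.run raises CalledProcessError on a
        # non-zero exit code, so later commands are skipped after a failure.
        runner(list(cmd), check=True)
        completed.append(list(cmd))
    return completed
```

Calling `run_and_push()` inside the container would then run the pipeline and both pushes in one go. The `runner` parameter exists only so the sequence can be checked without invoking Git or DVC.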

I am aware that there are git-hooks that I can set up using dvc install. However, given that the containers are typically rather short-lived, I tend to not install those either and there is also no guarantee that collaborators will remember to install the hooks. I would therefore appreciate a repository level setting in .dvc/config. I know that there is such a setting for experiments (exp.auto_push) but it doesn't seem to apply to cases where I run dvc repro.

Also, in a perfect world, this feature would be configurable on a per-host basis so that I can specify patterns on which autostage/auto_push are active (like ml-container-.*).
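To illustrate what per-host activation might look like, here is a rough sketch that matches the current hostname against a list of regular expressions. DVC has no such setting as far as this thread indicates; `auto_push_enabled` and its parameters are hypothetical, and matching the full hostname (rather than a prefix) is an assumption.

```python
import re
import socket
from typing import Iterable, Optional


def auto_push_enabled(patterns: Iterable[str], hostname: Optional[str] = None) -> bool:
    """Return True if the hostname fully matches any of the given regex patterns."""
    # Fall back to the local machine's hostname when none is passed in.
    host = socket.gethostname() if hostname is None else hostname
    return any(re.fullmatch(pattern, host) is not None for pattern in patterns)
```

With the pattern from the request, `auto_push_enabled(["ml-container-.*"], "ml-container-42")` would enable auto push, while a hostname such as `laptop` would not.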

igordertigor avatar Feb 03 '25 10:02 igordertigor

Can it be part of the container config (I mean hooks setup)?

shcheklein avatar Feb 03 '25 19:02 shcheklein

Not necessarily the preferred way. In many cases, I don't have direct access to the container config. So it would be an additional step to remember. It also isn't generally possible to modify the container config in a persistent way. So the collaborators issue would somewhat remain too.

igordertigor avatar Feb 16 '25 20:02 igordertigor

Sounds like you don't have control over what's installed in the container, but in case you do, I also wanted a simpler unified Git + DVC UX so I built that into Calkit (pip install calkit-python or uv tool install calkit-python). Most commands (add, commit, push, pull) simply wrap Git and DVC and do both/either depending on file type/size or where it's been committed before. calkit run --save also executes dvc repro, git add, dvc add, git commit -m "Run pipeline", git push, and dvc push all in one go. It's kind of like an opinionated Git + DVC wrapper for lazy people (like me 😄), but without the intention of replacing either for when users want low-level control.

petebachant avatar Jul 21 '25 13:07 petebachant