auto push on a repository basis
I run a lot of my ML workloads in short-lived containers on a dedicated ML cluster. The typical workflow looks like this:
1. Prepare experiment locally, run a single, smaller epoch for testing
2. git push && dvc push to repository
3. Start container, git pull && dvc pull in the container
4. Run either dvc repro or dvc exp run.
5. git push && dvc push in the container
More often than desirable, I forget step 5 here or I just run the git push part of it. As a result, I end up being left with a corrupted cache and I can't access the experiment's results using dvc metrics and similar.
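Until a repository-level setting exists, one workaround is to chain steps 4 and 5 so the push cannot be forgotten. The following is only a minimal sketch: the function names `repro_and_push_commands` and `run_all` are made up for illustration, and it assumes any commit of the updated dvc.lock happens separately, since the steps above do not mention one.

```python
import subprocess
from typing import Callable, List


def repro_and_push_commands(use_experiments: bool = False) -> List[List[str]]:
    """Return the commands for steps 4 and 5 of the workflow, in order."""
    # Step 4: either `dvc repro` or `dvc exp run`.
    run = ["dvc", "exp", "run"] if use_experiments else ["dvc", "repro"]
    # Step 5: `git push && dvc push`, in the same order as in the workflow.
    return [run, ["git", "push"], ["dvc", "push"]]


def run_all(commands: List[List[str]], runner: Callable = subprocess.run) -> None:
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failed pipeline run is never followed by a push.
        runner(cmd, check=True)
```

Calling `run_all(repro_and_push_commands())` inside the container would then replace typing steps 4 and 5 by hand. It still has to be remembered or baked into the container entrypoint, so it does not solve the collaborator problem.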
I am aware that there are Git hooks that I can set up using dvc install. However, given that the containers are typically rather short-lived, I tend not to install those either, and there is also no guarantee that collaborators will remember to install the hooks. I would therefore appreciate a repository-level setting in .dvc/config. I know that there is such a setting for experiments (exp.auto_push), but it doesn't seem to apply to cases where I run dvc repro.
Also, in a perfect world, this feature would be configurable on a per-host basis, so that I can specify host name patterns for which autostage/auto_push are active (like ml-container-.*).
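To make the per-host idea concrete, here is a rough sketch of the matching semantics such a setting could have. It is purely hypothetical: DVC has no such option as far as this thread establishes, and the function name `auto_push_enabled` and its parameters are invented for illustration.

```python
import re
import socket
from typing import Iterable, Optional


def auto_push_enabled(host_patterns: Iterable[str], hostname: Optional[str] = None) -> bool:
    """Return True if the host name fully matches any of the regex patterns."""
    # Fall back to the current machine's host name when none is given.
    name = socket.gethostname() if hostname is None else hostname
    # fullmatch requires the whole name to match, so "ml-container-.*"
    # does not accidentally match "my-ml-container-1".
    return any(re.fullmatch(pattern, name) is not None for pattern in host_patterns)
```

With the pattern from above, `auto_push_enabled(["ml-container-.*"], "ml-container-42")` is true, while a local machine named `laptop` would not trigger auto push.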
Can it be part of the container config (I mean hooks setup)?
Not necessarily the preferred way. In many cases, I don't have direct access to the container config. So it would be an additional step to remember. It also isn't generally possible to modify the container config in a persistent way. So the collaborators issue would somewhat remain too.
Sounds like you don't have control over what's installed in the container, but in case you do, I also wanted a simpler unified Git + DVC UX so I built that into Calkit (pip install calkit-python or uv tool install calkit-python). Most commands (add, commit, push, pull) simply wrap Git and DVC and do both/either depending on file type/size or where it's been committed before. calkit run --save also executes dvc repro, git add, dvc add, git commit -m "Run pipeline", git push, and dvc push all in one go. It's kind of like an opinionated Git + DVC wrapper for lazy people (like me 😄), but without the intention of replacing either for when users want low-level control.