import/update: cache git repos/clones
dvc import https://some/git/repo/ some_file
dvc update # should not re-clone, should only pull into existing cache
- related: #3438, #3473
The thing is that the cache is not persisted between dvc runs. If we make it persistent, dvc update won't re-clone; it will only do a git pull.
yes; this is about making it persistent & pulling rather than re-cloning.
What about a repo cache at the user level? Could be a system config var so you can disable it, like analytics.
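To make the idea concrete, here is a minimal sketch of a persistent, user-level clone cache that pulls instead of re-cloning. It is not how dvc is implemented. The function `cached_clone`, the variables `REPO_CACHE_DIR` and `REPO_CACHE_DISABLED`, and the cache location are all assumptions made up for illustration.

```shell
#!/bin/sh
# Sketch only: NOT dvc's implementation. Every name below is hypothetical.

# User-level cache root (assumed location; override with REPO_CACHE_DIR).
REPO_CACHE_DIR="${REPO_CACHE_DIR:-${XDG_CACHE_HOME:-$HOME/.cache}/repo-cache-sketch}"

# Print the path of a clone of the given URL, reusing a cached clone if present.
cached_clone() {
    url="$1"

    # Hypothetical opt-out switch: skip the cache and make a throwaway clone.
    if [ "${REPO_CACHE_DISABLED:-0}" = "1" ]; then
        dest="$(mktemp -d)"
        git clone --quiet "$url" "$dest" || return 1
        printf '%s\n' "$dest"
        return 0
    fi

    # Derive a stable directory name from the URL.
    key="$(printf '%s' "$url" | git hash-object --stdin)"
    dest="$REPO_CACHE_DIR/$key"

    if [ -d "$dest/.git" ]; then
        # Clone already cached: only pull new commits.
        git -C "$dest" pull --quiet --ff-only || return 1
    else
        # First use: clone into the persistent cache.
        mkdir -p "$REPO_CACHE_DIR"
        git clone --quiet "$url" "$dest" || return 1
    fi
    printf '%s\n' "$dest"
}
```

Calling `cached_clone https://some/git/repo/` twice would clone on the first call and only pull on the second. Setting `REPO_CACHE_DISABLED=1` falls back to a fresh clone each time.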
Context: #4203
In light of #4246 being merged, I'm going to downgrade the priority here.
Persistent clones (as per #10511) are different from shallow clones (as per #4246). Both speed up cloning (or potentially avoid it), but only persistent clones would allow us to work with imported data without internet connectivity, which is necessary for us on an HPC where most queues have no connectivity.
Persistent clones would also allow us to separate cloning (which requires connectivity) from other dvc operations (which don't). This would allow us to do the former in an environment (queue) with connectivity and the latter in environments without.
@johnyaku Have you considered keeping a clone on a shared space of the HPC so you can import from there instead of from the internet? Even if dvc had some support for caching clones, it would likely still need to check the internet to fetch updates from those clones. If you have your own clone of the repo, you can fully control when to update it and everyone can share that single repo copy (dvc will not make a new clone of a local repo).
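The shared-clone workflow suggested above might look roughly like this. The path `/shared/repos/repo` and the function names are hypothetical. Whether the imported data itself is reachable offline still depends on where the source repo's DVC remote or cache lives.

```shell
#!/bin/sh
# Sketch of the shared-clone workflow; paths and function names are hypothetical.
SHARED_CLONE="${SHARED_CLONE:-/shared/repos/repo}"
UPSTREAM_URL="${UPSTREAM_URL:-https://some/git/repo/}"

# Run on a node WITH connectivity, whenever you decide to update the copy.
refresh_shared_clone() {
    if [ -d "$SHARED_CLONE/.git" ]; then
        git -C "$SHARED_CLONE" pull --quiet --ff-only
    else
        git clone --quiet "$UPSTREAM_URL" "$SHARED_CLONE"
    fi
}

# Run inside a DVC project: the import source is the local clone,
# so dvc does not need to clone the Git repo from the internet.
import_from_shared_clone() {
    dvc import "$SHARED_CLONE" "$1"
}
```

For example, `refresh_shared_clone` would run in a queue with connectivity, and `import_from_shared_clone some_file` in one without.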