git-lfs
Add command(s) to prune working directory
When trying to use git lfs to deal with data larger than the locally available storage, you are often told to:
git lfs install --skip-smudge
or
GIT_LFS_SKIP_SMUDGE=1 git clone SERVER-REPOSITORY
This way, only the pointer files are downloaded.
However, as time goes by, you will pull, commit, change, and push a lot, and the whole git pipeline fills up with large, unused data:
1. The working directory will contain actual files instead of pointers
2. The local repository will be full of git lfs objects
3. The remote repository (often a GitLab server) will be full of unreachable git lfs objects
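To see how far the first item has gone, `git lfs ls-files` marks each LFS path with `*` when the working tree holds the full object and `-` when it holds only the pointer. Below is a minimal sketch that filters that output. It assumes the default `<short OID> <*|-> <path>` line format, and `lfs_materialized` is a hypothetical helper, not a Git LFS command:

```shell
# Hypothetical helper (not part of Git LFS): print the LFS paths whose
# working-tree copy is the full object rather than a pointer file.
# Reads `git lfs ls-files` output on stdin and assumes the default format
# "<oid> <marker> <path>", where "*" = full object and "-" = pointer.
lfs_materialized() {
  sed -n 's/^[0-9a-f][0-9a-f]* \* //p'
}

# Usage, from the top of the repository:
#   git lfs ls-files | lfs_materialized
```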
Recently, points 2 and 3 have (IMHO) been solved:
- The git lfs team kindly added a --force option to pruning, so that the local repository can actually be cleaned:
git lfs prune --force
- GitLab housekeeping now actually removes unused git lfs objects
IMHO, for the use case where you want to work with only a subset of a repository, git-lfs was previously super interesting but unusable in real life (whether in academia or industry). With points 2 and 3 solved, allowing working-directory pruning would let lots of people start versioning their data for the first time.
Hey,
Technically, such a tool isn't required. If your working directory is clean, you can simply run
GIT_LFS_SKIP_SMUDGE=1 git read-tree -u --reset HEAD
and Git will do the right thing for you by switching to pointer files.
"If your working directory is clean": that's the caveat.
Any option we provided here would have that same requirement in order to avoid destroying data, because all LFS files have to be read back into the index or Git will show them as modified. You can run that command without a clean working tree, but it will blow away your changes.
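The requirement can be made explicit with a guard that refuses to continue when there are uncommitted changes. Below is a minimal sketch. `require_clean_worktree` is a hypothetical helper name, and it only looks at tracked files:

```shell
# Hypothetical helper (not part of Git or Git LFS): succeed only when no
# tracked file has unstaged or staged changes. Untracked files are not checked.
require_clean_worktree() {
  if ! git diff --quiet || ! git diff --cached --quiet; then
    echo "uncommitted changes present; refusing to continue" >&2
    return 1
  fi
}

# Usage: switch to pointer files only when no local change can be lost.
#   require_clean_worktree &&
#     GIT_LFS_SKIP_SMUDGE=1 git read-tree -u --reset HEAD
```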
"You can run that command without a clean working tree, but it will blow away your changes."
Yes, that's a problem.
Let's say you run:
$ GIT_LFS_SKIP_SMUDGE=1 git clone some_repo
$ ... do lots of things ...
$ git lfs pull --include=some_file.csv
You end up in a situation where you cannot conveniently undo the last command (git lfs pull).
I'm going to mark this as an enhancement, since I think there are possibly ways we could do this.
GIT_LFS_SKIP_SMUDGE=1 git read-tree -u --reset HEAD
By the way, this doesn't work: the pulled files are not replaced by their pointers.
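The most likely reason is that `read-tree -u --reset` only rewrites a working-tree file when its stat data no longer matches the index. Files fetched by `git lfs pull` are presumably recorded as up to date, so they are left alone. Deleting an unmodified file first forces Git to write it out again, and with `GIT_LFS_SKIP_SMUDGE=1` it writes the pointer.

Below is a minimal sketch of that workaround. It assumes Git LFS is installed and the commands run from the top of the repository. `lfs_unpull` and `lfs_unpull_one` are hypothetical helpers, and they skip untracked paths and files with local changes rather than discarding them:

```shell
# Hypothetical helpers (not part of Git LFS). lfs_unpull_one rewrites one
# tracked file from the index with LFS smudging skipped, so an LFS file
# becomes its pointer again. Untracked or locally changed paths are skipped.
lfs_unpull_one() {
  path=$1
  if ! git ls-files --error-unmatch -- "$path" >/dev/null 2>&1; then
    echo "skipping untracked path: $path" >&2
    return 1
  fi
  if ! git diff --quiet -- "$path" || ! git diff --cached --quiet -- "$path"; then
    echo "skipping modified file: $path" >&2
    return 1
  fi
  # Deleting the file makes Git see it as changed, so the checkout really
  # rewrites it; GIT_LFS_SKIP_SMUDGE=1 makes the LFS filter write the pointer.
  rm -f -- "$path" && GIT_LFS_SKIP_SMUDGE=1 git checkout -q -- "$path"
}

# lfs_unpull PATH... handles the given paths; with no arguments it handles
# every file listed by `git lfs ls-files --name-only` (assumed to print paths
# relative to the repository root; paths with newlines are not supported).
# Returns non-zero if any path was skipped or failed.
lfs_unpull() {
  rc=0
  if [ "$#" -gt 0 ]; then
    for p in "$@"; do
      lfs_unpull_one "$p" || rc=1
    done
  else
    git lfs ls-files --name-only | (
      while IFS= read -r p; do
        lfs_unpull_one "$p" || rc=1
      done
      exit "$rc"
    ) || rc=1
  fi
  return "$rc"
}
```

For the scenario above, `lfs_unpull some_file.csv` would turn the pulled file back into a pointer, and `lfs_unpull` with no arguments would do the same for every unmodified LFS file. The downloaded objects stay in the local LFS storage until `git lfs prune` removes them.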