Add command(s) to prune working directory

Open Pierre-Bartet opened this issue 4 years ago • 6 comments

When trying to use git lfs to deal with data larger than the locally available storage, you are often told to:

git lfs install --skip-smudge

or

GIT_LFS_SKIP_SMUDGE=1 git clone SERVER-REPOSITORY

So that only pointers are downloaded.

However, as time goes by, you pull, commit, change, and push a lot, and every stage of the git pipeline fills up with large unused data:

  1. The working directory will contain actual files instead of pointers
  2. The local repository will be full of git lfs objects
  3. The remote repository (often a gitlab server) will be full of unreachable git lfs objects

Recently, 2. and 3. were (IMHO) solved:

  2. The git lfs team kindly added a --force option when pruning, so that the local repository can actually be cleaned:
git lfs prune --force
  3. GitLab Housekeeping now actually removes unused git lfs objects

IMHO, for the use case where you want to work with only a subset of a repository, git-lfs was previously super interesting but unusable in real life (whether in academia or industry). With 2. and 3. solved, allowing working directory pruning would let lots of people start versioning their data for the first time.
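
For point 1, it helps to see which files in the working directory currently hold real content and which are still pointers. The following is only a minimal sketch: `is_lfs_pointer` and `report_lfs_state` are hypothetical helper names, not git-lfs commands, and the check assumes a pointer file starts with the `version https://git-lfs.github.com/spec/v1` line used by the pointer format.

```shell
# Hypothetical helper (not part of git-lfs): succeed if the file looks like
# an LFS pointer, i.e. its first line is the pointer spec version line.
is_lfs_pointer() {
    [ -f "$1" ] || return 1
    # Only inspect the first 100 bytes so large binary files stay cheap to test.
    head -c 100 "$1" | head -n 1 |
        grep -q '^version https://git-lfs\.github\.com/spec/v1$'
}

# Hypothetical helper: print "pointer <path>" or "content <path>" for each argument.
report_lfs_state() {
    for path in "$@"; do
        if is_lfs_pointer "$path"; then
            printf 'pointer %s\n' "$path"
        else
            printf 'content %s\n' "$path"
        fi
    done
}
```

For example, `report_lfs_state data/*.csv` would list which CSV files have been pulled. Inside a repository, `git lfs ls-files` should give similar information, marking full objects with `*` and pointers with `-`.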

Pierre-Bartet avatar Jan 18 '21 16:01 Pierre-Bartet

Hey,

Technically, such a tool isn't required. If your working directory is clean, you can simply run GIT_LFS_SKIP_SMUDGE=1 git read-tree -u --reset HEAD and Git will do the right thing for you by switching to pointer files.

bk2204 avatar Jan 19 '21 15:01 bk2204

"If your working directory is clean" That's the caveat.

Pierre-Bartet avatar Jan 19 '21 19:01 Pierre-Bartet

Any option we provided here would also have that requirement to avoid destroying data because the requirement is that all LFS files have to be read back into the index or Git will show them as modified. You can run that command without a clean working tree, but it will blow away your changes.
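
Since the command discards local changes, one cautious way to use it is behind a guard that refuses to run on a dirty tree. This is a sketch only: `run_if_clean` is a hypothetical wrapper name, and it deliberately treats untracked files as "dirty" to stay on the safe side.

```shell
# Hypothetical wrapper: run the given command only if the working tree is clean.
run_if_clean() {
    # Fail early if we are not inside a Git repository at all.
    changes=$(git status --porcelain) || return 2
    if [ -n "$changes" ]; then
        echo "working tree has uncommitted or untracked changes; refusing to run" >&2
        return 1
    fi
    "$@"
}
```

Usage would then look like `run_if_clean env GIT_LFS_SKIP_SMUDGE=1 git read-tree -u --reset HEAD`. The guard only protects against losing changes; it does not by itself guarantee that the wrapped command converts files back to pointers.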

bk2204 avatar Jan 19 '21 19:01 bk2204

"You can run that command without a clean working tree, but it will blow away your changes."

Yes that's a problem.

Let's say you run:

$ GIT_LFS_SKIP_SMUDGE=1 git clone some_repo
$ ... do lots of things ...
$ git lfs pull some_file.csv

You end up in a situation where you cannot conveniently undo the last command (git lfs pull).

Pierre-Bartet avatar Jan 19 '21 19:01 Pierre-Bartet

I'm going to mark this as an enhancement, since I think there are possibly ways we could do this.

bk2204 avatar Jan 19 '21 21:01 bk2204

GIT_LFS_SKIP_SMUDGE=1 git read-tree -u --reset HEAD

By the way, this doesn't work: the pulled files are not replaced by their pointers.

Pierre-Bartet avatar Oct 20 '21 06:10 Pierre-Bartet