git-lfs

Add command to undo fetch and checkout

Open: shoelzer opened this issue 9 years ago • 49 comments

Git LFS provides control over which files to fetch/cache (lfs.fetchinclude, lfs.fetchexclude) and which files to put in the working dir (checkout <filespec>). But once files are in the cache or working dir there is no (easy) way to undo those operations. I would like a command to change files into placeholders in the working dir and delete cached files.

The goal of this command is to free disk space used by LFS files. It goes much further than prune. I should be able to remove all LFS files from the working dir and cache. By default, the command should protect against data loss by verifying that files exist on the LFS server before deleting, but there should also be an option to skip that check.

I don't know if this should be a new command, multiple new commands, or options added to existing commands. Some ideas...

  • Single new command: scrub, clear, free
  • Multiple new commands: uncheckout and unfetch
  • New options for commands: checkout --placeholder and fetch --clear-cache

shoelzer, Apr 28 '16

I think this sounds like a great idea. I think git lfs prune could get some kind of --all option that just basically does rm -rf .git/lfs/objects. "Uncheckout" (the act of replacing LFS tracked files in the working directory with the LFS pointer) should probably be a new command though. I don't like the idea of a command like git lfs checkout doing the opposite given a new flag like git lfs checkout --clear.

technoweenie, Apr 28 '16

Thanks for raising this, it's a solid point.

As for deleting fetched content from the .git/lfs/objects store, I think a valid extension to git lfs prune would be to delete anything that would have been omitted at fetch time because of lfs.fetchinclude and lfs.fetchexclude. That could even be the default behaviour; the rest of prune is just an inverse of fetch, with a little date padding to avoid thrashing.

As for resetting what's currently in the working copy back to pointers, you can actually achieve this now with git reset --hard if the object isn't already fetched into .git/lfs/objects. The smudge filter gets invoked, and if the object data isn't already local and the fetch is suppressed by the include/exclude settings, it writes the pointer to the working copy instead, over the top of any current content. The only reason this doesn't work as a solution right now is that there is no easy way to remove all the objects you've already fetched that now match those settings, which the prune changes above would do.
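A minimal sketch of that approach for individual files, under stated assumptions: it runs from the repository root, each path is already matched by lfs.fetchexclude, objects live in the default .git/lfs/objects directory, and every object has already been pushed (deleting the only copy of an unpushed object loses data). The function name lfs_repointer is hypothetical, not a git-lfs command. Deleting the working file before checking it out again is a precaution so that Git really re-runs the smudge filter instead of treating the file as up to date.

```shell
# lfs_repointer is a hypothetical helper, not part of git-lfs.
# Assumes: run from the repo root, paths excluded via lfs.fetchexclude,
# default object store at .git/lfs/objects, objects already pushed.
lfs_repointer() {
  for file in "$@"; do
    # Extract the OID from the pointer committed at HEAD.
    oid=$(git cat-file -p "HEAD:$file" 2>/dev/null | sed -n 's/^oid sha256://p')
    if [ -z "$oid" ]; then
      echo "skipping $file: no LFS pointer at HEAD" >&2
      continue
    fi
    # Objects are stored as <first 2 hex>/<next 2 hex>/<oid>.
    a=$(printf '%s' "$oid" | cut -c1-2)
    b=$(printf '%s' "$oid" | cut -c3-4)
    # Remove the cached copy so the smudge filter cannot reuse it.
    rm -f ".git/lfs/objects/$a/$b/$oid"
    # Remove the working file and restore it from the index; with no local
    # object and the path excluded from fetching, smudge writes the pointer.
    rm -f -- "$file"
    git checkout -- "$file"
  done
}
```

For example, after git config lfs.fetchexclude 'bin/*', running lfs_repointer bin/again.bin should leave bin/again.bin as a pointer, which git lfs ls-files marks with "-" rather than "*".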

So I think this doesn't need any extra commands, just an enhancement to the default behaviour of git lfs prune. [edit] Ha, @technoweenie beat me to it while I was typing 😆

sinbad, Apr 28 '16

FWIW, an "uncheckout" command was requested in https://github.com/github/git-lfs/issues/944#issuecomment-175418615 too. I think I'd prefer having our own documented/tested command, instead of encouraging git reset --hard with caveats that the user has to worry about.

technoweenie, Apr 28 '16

Thanks for the tips on git reset --hard. I can use that right now.

I like the idea of extending git lfs prune and adding "uncheckout". Perhaps git lfs checkin?

shoelzer, Apr 28 '16

I'd favour something like git lfs checkout --clean since it's not really doing the opposite of what checkout normally does, it's just doing it again from scratch with the latest settings.

sinbad, Apr 29 '16

Hi all, I've recently been evaluating LFS for storing large binary test data. I really like the idea of reverting locally checked-out LFS files to pointer files. Is this available in the latest LFS version? We have many large test files, each around 2 GB, so we want to revert all LFS files to pointer files even when they are at the latest version. Thanks!

swordfly, Aug 11 '17

I think people are starting to ask more of Git LFS, such as serving large files on demand the way Google Drive File Stream or Dropbox Smart Sync do when local storage is limited. Until then, a clumsy way to manually convert a large file back into a pointer is to use git lfs clean as follows:

$> mv largefile.bin largefile.bin.bak
$> cat largefile.bin.bak | git lfs clean > largefile.bin
$> rm largefile.bin.bak

Those lines can be wrapped in a script. Unfortunately, the clean filter has to hash the entire file, so it runs slowly. Any suggestions for improving performance would be much appreciated.
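As a sketch of wrapping the three commands above into a script (the function name lfs_dehydrate is hypothetical), the version below writes the pointer to a temporary file before replacing the original, so a failed clean never truncates the data. It still hashes each whole file, so it is no faster, and git lfs clean stores (or keeps) the object in .git/lfs/objects, so no disk space is freed there.

```shell
# lfs_dehydrate is a hypothetical helper wrapping the manual steps above.
lfs_dehydrate() {
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "skipping $f: not a regular file" >&2
      continue
    fi
    tmp="$f.lfs-pointer.$$"
    # git lfs clean reads the file content on stdin and prints its pointer.
    if git lfs clean -- "$f" < "$f" > "$tmp"; then
      mv -f -- "$tmp" "$f"   # replace the content with the pointer text
    else
      rm -f -- "$tmp"
      echo "git lfs clean failed for $f" >&2
    fi
  done
}
```

Usage: lfs_dehydrate largefile.bin replaces largefile.bin with its pointer, like the manual version.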

farleylai, Nov 28 '17

@farleylai Without more serious workarounds, I think that this is the "best" solution for now. That said, I really like the idea of a git-lfs-stash command. "stash" is perhaps confusing since it has other connotations, but I mean something that would un-checkout the object and potentially prune it from your local cache.

This is a reasonably sized project, but I would be more than happy to guide you or anyone else through it as an OSS contribution. That said, if nobody takes this on, I'd be happy to add it myself within the next few releases (cc @technoweenie).

ttaylorr, Nov 29 '17

@ttaylorr and @swordfly I just figured out a way to check out the pointer file as is without re-computation by untracking the file type temporarily:

git lfs untrack '*.bin'
git checkout largefile.bin
git lfs track '*.bin'

However, I still imagine a handy flag that works as follows: when the flag is set to true, pointer files are always checked out as-is by git checkout/fetch/pull, and the user must explicitly download the large files with git lfs pull. Recovering the pointers is then simply a matter of checking out again. Otherwise, git-lfs materializes the pointers implicitly on git checkout/fetch/pull, as it does now. In effect, this means applying the filter on add/push but not on checkout/pull. So far, the closest option is git lfs install --skip-smudge, but it only works for the first clone. A little more flexibility would be appreciated.

farleylai, Nov 29 '17

However, I still imagine a handy flag that works as follows:

If I'm understanding your proposal correctly, I think that this is largely accomplishable with the --include and --exclude flags that are provided in Git LFS. If what you're looking for are ways to by default not check an LFS object out into the working tree, I think that this should be left to scripting within the repository.

ttaylorr, Dec 01 '17

If what you're looking for are ways to by default not check an LFS object out into the working tree, I think that this should be left to scripting within the repository.

You can do this by running:

# in your working directory
$ git config --file=.lfsconfig lfs.fetchexclude "*"

Add that .lfsconfig file to your repository, and the default exclude value will be used if no alternate is given (user git config, arguments to git lfs clone or git lfs pull, etc).
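The .lfsconfig setting above applies to everyone who clones the repository. For a single clone, a minimal sketch using options that git lfs install already has: --local writes the filter settings to this repository's .git/config only, and --skip-smudge makes later checkouts write pointer files instead of downloading content (objects already present in .git/lfs/objects may still be used). git lfs pull --include then downloads and checks out just the paths you ask for. The helper names are hypothetical, and neither helper turns already-materialized files back into pointers.

```shell
# Hypothetical helpers built on existing git lfs options; run inside the repo.

# Make future checkouts in this clone write pointer files instead of
# downloading LFS content.
lfs_pointers_by_default() {
  git lfs install --local --skip-smudge
}

# Download and check out only the given paths or patterns.
lfs_materialize() {
  for pattern in "$@"; do
    git lfs pull --include="$pattern"
  done
}
```

For example, lfs_pointers_by_default once after cloning, then lfs_materialize 'bin/again.bin' whenever that one file is actually needed.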

technoweenie, Dec 01 '17

@ttaylorr Not exactly; I'm with the OP. The requirement is for Git to transparently check out (recover) the pointer files as they are stored in the repo whenever the content is unchanged or absent. Specifying --exclude or lfs.fetchexclude can serve as the hint, but git-lfs does not actually seem to restore the pointer files.

farleylai, Dec 02 '17

@farleylai Sorry, can you explain what you mean by the phrase "recover the pointer file(s)"? Thanks.

ttaylorr, Dec 02 '17

I think he means the reverse of smudge. It sounds like he has a repository with the pointer files already replaced by the actual large files via the smudge filter, and wants to reclaim a little local disk space by removing them.

You can script it, or do it manually right now:

# all LFS files are real
$ git lfs ls-files
9252a75c94 * bin/again.bin
0263829989 * bin/b.bin
98ea6e4f21 * bin/hi.bin
b9f86fab47 * gif/atom-undo.gif
d1c8fab514 * gif/droidtocat.gif
d1c8fab514 * gif/dupe.gif
55d51edb30 * png/render.png

$ git config lfs.fetchexclude '*'
$ git show HEAD:bin/again.bin > bin/again.bin
$ git lfs pull # no-op because of lfs.fetchexclude

# bin/again.bin is just a pointer
$ git lfs ls-files
9252a75c94 - bin/again.bin
0263829989 * bin/b.bin
98ea6e4f21 * bin/hi.bin
b9f86fab47 * gif/atom-undo.gif
d1c8fab514 * gif/droidtocat.gif
d1c8fab514 * gif/dupe.gif
55d51edb30 * png/render.png

That's pretty cumbersome, and doesn't remove the file from .git/lfs/objects.

technoweenie, Dec 04 '17

...
$ git show HEAD:bin/again.bin > bin/again.bin
...

is exactly the key command for getting the pointer file back, but it seems to require a path matching what git lfs ls-files lists. Alternatively, temporarily turning off LFS tracking and running git checkout, as shown earlier, works in general for files and directories relative to the cwd. Ideally, the corresponding objects in .git/lfs/objects would be removed as well. So the follow-up question is: how do I get the path in .git/lfs/objects that corresponds to a given sha256 OID? And is it sufficient to just delete it?

farleylai, Dec 04 '17

Yes, LFS will happily re-download the files if they're not in .git/lfs/objects.
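For the object path itself, a sketch assuming the default layout, in which an object is stored at .git/lfs/objects/<first two hex digits>/<next two hex digits>/<full OID>; a custom lfs.storage setting would move it. pointer_oid and lfs_object_path are hypothetical helper names. The full 64-character OIDs can also be listed with git lfs ls-files -l.

```shell
# Hypothetical helpers; assume the default storage layout
# <objects dir>/<first 2 hex>/<next 2 hex>/<full oid>.

# Print the sha256 OID found in a pointer read from stdin.
pointer_oid() {
  sed -n 's/^oid sha256:\([0-9a-f]\{64\}\)$/\1/p'
}

# Print the cached-object path for an OID (default dir: .git/lfs/objects).
lfs_object_path() {
  oid=$1
  objdir=${2:-.git/lfs/objects}
  a=$(printf '%s' "$oid" | cut -c1-2)
  b=$(printf '%s' "$oid" | cut -c3-4)
  printf '%s/%s/%s/%s\n' "$objdir" "$a" "$b" "$oid"
}
```

Usage, from the repository root: oid=$(git cat-file -p HEAD:bin/again.bin | pointer_oid), then rm -f "$(lfs_object_path "$oid")".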

technoweenie, Dec 04 '17

@technoweenie Any effort started on this? Not that I need this urgently; I have a script that users run to empty the entire .git/lfs/objects folder and restore every pointer into the working directory. I even have a script for git lfs pull origin <some-large-file>.

This feature isn't urgent because non-tech users don't have huge files (Word documents less than 5MB). Tech users who need to flit between huge files (ISOs, binaries, etc) can already revert huge files to text pointers on their own.

jhannwong, Feb 28 '18

@hannwong no, but this is something that I think would be worth considering for the forthcoming v2.5.0 release.

ttaylorr, Feb 28 '18

I'm looking for this feature as well, and Google led me to this thread. Is it supported yet?

zzhang2019, Feb 19 '19

Is it supported yet?

Not yet, but we will make sure to update this issue if/when we do.

ttaylorr, Feb 19 '19

I use this one from my git root:

# Overwrite each LFS file with the pointer stored at HEAD (read line by line so paths with spaces survive).
git lfs ls-files -n | while IFS= read -r file; do
  git cat-file -e "HEAD:${file}" && git cat-file -p "HEAD:${file}" > "$file"
done

fstefanov, Mar 15 '19

@fstefanov this will write back the pointer but not delete the cached objects in .git/lfs/objects which still use up disk space. It also makes git think that the file has changed, at least with my version of git (2.21.0).

rokroskar, Jun 05 '19

no, but this is something that I think would be worth considering for the forthcoming v2.5.0 release.

We're past 2.7 already and we're still using scripts we hacked together to accomplish this. Any way you guys could devise your own git-lfs certified and approved command for the next release?

My team :heart: git-lfs's ease of use and it'd be great to have these pruning features out-of-the-box.

cardoso-neto, Aug 03 '19

Great feature, I look forward to it being implemented. Here's the cleanup script I use in the meantime, based on @fstefanov's:

#!/bin/bash
# Overwrite each LFS file with the pointer stored at HEAD.
git lfs ls-files -n | while IFS= read -r file; do
  git cat-file -e "HEAD:${file}" && git cat-file -p "HEAD:${file}" > "$file"
done
# Delete every cached LFS object, pushed or not.
rm -rf .git/lfs/objects

8ctopus, Oct 10 '19

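To undo only the checkout, I find the following faster and easier:

git lfs uninstall   # remove the LFS filters so a plain checkout writes the raw pointers
git lfs ls-files -n | tr '\n' '\0' | xargs -0 rm -f --
git restore .       # restores the deleted files (and discards other unstaged changes)
git lfs install     # reinstall the filters and hooks

gregwym, Dec 08 '19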

Here is a small program that I've written and been using for a few weeks: https://github.com/hobofan/lfs-unload

Still not a builtin solution, but it's been pretty robust for me so far.

hobofan, Apr 24 '20

Well, it's been 4 years, and from a newcomer's perspective this seems like something one would assume is core functionality by now, not something requiring weird script fiddling. Any idea whether this is at least on a roadmap somewhere?

skullthug, Dec 17 '20

It is not on a roadmap anywhere, but we'll of course happily accept a suitable patch. If you want to remove unused LFS objects, that can be done easily with git lfs prune. If you just want to do a checkout of your branch without the LFS files, you can do that with GIT_LFS_SKIP_SMUDGE=1 git read-tree -u --reset HEAD. Note that this will delete all changes to your working tree.

bk2204, Dec 17 '20

I'm afraid contributing is beyond my know-how, but I'd urge you to at least look at @hobofan's solution and see whether it's something approachable. Unfortunately, after spending a full week debugging this, I can safely say that git lfs prune does nothing in my case when trying to turn LFS files back into pointers if they already exist locally. The only thing that has technically worked is scripting @technoweenie's comment, but in my situation it leaves a mess in the working tree to clean up afterwards, to the point that I would never casually recommend it to a team member.

Specifically, I want to apply an lfs.fetchexclude (on a folder where we store our binary builds) and unload the LFS files matched by the exclude, ideally without having to reset the entire working tree, though at this point even that would be a small mercy. Note: if I apply the fetchexclude before fetching LFS content on a new clone, it works as expected and the files are created as pointers. But if you forget to do this and the LFS files are pulled, they're really hard to get rid of. I also don't want to have to re-clone my repository in the future if we ever add more folders to exclude. So a built-in LFS unload/clean operation would be overwhelmingly welcome.

GIT_LFS_SKIP_SMUDGE=1 is the one thing I haven't tried yet, so I'll edit this comment with the results. Edit: GIT_LFS_SKIP_SMUDGE=1 git read-tree -u --reset HEAD did nothing at all for me; if the LFS files are already there, they stay there.
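One likely reason nothing changed is that Git does not rewrite working-tree files it considers up to date, so the smudge filter never runs for them; a cached object in .git/lfs/objects may also be reused. A minimal sketch that works around both by deleting each LFS file and its cached object before checking the file out again with smudging skipped. lfs_unload_worktree is a hypothetical name, and the sketch assumes it runs from the repository root, uses the default .git/lfs/objects store, that every object has already been pushed, and that local changes to LFS files may be discarded.

```shell
# lfs_unload_worktree is a hypothetical helper, not part of git-lfs.
# Assumes: run from the repo root, default .git/lfs/objects store, every
# object already pushed, local changes to LFS files may be discarded.
lfs_unload_worktree() {
  # git lfs ls-files -n lists the LFS-tracked paths at HEAD.
  git lfs ls-files -n | while IFS= read -r f; do
    oid=$(git cat-file -p "HEAD:$f" | sed -n 's/^oid sha256://p')
    [ -n "$oid" ] || continue
    a=$(printf '%s' "$oid" | cut -c1-2)
    b=$(printf '%s' "$oid" | cut -c3-4)
    # Drop the cached copy in case smudge would otherwise reuse it.
    rm -f ".git/lfs/objects/$a/$b/$oid"
    # Delete the file so Git must write it again, then check it out with
    # smudging skipped so the pointer text is written instead.
    rm -f -- "$f"
    GIT_LFS_SKIP_SMUDGE=1 git checkout -- "$f" </dev/null
  done
}
```

Because it runs one git checkout per file, this is slow for repositories with many LFS files; restricting the loop to one folder (for example by filtering the ls-files output with grep '^builds/') would cover the fetchexclude case described above.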

skullthug, Dec 17 '20

I would happily help if I had the required knowledge. Also, I doubt it's that easy, otherwise such a core issue would not have survived 4 years; it looks full of annoying corner cases. Has anyone tried @farleylai's solution over the long term?

$ git lfs untrack '*.bin'
$ git checkout largefile.bin
$ git lfs track '*.bin'

Pierre-Bartet, Dec 24 '20