Repo not shrinking, number of commits doubles
I must be doing something wrong, but can't suss out what that might be. Steps taken:
$ git clone --mirror https://dev.azure.com/company/Playground/_git/SizeTest R
Cloning into bare repository 'R'...
remote: Azure Repos
remote: Found 1512904 objects to send. (20646 ms)
Receiving objects: 100% (1512904/1512904), 28.49 GiB | 22.01 MiB/s, done.
Resolving deltas: 100% (1054450/1054450), done.
$ cd R
$ git filter-repo --paths-from-file ../pathsToRemove.txt --invert-paths
Parsed 172371 commits
New history written in 403.99 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
Enumerating objects: 1997320, done.
Counting objects: 100% (1997320/1997320), done.
Delta compression using up to 8 threads
Compressing objects: 100% (605875/605875), done.
Writing objects: 100% (1997320/1997320), done.
Selecting bitmap commits: 328059, done.
Building bitmaps: 100% (370/370), done.
Total 1997320 (delta 1515440), reused 1857890 (delta 1376055), pack-reused 0
Expanding reachable commits in commit graph: 330509, done.
Completely finished after 992.60 seconds.
At this point, doing a du -sk shows that the repo hasn't shrunk at all. Running the same command again shows:
$ git filter-repo --paths-from-file ../pathsToRemove.txt --invert-paths
Parsed 330509 commits
New history written in 727.61 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
Enumerating objects: 2003932, done.
Counting objects: 100% (2003932/2003932), done.
Delta compression using up to 8 threads
Compressing objects: 100% (473105/473105), done.
Writing objects: 100% (2003932/2003932), done.
Selecting bitmap commits: 333060, done.
Building bitmaps: 100% (371/371), done.
Total 2003932 (delta 1522063), reused 1997306 (delta 1515437), pack-reused 0
Expanding reachable commits in commit graph: 337121, done.
Completely finished after 1232.85 seconds.
Notice that the number of commits parsed has nearly doubled for some reason (172371 on the first run, 330509 on the second). Running the command a third time results in slightly more commits, but not doubling (maybe 7k additional commits).

The file "pathsToRemove.txt" contains lines like the following, which were copy/pasted from some of the --analyze output files:
R/RC/help/R.chm
R/CW/help/R.chm
RSQL/RSchema.vsd
R/Tools/RDM/release
R/packages
R/lib/Aspose.Pdf.dll
R/lib/Aspose.Words.dll
R/Server/bin/Debug/.dll
R/Server/bin/Debug/.pdb
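One cheap sanity check on a paths file like this is to see which entries match nothing at all. The sketch below is only an approximation written for illustration: it assumes entries without a prefix are treated as literal paths (an exact file, or a directory that is a prefix of the file's path), which is how the git-filter-repo manual describes the default. The function names are hypothetical and it ignores `glob:`, `regex:` and rename (`==>`) lines entirely.

```python
def parse_entries(text):
    """Return literal path entries, skipping blank lines and '#' comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.append(line)
    return entries


def literal_match(entry, path):
    """True if `path` is exactly `entry` or lies under the directory `entry`."""
    entry = entry.rstrip("/")
    return path == entry or path.startswith(entry + "/")


def unmatched_entries(entries, paths):
    """Return the entries that do not match any of the given repository paths."""
    return [e for e in entries
            if not any(literal_match(e, p) for p in paths)]
```

The list of repository paths could come from something like `git -c core.quotePath=false log --all --format= --name-only`, de-duplicated. Any entry reported as unmatched is one that a literal match would never remove, so it is worth checking whether it lost characters when it was pasted.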
I've also tried using --path on the command line, with the same results. This repo lives on Azure DevOps. Any ideas? Thanks!
Bryan
> Receiving objects: 100% (1512904/1512904), 28.49 GiB | 22.01 MiB/s, done.
That is a huge repository. There's a significant risk that attempting to repack is completely failing, leaving the rewrite of various refs not completed. What kind of memory do you have available on the machine you are doing this rewrite on? Can you retry with a newer version of git-filter-repo, one with commit 44ecf0cd74e1 (filter-repo: notice and signal when cleanup commands fail, 2024-08-01), which is not yet part of any release? That commit won't fix this problem, but it'd at least give you an error message when the intermediate steps fail instead of ignoring errors coming from those other commands.
Thanks for the response. Yes, the repo is huge - hence the reason I'm trying desperately to shrink it! :-)
The version I ran did have that commit (I downloaded the copy of git-filter-repo from the homepage). It contained the lines changed in the commit:
  for cmd in cleanup_cmds:
    if show_debuginfo:
      print("[DEBUG] Running{}: {}".format(location_info, ' '.join(cmd)))
>   ret = subproc.call(cmd, cwd=repo)
>   if ret != 0:
>     raise SystemExit("fatal: running '%s' failed!" % ' '.join(cmd))
    if cmd[0:3] == 'git reflog expire'.split():
      self._write_stash()
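For anyone who wants to rerun the cleanup steps by hand with the same fail-fast behaviour, here is a minimal standalone sketch of the pattern in the marked lines. It is not git-filter-repo's code: `run_cleanup` is a hypothetical name, and which commands to pass in is up to the caller (the snippet above only confirms that one of them starts with `git reflog expire`).

```python
import subprocess


def run_cleanup(cmds, cwd=None):
    """Run each command in order and stop at the first non-zero exit status."""
    for cmd in cmds:
        # subprocess.call returns the child's exit status (0 means success).
        ret = subprocess.call(cmd, cwd=cwd)
        if ret != 0:
            raise SystemExit("fatal: running '%s' failed!" % " ".join(cmd))
```

Each command is a list of arguments, for example `["git", "gc", "--prune=now"]`, and `cwd` would be the path of the bare repository. If a step fails, the script exits with the failing command in the message instead of carrying on silently.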
I just reran it. Some details:
- Windows Server 2019 Standard with 18GB of memory
- Running git-filter-repo from a Cygwin bash prompt (running in a command window had the same result)
- As it's processing commits, git.exe is using between 200MB and 250MB of RAM, and python uses between 180MB and 436MB of RAM
- During repacking, enumerating objects, git grows to 550MB
- During compressing, git grows to 670MB
- During writing, git grows to 678MB but then about 80% of the way done git goes down to 126MB and python down to 226MB. As it gets up to 94% done, they shrink further; python is 18MB to 34MB and git 39MB to 110MB.
- Building bitmaps, git is between 580MB and 625MB
After finishing, the pack is about 25MB smaller, which isn't anywhere near what I'm expecting it should be, and the number of commits is still doubling.
Is there anything else that I can do to help debug what might be going wrong? Thanks again for the assistance,
Bryan
Do you have a background job (git maintenance maybe?) which is forcibly fetching the repository and thus updating it with the old history while git-filter-repo is writing the new?
Thanks for the question, but no, there are no background jobs. This is running on my local development machine.
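One thing that might help narrow down where the extra commits come from is to look at which ref namespaces exist before and after a run. A mirror clone brings down every ref the server advertises, not just branches and tags. The sketch below only groups ref names by their top-level namespace. It assumes the input is the output of `git for-each-ref --format='%(refname)'`, and the function name is made up for illustration.

```python
from collections import Counter


def count_refs_by_namespace(for_each_ref_output):
    """Count refs per namespace, e.g. refs/heads, refs/tags, refs/pull."""
    counts = Counter()
    for line in for_each_ref_output.splitlines():
        ref = line.strip()
        if not ref:
            continue
        parts = ref.split("/")
        if parts[0] == "refs" and len(parts) > 2:
            # Keep only the first two components: refs/<namespace>
            counts["/".join(parts[:2])] += 1
        else:
            counts[ref] += 1
    return counts
```

Comparing these counts, along with `git rev-list --count --all` against `git rev-list --count --branches --tags`, before and after a filter-repo run would show whether the additional commits are reachable only from refs outside refs/heads and refs/tags. That is a guess at a useful diagnostic, not a confirmed explanation of the doubling.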