git-filter-repo

Perform action only on deleted artifacts

Open ianwilliams1 opened this issue 2 years ago • 1 comments

Love the tool, so convenient, but asking for a little more convenience...

Feature request: Perform action only on deleted artifacts: --deleted-only

I would like to run a command that purges all unwanted artifacts matching my criteria, but only those that have already been deleted from the repository, e.g.:

git-filter-repo --deleted-only --invert-paths --path-regex '.*\.(class|[ejw]ar|zip|z|gz)'
or:
git-filter-repo --deleted-only --strip-blobs-bigger-than 10M

I'm sure this use case is not unusual. As the lead/admin of a large group of developers with mixed experience, I often find that a mis-constructed ignore file has let unwanted artifacts be committed, resulting in repo bloat.

The documentation reads:

Similarly, you could use --paths-from-file to delete many files. For example, you could run git filter-repo --analyze to get reports, look in one such as .git/filter-repo/analysis/path-deleted-sizes.txt and copy all the filenames into a file such as /tmp/files-i-dont-want-anymore.txt and then run

git filter-repo --invert-paths --paths-from-file /tmp/files-i-dont-want-anymore.txt

to delete them all.

But that means I must filter 'path-deleted-sizes.txt' through a regex, create the /tmp file, and then run the tool again. I'd like the convenience of a one-shot command, but with the safety net of knowing I am applying my criteria (regex, size, etc.) only to files that have already been deleted.
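Until something like that exists, the manual two-step workflow above could be scripted roughly as follows. This is only a minimal sketch, not part of git-filter-repo: the helper names `select_deleted_paths` and `write_paths_file` are hypothetical, and it assumes each data row of path-deleted-sizes.txt has the layout "unpacked size, packed size, date deleted, path", preceded by header lines. Paths containing unusual characters may need extra handling.

```python
import re
from typing import Iterable, List


def select_deleted_paths(report_lines: Iterable[str], pattern: str,
                         min_packed_size: int = 0) -> List[str]:
    """Pick deleted paths from an --analyze report that match the criteria.

    Assumed row layout (not verified against every git-filter-repo version):
    unpacked size, packed size, date deleted, path name.
    """
    regex = re.compile(pattern)
    selected = []
    for line in report_lines:
        # Split into at most four fields so paths containing spaces stay whole.
        fields = line.split(None, 3)
        if len(fields) < 4 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue  # header line or anything that does not look like a data row
        packed_size = int(fields[1])
        path = fields[3].rstrip("\n")
        if packed_size >= min_packed_size and regex.search(path):
            selected.append(path)
    return selected


def write_paths_file(paths: Iterable[str], out_path: str) -> None:
    """Write one path per line, suitable for passing to --paths-from-file."""
    with open(out_path, "w", encoding="utf-8") as f:
        for path in paths:
            f.write(path + "\n")
```

One would read .git/filter-repo/analysis/path-deleted-sizes.txt after git filter-repo --analyze, pass its lines to select_deleted_paths with a pattern such as r'\.(class|[ejw]ar|zip|z|gz)$' (matched with Python's re.search here), write the result to /tmp/files-i-dont-want-anymore.txt, and then run the documented git filter-repo --invert-paths --paths-from-file command on that file.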

Hopefully the explanation (and contrived examples) is clear.

ianwilliams1, Mar 29 '23 23:03

It's an interesting idea, and might make sense for someone to create a contrib script for.

It would not make sense as part of the main tool because:

  • The output files from --analyze are really only meant as guiding points, not as Truth. In particular, if the repo has some ancient branch still open that just hasn't been updated in years, it may be that some long-deleted file is still present within that branch. And thus, the file will not show as being deleted in the --analyze reports, because it still exists on some branch.
  • The tool uses fast-export and fast-import and is thought of as fast-filter. Any kind of pre-processing that involves walking the entire history of the repository as part of the filtering is going to be horrendously slow on big repositories, at least for getting started. I'd rather anything that took that kind of start-up time go in the contrib scripts.

newren, Apr 11 '23 03:04