Perform action only on deleted artifacts
Love the tool, so convenient, but asking for a little more convenience...
Feature request: perform an action only on deleted artifacts: `--deleted-only`
I would like to apply a command such that I can purge all unwanted artifacts, but only after they have been deleted first, e.g.:

```shell
git-filter-repo --deleted-only --invert-paths --path-regex '.*\.(class|[ejw]ar|zip|z|gz)'
```

or:

```shell
git-filter-repo --deleted-only --strip-blobs-bigger-than 10M
```
I'm sure this use case is not unusual. As the lead/admin of a large group of developers of mixed experience, we often find that a mis-constructed ignore file has resulted in unwanted artifacts being committed, bloating the repo.
The documentation reads:
> Similarly, you could use `--paths-from-file` to delete many files. For example, you could run `git filter-repo --analyze` to get reports, look in one such as `.git/filter-repo/analysis/path-deleted-sizes.txt`, copy all the filenames into a file such as `/tmp/files-i-dont-want-anymore.txt`, and then run `git filter-repo --invert-paths --paths-from-file /tmp/files-i-dont-want-anymore.txt` to delete them all.
But that means I must run `path-deleted-sizes.txt` through a regex, create the /tmp file, and then run the tool again.
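For concreteness, the multi-step workflow being described might look like the sketch below. The file path and filename come from the documentation quoted above; the regex mirrors the contrived example earlier in this request. The stand-in report created here only approximates the real `path-deleted-sizes.txt` layout (two header lines, path in the last column), and paths containing spaces would need more careful parsing:

```shell
# Stand-in for .git/filter-repo/analysis/path-deleted-sizes.txt, so the
# extraction pipeline can be shown end-to-end (layout is approximate):
report=/tmp/path-deleted-sizes-sample.txt
cat > "$report" <<'EOF'
=== Deleted paths by reverse accumulated size ===
Format: unpacked size, packed size, date deleted, path name
  10485760    9437184 2020-01-15 build/libs/app.war
   5242880    4718592 2019-06-02 vendor/huge.zip
      2048       1024 2021-03-09 docs/notes.txt
EOF
# Skip the two header lines, keep only the path (last field), and apply
# the unwanted-artifact regex from the feature request:
awk 'NR > 2 {print $NF}' "$report" \
  | grep -E '\.(class|[ejw]ar|zip|z|gz)$' > /tmp/files-i-dont-want-anymore.txt
cat /tmp/files-i-dont-want-anymore.txt
# Then, as the documentation says:
#   git filter-repo --invert-paths --paths-from-file /tmp/files-i-dont-want-anymore.txt
```

A hypothetical `--deleted-only` flag would collapse all of this into the one-shot commands shown above.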
I'd like the convenience of a one-shot command, but with the safety net of knowing I am applying my criteria (regex, size, etc.) only to files that have already been deleted.
Hopefully the explanation (and contrived examples) is clear.
It's an interesting idea, and might make sense for someone to create a contrib script for.
It would not make sense as part of the main tool because:
- The output files from `--analyze` are really only meant as guiding points, not as Truth. In particular, if the repo has some ancient branch still open that just hasn't been updated in years, it may be that some long-deleted file is still present within that branch. Thus the file will not show as being deleted in the `--analyze` reports, because it still exists on some branch.
- The tool uses fast-export and fast-import, and is thus best thought of as a fast filter. Any kind of pre-processing that involves walking the entire history of the repository as part of the filtering is going to be horrendously slow on big repositories, at least for getting started. I'd rather anything that takes that kind of start-up time go in the contrib scripts.
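To illustrate what such a contrib script would have to do: compute the set of paths deleted somewhere in history, subtract the paths still present at the tip of any ref (the "ancient branch" caveat above), and only then apply the user's criteria. The two expensive history-walking steps are shown as comments with standard git flags; stand-in files let the set-difference step itself run, which is a sketch, not a vetted implementation:

```shell
# Expensive pre-processing a contrib script would need (shown, not run here):
#   git log --all --diff-filter=D --name-only --format= | sort -u > /tmp/ever-deleted.txt
#   git for-each-ref --format='%(refname)' refs/heads \
#     | xargs -n1 git ls-tree -r --name-only | sort -u > /tmp/still-present.txt
# Stand-in data so the set-difference logic can be demonstrated:
printf '%s\n' build/app.war old/big.zip src/kept.jar | sort > /tmp/ever-deleted.txt
printf '%s\n' README.md src/kept.jar | sort > /tmp/still-present.txt
# "Deleted everywhere" = deleted at some point in history AND absent from
# every branch tip (comm -23 keeps lines unique to the first file):
comm -23 /tmp/ever-deleted.txt /tmp/still-present.txt
# The user's regex or size criteria would then be applied to this list
# before handing it to: git filter-repo --invert-paths --paths-from-file ...
```

This also makes the start-up cost concrete: both commented steps touch every commit or every ref's full tree, which is exactly the overhead the comment above wants kept out of the main tool.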