git-filter-repo
Remove history based on gitignore: clean-ignore should not require extra arguments
Hello. I'm not sure if there's an easy way to do this yet.
Several years ago, bin and obj folders were added to our repo while we had no .gitignore.
Now our repo is too big, and we want to clean it based on the current .gitignore rules.
The problem is that most folders were renamed/restructured, and there's one bin folder we need to keep. For reasons.
Our .gitignore already accounts for those folders, but our repository is still very big because of the history.
Is there a way I could do that? We want to make our repository smaller without losing history (and I unfortunately don't know much about the subject).
edit: I guess I could do something like this:
git filter-repo --path src/xpto/bin
git filter-repo --path-glob 'src/*/bin' --invert-paths
But if src was only the name used before the restructuring, and it's now src2, would that still work?
See contrib/filter-repo-demos/clean-ignore in this repository; it'll clean all of history based on the gitignore rules from HEAD. I think that does exactly what you want?
I don't understand what is in contrib/filter-repo-demos/clean-ignore at all. I just want to run git filter-repo --gitignore from the command line, and this should be the default behaviour.
Or at least git filter-repo --paths-glob-from-file '.gitignore' --invert-paths
Actually, shouldn't we just let --paths-from-file support glob patterns?
I don't understand what is in contrib/filter-repo-demos/clean-ignore at all.
It's a program, which deletes files from history found in your current .gitignore rules. If you run it, it'll clean up those files for you.
I just want to run git filter-repo --gitignore from the command line, and this should be the default behaviour.
You are unwilling to run the script provided that does what you want, and instead are asking for a new option to be added to filter-repo to implement the same behavior? And, even though you are suggesting a new option name, you want it to "be default behaviour"? Why would it be an option if it's the default? I don't understand what you're even saying/asking.
Or at least git filter-repo --paths-glob-from-file '.gitignore' --invert-paths
.gitignore files are not in general just globs; they use a different special-to-git syntax. That syntax tends to overlap with globs, but has several special meanings (e.g. ! for negation, leading or trailing slashes, treatment of single vs double asterisk, etc.).
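To make those differences concrete, here is a rough Python sketch that flags the gitignore-only syntax just listed. It checks for `!` negation, slashes that anchor a pattern or restrict it to directories, and `**`. The helper `gitignore_specials` is hypothetical and is not part of filter-repo. It only classifies lines and covers just these few rules, not the full gitignore grammar. That limitation is itself a hint of why reimplementing the grammar is risky.

```python
"""Flag .gitignore lines whose meaning goes beyond a plain glob.

Illustrative only: `gitignore_specials` is a hypothetical helper, and it
covers just a few gitignore rules rather than the whole grammar.
"""


def gitignore_specials(line):
    """Return the gitignore-specific features used by a single .gitignore line."""
    text = line.rstrip("\n")
    if not text.strip() or text.startswith("#"):
        # Blank lines and comments are not patterns at all.
        return ["blank-or-comment"]
    features = []
    if text.startswith("!"):
        # Negation re-includes paths excluded by an earlier pattern.
        features.append("negation")
        text = text[1:]
    if "/" in text.rstrip("/"):
        # A slash at the start or in the middle makes the pattern relative
        # to the directory containing the .gitignore file.
        features.append("anchored")
    if text.endswith("/"):
        # A trailing slash makes the pattern match directories only.
        features.append("directory-only")
    if "**" in text:
        # "**" has its own meaning in gitignore (matching across directories).
        features.append("double-asterisk")
    return features


if __name__ == "__main__":
    for entry in ["*.tmp", "bin/", "/build", "!src/keep/bin/", "**/*.log"]:
        print(f"{entry:16} -> {gitignore_specials(entry) or ['plain glob']}")
```

Only a line that comes back with no features (like `*.tmp`) behaves like an ordinary glob. Everything else carries meaning that a glob matcher would silently ignore.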
Actually, shouldn't we just let --paths-from-file support glob patterns?
It does support it; as per the manual any line beginning with "glob:" in the specified file will be treated as a glob.
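Given that rule, a plain list of glob patterns can be adapted for --paths-from-file with a few lines of Python. The sketch below prepends `glob:` to each pattern. The helper names `globs_to_paths_file` and `write_paths_file` are hypothetical. The sketch assumes the input is a list of plain globs (not gitignore syntax) and relies only on the `glob:` prefix rule stated above.

```python
"""Turn a plain list of glob patterns into a --paths-from-file input.

Sketch only: assumes every input line is a plain glob (NOT gitignore
syntax); the helper names are hypothetical.
"""
from pathlib import Path


def globs_to_paths_file(patterns):
    """Return paths-file lines, prefixing each pattern with 'glob:'."""
    lines = []
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # skip blank lines and comments in our own input list
        if not pattern.startswith("glob:"):
            pattern = "glob:" + pattern
        lines.append(pattern)
    return lines


def write_paths_file(patterns, destination):
    """Write the prefixed patterns to `destination`, one per line."""
    Path(destination).write_text("\n".join(globs_to_paths_file(patterns)) + "\n")


if __name__ == "__main__":
    print("\n".join(globs_to_paths_file(["src/*/obj", "*.pdb", "", "# notes"])))
```

The resulting file (say, a hypothetical `paths.txt`) could then be passed as `git filter-repo --paths-from-file paths.txt --invert-paths`. Feeding it lines copied from a .gitignore would still mis-handle negations and slash rules, for the reasons given above.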
First. It is a separate command that isn't exposed like the main command, git filter-repo. Speaking from user experience, that was unexpected and not intuitive. Not to mention that the name of the installed tool can be searched for anywhere, while clean-ignore is only hidden here on GitHub.
Second. Making the demo a Python script while the tool itself is a command-line program is also very confusing. I'd like to say again that I don't understand anything about the demo at all. Even though I can read Python, I don't understand why it is Python, or why it calls a command-line tool by a name different from the one I have installed.
Third
this should be default behaviour
Because I bet that is the main thing people would expect from the filter-repo command. Instead of just saying "error because no argument", it should just clean the repo with .gitignore, or at least ask "No argument, do you want to clean with gitignore : (y/n)".
Fourth. I tried using clean-ignore and it did not work. I guess I was doing something wrong, though I don't remember what anymore; today I was in a big hurry, so I just ran filter-repo --invert-paths for each folder, and that worked as expected. It was intuitive and didn't require me to remember a separate command hidden away like clean-ignore.
Fifth
.gitignore files are not in general just globs
any line beginning with "glob:" in the specified file will be treated as a glob.
This is the point. .gitignore is a list of globs, but its lines don't start with glob:, so --paths-from-file doesn't work with it. So when I ran git filter-repo --paths-from-file '.gitignore' --invert-paths, I was very frustrated that it didn't work. That's why I proposed that --paths-glob-from-file should exist, so we could write a list of glob patterns and run this command with any such file without needing to put glob: at the start of every line, which would also include .gitignore.
It's a program, which deletes files from history found in your current .gitignore rules.
Does it accumulate the rules from all .gitignore files along a branch's history down to HEAD?
Do the .gitignore files in commit+1 apply as a filter to the source tree in commit+2, or only to the source tree in the same commit?
First. It is a separate command that isn't exposed like the main command, git filter-repo. Speaking from user experience, that was unexpected and not intuitive. Not to mention that the name of the installed tool can be searched for anywhere, while clean-ignore is only hidden here on GitHub.
If you are upset that your installation didn't include the contrib scripts, I would highly recommend directing that complaint at whoever installed it for you, or whoever created the package for you that you installed. I created this GitHub repo, but was not involved in any packaging of it.
Second. Making the demo a Python script while the tool itself is a command-line program is also very confusing.
Odd complaint. Yes, clean-ignore is a Python script, but so is git-filter-repo itself.
Third
this should be default behaviour
Because I bet that is the main thing people would expect from the filter-repo command.
Perhaps for the folks you are dealing with. It does come up occasionally in discussions or requests, which is why I made a demo of how to do it, but it is somewhat rarely brought up or requested. For that reason, I don't think it merits any special consideration. Given that it also needs to fork an extra process to do its work (discussed more below), I think it's in the right place and doesn't belong in the main tool.
Instead of just saying "error because no argument", it should just clean the repo with .gitignore, or at least ask "No argument, do you want to clean with gitignore : (y/n)".
Ah, hidden here is the real issue. clean-ignore attempts to allow you to combine its special behavior with all other behavior of git-filter-repo so you can do multiple types of filtering at once instead of filtering once with clean-ignore and then again with filter-repo. But, to do so, it passes the command line arguments off to the main filter-repo script, which notices there are no arguments and exits with an error. For filter-repo that is what we want, but for clean-ignore the most common use case will be to not add any additional filtering. This surprised someone else as well (see https://github.com/newren/git-filter-repo/issues/415#issuecomment-1349680982.) So that could be fixed up. I'll retitle the bug accordingly.
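The pitfall described here is a general one for wrapper scripts that forward their arguments to a stricter parser. The sketch below shows its shape with `argparse`. Every name in it (`parse_main_args`, `parse_wrapper_args`, `error_on_empty`) is hypothetical. None of it is taken from filter-repo or from the commit that eventually fixed this issue.

```python
"""Sketch of a wrapper forwarding its arguments to a stricter parser.

All names here are hypothetical illustrations, not filter-repo's code.
"""
import argparse


def build_parser():
    """Stand-in for the main tool's option parser."""
    parser = argparse.ArgumentParser(prog="main-tool")
    parser.add_argument("--invert-paths", action="store_true")
    parser.add_argument("--force", action="store_true")
    return parser


def parse_main_args(argv, error_on_empty=True):
    """Parse argv; by default, reject an empty list as 'nothing to do'."""
    parser = build_parser()
    if error_on_empty and not argv:
        # parser.error() prints usage to stderr and raises SystemExit(2).
        parser.error("no arguments specified")
    return parser.parse_args(argv)


def parse_wrapper_args(argv):
    """A wrapper that always does its own filtering accepts no extra args."""
    return parse_main_args(argv, error_on_empty=False)


if __name__ == "__main__":
    print(parse_wrapper_args([]))  # accepted: the wrapper still has work to do
```

The design choice is to make the empty-argument check opt-out. The standalone tool keeps rejecting a do-nothing invocation. The wrapper, which always has its own filtering to perform, accepts one.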
Fifth
.gitignore files are not in general just globs
any line beginning with "glob:" in the specified file will be treated as a glob.
This is the point.
.gitignore is a list of globs
No, it is not; go back and read what I wrote. .gitignore contains things other than globs.
That's why I proposed that --paths-glob-from-file should exist, so we could write a list of glob patterns and run this command with any such file without needing to put glob: at the start of every line
That might be reasonable but...
which would also include .gitignore
the fact that people are likely to try this is a very good reason to not do it -- .gitignore is not restricted to globs. The .gitignore language might even change further in the future, so attempting to implement it anew in filter-repo is just asking for future bugs. That's why clean-ignore calls git check-ignore to determine which files are ignored or not (i.e. it uses git's own tool to determine whether a file is ignored). But since it's invoking a separate external process and slowing things down, I think it belongs in a separate tool.
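Here is a minimal Python sketch of that approach. It hands candidate paths to `git check-ignore` and lets git decide, so negation and the other gitignore rules are evaluated by git itself. It assumes `git` is on `PATH` and `repo_dir` is a non-bare working tree. `ignored_paths` is a hypothetical helper that makes one batch call. clean-ignore's actual implementation may drive the subprocess differently.

```python
"""Ask git itself which paths are ignored, instead of reparsing .gitignore.

Sketch only: assumes `git` is on PATH and `repo_dir` is a non-bare
working tree; `ignored_paths` is a hypothetical helper.
"""
import subprocess


def ignored_paths(paths, repo_dir="."):
    """Return the subset of `paths` that git's ignore rules would ignore."""
    if not paths:
        return set()
    result = subprocess.run(
        # --stdin with -z: NUL-separated paths in, NUL-separated matches out.
        # --no-index: also report paths that are tracked in the index.
        ["git", "check-ignore", "--stdin", "-z", "--no-index"],
        cwd=repo_dir,
        input=b"\0".join(p.encode() for p in paths) + b"\0",
        capture_output=True,
        check=False,
    )
    # Exit status 0: some paths ignored; 1: none ignored; anything else: error.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.decode(errors="replace"))
    return {p.decode() for p in result.stdout.split(b"\0") if p}
```

`--no-index` matters for history cleanup. Without it, check-ignore treats paths tracked in the index as not ignored, and tracked files are exactly the ones that were committed before the rule existed.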
Anyway, I'll retitle based on the surprise that clean-ignore requires extra arguments when it shouldn't.
It's a program, which deletes files from history found in your current .gitignore rules.
Does it accumulate the rules from all .gitignore files along a branch's history down to HEAD? Do the .gitignore files in commit+1 apply as a filter to the source tree in commit+2, or only to the source tree in the same commit?
Let me repeat: It's a program which deletes files from history found in your current .gitignore rules.
The word current excludes historical .gitignore files. clean-ignore uses whatever .gitignore files happen to be in your working copy at the time it runs (even if they have uncommitted modifications). A side-effect of that is that clean-ignore will not work in a bare clone (unlike most invocations of filter-repo), because the bare clone won't have the .gitignore files checked out.
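Because of that working-copy dependency, a script in this style could check up front that it is not running in a bare clone. The small sketch below assumes `git` is on `PATH`. `has_working_tree` is a hypothetical helper built on `git rev-parse --is-inside-work-tree`, which prints `false` inside a bare repository.

```python
"""Refuse to run ignore-based cleanup without a working tree.

Sketch only; `has_working_tree` is a hypothetical helper.
"""
import subprocess


def has_working_tree(repo_dir="."):
    """Return True if repo_dir is inside a (non-bare) working tree."""
    result = subprocess.run(
        ["git", "rev-parse", "--is-inside-work-tree"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=False,
    )
    # Outside any repository git exits non-zero; in a bare repo it prints "false".
    return result.returncode == 0 and result.stdout.strip() == "true"


if __name__ == "__main__":
    print("working tree present:", has_working_tree())
```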
I installed this script with pip, which is the second-best suggestion on Google: https://superuser.com/a/1589985/1648151
And look, I am not complaining about clean-ignore or filter-repo being Python. It can be anything that works. What I am pointing out is that the demo, https://github.com/newren/git-filter-repo/blob/main/contrib/filter-repo-demos/clean-ignore, which should be the example of how to use this command, is written in Python instead of just being a sample command line we can type and run.
I am still confused about what that folder is. Is it the tool itself or a demo of how to use it? Why is it named demos if it needs to be installed?
In the end, my main complaint is still the same. The normal expectation of a command-line tool is that the tool itself contains all the functionality, controlled by flags, not an external script that has a different name from the tool and is also hidden on GitHub.
somewhat rarely brought up or requested
That's OK and understandable, so please add me as another one voice requesting this feature: filtering by .gitignore as a CLI flag instead of an external script.
...clean-ignore which should be the example of how to use this command
No, not even close. As noted above, this is just one possible use of filter-repo and is far from even being common, let alone "the example of how to use this command".
I am still confused about what that folder is. Is it the tool itself or a demo of how to use it? Why is it named demos if it needs to be installed?
The second paragraph of the project's README states:
While most users will probably just use filter-repo as a simple command line tool (and likely only use a few of its flags), at its core filter-repo contains a library for creating history rewriting tools. As such, users with specialized needs can leverage it to quickly create entirely new history rewriting tools.
In fact, filter-repo started life solely as a library for creating history rewriting tools, and had no command line program that you could run directly. You seem to want it to only be a tool which does history rewriting, as opposed to also being a library for creating more general history rewriting tools. I understand that desire of yours, but the library aspect was first because it was my most important priority. It may not be the most important priority anymore, but it is definitely still a high priority. And, as such, the project needs to have examples of how to use it as a library. So, while I understand your request, I am denying it since it undercuts much higher priorities of mine. This functionality will remain available through a separate script.
In the end, my main complaint is still the same. The normal expectation of a command-line tool is that the tool itself contains all the functionality, controlled by flags, not an external script that has a different name from the tool and is also hidden on GitHub.
And here you make it really clear that you consider this just a command line tool, apropos of nothing, and in complete contravention of the documentation.
somewhat rarely brought up or requested
That's OK and understandable, so please add me as another one voice requesting this feature: filtering by .gitignore as a CLI flag instead of an external script.
You misunderstood this part as well. Your request isn't "somewhat rarely brought up or requested"; I used that phrase to describe the capabilities in clean-ignore -- people only somewhat rarely want to clean up files found in a .gitignore file. Your specific request -- to make that functionality part of filter-repo instead of in a separate script -- hasn't been requested by anyone else ever to my knowledge, so "another one voice" doesn't make sense here.
Anyway, I'll retitle based on the surprise that clean-ignore requires extra arguments when it shouldn't.
...and I've now fixed this issue in commit 5c54a5332b8e (clean-ignore: do not require additional arguments to be passed, 2024-07-02), so I'll close this issue out.