
Remove history based on gitignore: clean-ignore should not require extra arguments

Open fdbva opened this issue 3 years ago • 8 comments

Hello. I'm not sure if there's an easy way to do this yet.

Several years ago, a number of bin and obj folders were committed to our repo because there was no .gitignore. Now our repo is too big, and we want to clean it based on the present .gitignore rules.

The problem is that most folders have been renamed or restructured, and there's one bin we need to keep. For reasons.

Our .gitignore now covers those rules, but our repository is still very big because of the old history.

Is there a way I could do that? We want to make our repository smaller without losing history (and I unfortunately don't know much about the subject).

edit: I guess I could do something like this:

git filter-repo --path src/xpto/bin
git filter-repo --path-glob 'src/*/bin' --invert-paths

But if src was only a name used before the restructuring and it's now src2, would it work?

fdbva, Mar 31 '22 18:03

See contrib/filter-repo-demos/clean-ignore in this repository; it'll clean all of history based on the gitignore rules from HEAD. I think that does exactly what you want?

newren, May 28 '22 00:05

I don't understand what is in contrib/filter-repo-demos/clean-ignore at all. I want to just run git filter-repo --gitignore from command line and this should be default behaviour

Or at least git filter-repo --paths-glob-from-file '.gitignore' --invert-paths

Actually shouldn't we just let --paths-from-file support glob patterns?

Thaina, Mar 29 '23 05:03

> I don't understand what is in contrib/filter-repo-demos/clean-ignore at all.

It's a program, which deletes files from history found in your current .gitignore rules. If you run it, it'll clean up those files for you.

> I want to just run git filter-repo --gitignore from command line and this should be default behaviour

You are unwilling to run the script provided that does what you want, and instead are asking for a new option to be added to filter-repo to implement the same behavior? And, even though you are suggesting a new option name, you want it to "be default behaviour"? Why would it be an option if it's the default? I don't understand what you're even saying/asking.

> Or at least git filter-repo --paths-glob-from-file '.gitignore' --invert-paths

.gitignore files are not in general just globs; they use a different special-to-git syntax. That syntax tends to overlap with globs, but has several special meanings (e.g. ! for negation, leading or trailing slashes, treatment of single vs double asterisk, etc.).

> Actually shouldn't we just let --paths-from-file support glob patterns?

It does support it; as per the manual any line beginning with "glob:" in the specified file will be treated as a glob.
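
As an illustration of that "glob:" prefix, here is a minimal sketch. It assumes you maintain your own list of plain glob patterns; the file names my-globs.txt and paths.txt and the two patterns are made up for this example. It is not a way to feed a .gitignore file to filter-repo, since .gitignore syntax is not plain globs.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
globs_file="$workdir/my-globs.txt"   # hypothetical name
paths_file="$workdir/paths.txt"      # hypothetical name

# One plain glob per line. This is NOT a .gitignore file.
cat > "$globs_file" <<'EOF'
*/bin/*
*/obj/*
EOF

# Drop blank lines, then prefix each pattern with "glob:" so that
# --paths-from-file treats the line as a glob instead of a literal path.
sed -e '/^$/d' -e 's/^/glob:/' "$globs_file" > "$paths_file"
cat "$paths_file"

# In a fresh clone you could then run the following (it rewrites history):
echo "git filter-repo --paths-from-file $paths_file --invert-paths"
```

Whether these particular patterns match the paths in your history is something to check on a throwaway clone first.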

newren, Mar 29 '23 14:03

First, because it is a separate command that is not exposed the way the main git filter-repo command is. Speaking from user experience, that was unexpected and not intuitive. Not to mention that filter-repo is an installed tool whose name can be searched for anywhere, while clean-ignore is only hidden here on GitHub.

Second, making the demo a Python script while the tool itself is a command-line program is also very confusing. I would like to say again that I don't understand anything about the demo at all. Even though I can read Python, I don't understand why it is written in Python and why it calls a command-line tool whose name differs from the one I have installed.

Third

> this should be default behaviour

Because I can bet that is the main thing people would expect from the filter-repo command. Instead of just reporting an error because no arguments were given, it should just clean the repo using .gitignore, or at least ask: No arguments given, do you want to clean with gitignore? (y/n)

Fourth, I tried using clean-ignore and it did not work. I guess I was doing something wrong; I don't remember anymore. Today I was in a hurry, so I just ran filter-repo --invert-paths for each folder, and that worked as expected. It was intuitive and didn't require me to remember a separate command hidden away like clean-ignore.

Fifth

> .gitignore files are not in general just globs

> any line beginning with "glob:" in the specified file will be treated as a glob.

This is the point. .gitignore is glob, but its lines don't start with glob:, so --paths-from-file did not work with it. When I ran git filter-repo --paths-from-file '.gitignore' --invert-paths, I was very frustrated that it didn't work. And so I propose that --paths-glob-from-file should work, so that we could write a list of glob patterns and run this command with any file without needing to put glob: on every line, which would also include .gitignore

Thaina, Mar 29 '23 16:03

> It's a program, which deletes files from history found in your current .gitignore rules.

Does it accumulate rules from all the .gitignore files along a branch's history down to HEAD? Does the .gitignore in commit+1 apply as a filter to the source tree in commit+2, or only to the source tree in the same commit?

andry81, Mar 29 '23 17:03

> First, because it is a separate command that is not exposed the way the main git filter-repo command is. Speaking from user experience, that was unexpected and not intuitive. Not to mention that filter-repo is an installed tool whose name can be searched for anywhere, while clean-ignore is only hidden here on GitHub.

If you are upset that your installation didn't include the contrib scripts, I would highly recommend directing that complaint at whoever installed it for you, or whoever created the package for you that you installed. I created this GitHub repo, but was not involved in any packaging of it.

> Second, making the demo a Python script while the tool itself is a command-line program is also very confusing.

Odd complaint. Yes, clean-ignore is a python script, but so is git-filter-repo itself.

> Third
>
> > this should be default behaviour
>
> Because I can bet that is the main thing people would expect from the filter-repo command.

Perhaps for the folks you are dealing with. It does come up occasionally in discussions or requests, which is why I made a demo of how to do it, but it is somewhat rarely brought up or requested. For that reason, I don't think it merits any special consideration. Given that it also needs to fork an extra process to do its work (discussed more below), I think it's in the right place and doesn't belong in the main tool.

> Instead of just reporting an error because no arguments were given, it should just clean the repo using .gitignore, or at least ask: No arguments given, do you want to clean with gitignore? (y/n)

Ah, hidden here is the real issue. clean-ignore attempts to allow you to combine its special behavior with all other behavior of git-filter-repo so you can do multiple types of filtering at once instead of filtering once with clean-ignore and then again with filter-repo. But, to do so, it passes the command line arguments off to the main filter-repo script, which notices there are no arguments and exits with an error. For filter-repo that is what we want, but for clean-ignore the most common use case will be to not add any additional filtering. This surprised someone else as well (see https://github.com/newren/git-filter-repo/issues/415#issuecomment-1349680982). So that could be fixed up. I'll retitle the bug accordingly.

> Fifth
>
> > .gitignore files are not in general just globs
>
> > any line beginning with "glob:" in the specified file will be treated as a glob.
>
> This is the point. .gitignore is glob

No, it is not; go back and read what I wrote. .gitignore contains things other than globs.

> And so I propose that --paths-glob-from-file should work, so that we could write a list of glob patterns and run this command with any file without needing to put glob: on every line

That might be reasonable but...

> which would also include .gitignore

the fact that people are likely to try this is a very good reason to not do it -- .gitignore is not restricted to globs. The .gitignore language might even change further in the future, so attempting to implement it anew in filter-repo is just asking for future bugs. That's why clean-ignore calls git check-ignore to determine which files are ignored or not (i.e. it uses git's own tool to determine whether a file is ignored). But since it's invoking a separate external process and slowing things down, I think it belongs in a separate tool.
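
To preview what git's own matcher decides before rewriting anything, something like the following sketch may help. It builds a throwaway repository (the paths and the .gitignore contents are invented for illustration) and asks git check-ignore which tracked files the current rules would ignore. The --no-index flag is there because check-ignore skips tracked files by default. This is not a description of how clean-ignore is implemented internally; it only shows the same kind of question being put to git.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Invented layout: two bin directories, one of which must be kept.
mkdir -p src/app/bin src/xpto/bin
echo code > src/app/main.c
echo blob > src/app/bin/app.dat
echo blob > src/xpto/bin/keep.dat
git add -A   # track everything while no ignore rules exist yet

# Current rules: ignore every bin directory except src/xpto/bin.
# The "!" negation and the trailing "/" are .gitignore syntax, not plain globs.
printf '%s\n' 'bin/' '!src/xpto/bin/' > .gitignore

# Ask git which tracked paths match the current ignore rules.
# check-ignore exits non-zero when nothing matches, hence "|| true".
ignored=$(git ls-files | git check-ignore --no-index --stdin || true)
echo "$ignored"
```

With these rules, the only path reported should be src/app/bin/app.dat; the negated src/xpto/bin directory is left alone.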

Anyway, I'll retitle based on the surprise that clean-ignore requires extra arguments when it shouldn't.

newren, Mar 29 '23 18:03

> It's a program, which deletes files from history found in your current .gitignore rules.

> Does it accumulate rules from all the .gitignore files along a branch's history down to HEAD? Does the .gitignore in commit+1 apply as a filter to the source tree in commit+2, or only to the source tree in the same commit?

Let me repeat: It's a program which deletes files from history found in your current .gitignore rules.

The word current excludes historical .gitignore files. clean-ignore uses whatever .gitignore files happen to be in your working copy at the time it runs (even if they have uncommitted modifications). A side-effect of that is that clean-ignore will not work in a bare clone (unlike most invocations of filter-repo), because the bare clone won't have the .gitignore files checked out.
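
One way to tell in advance whether a clone has a working copy for clean-ignore to read .gitignore files from is to ask git whether the repository is bare. The sketch below tries this on two throwaway repositories; the helper name is_bare is made up for illustration.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: prints "true" for a bare repository, "false" otherwise.
is_bare() {
    git -C "$1" rev-parse --is-bare-repository
}

tmp=$(mktemp -d)
git init -q "$tmp/normal"
git init -q --bare "$tmp/bare.git"

normal_result=$(is_bare "$tmp/normal")
bare_result=$(is_bare "$tmp/bare.git")

# A bare repository has no checked-out files, so no .gitignore to consult.
echo "normal: $normal_result, bare: $bare_result"
```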

newren, Mar 29 '23 18:03

I installed this script with pip, which is the second-best suggestion on Google: https://superuser.com/a/1589985/1648151

And look, I am not complaining about clean-ignore or filter-repo being written in Python. It can be anything that works. What I am pointing out is that the demo at https://github.com/newren/git-filter-repo/blob/main/contrib/filter-repo-demos/clean-ignore which should be the example of how to use this command is a Python script instead of just a sample command line that we can type and enter.

I am still confused about what that folder is. Is it the tool itself or a demo of how to use it? Why is it named demos if it is required to install it?

In the end, my main complaint is still the same. The normal expectation of a command-line tool is that the tool itself contains all the functionality, controlled by flags, rather than relying on an external script that has a different name from the tool and is hidden on GitHub.

> somewhat rarely brought up or requested

That's OK and understandable, so please add me as another one voice requesting this feature: filtering by gitignore as a CLI flag instead of an external script.

Thaina, Mar 30 '23 05:03

> ...clean-ignore which should be the example of how to use this command

No, not even close. As noted above, this is just one possible use of filter-repo and is far from even being common, let alone "the example of how to use this command".

> I am still confused about what that folder is. Is it the tool itself or a demo of how to use it? Why is it named demos if it is required to install it?

The second paragraph of the project's README states:

> While most users will probably just use filter-repo as a simple command line tool (and likely only use a few of its flags), at its core filter-repo contains a library for creating history rewriting tools. As such, users with specialized needs can leverage it to quickly create entirely new history rewriting tools.

In fact, filter-repo started life solely as a library for creating history rewriting tools, and had no command line program that you could run directly. You seem to want it to only be a tool which does history rewriting, as opposed to also being a library for creating more general history rewriting tools. I understand that desire of yours, but the library aspect was first because it was my most important priority. It may not be the most important priority anymore, but it is definitely still a high priority. And, as such, the project needs to have examples of how to use it as a library. So, while I understand your request, I am denying it since it undercuts much higher priorities of mine. This functionality will remain available through a separate script.

> In the end, my main complaint is still the same. The normal expectation of a command-line tool is that the tool itself contains all the functionality, controlled by flags, rather than relying on an external script that has a different name from the tool and is hidden on GitHub.

And here you make it really clear that you consider this just a command line tool, apropos of nothing, and in complete contravention of the documentation.

> > somewhat rarely brought up or requested

> That's OK and understandable, so please add me as another one voice requesting this feature: filtering by gitignore as a CLI flag instead of an external script.

You misunderstood this part as well. Your request isn't "somewhat rarely brought up or requested"; I used that phrase to describe the capabilities in clean-ignore -- people only somewhat rarely want to clean up files found in a .gitignore file. Your specific request -- to make that functionality part of filter-repo instead of in a separate script -- hasn't been requested by anyone else ever to my knowledge, so "another one voice" doesn't make sense here.

newren, Jul 03 '24 06:07

> Anyway, I'll retitle based on the surprise that clean-ignore requires extra arguments when it shouldn't.

...and I've now fixed this issue in commit 5c54a5332b8e (clean-ignore: do not require additional arguments to be passed, 2024-07-02), so I'll close this issue out.

newren, Jul 03 '24 06:07