Non-standard glob pattern behavior with `*`

Open racerpeter opened this issue 4 months ago • 0 comments

Hi there. While using git-filter-repo recently (git version 2.49.0, git-filter-repo version a40bce548d2c), I learned "the hard way" that its glob syntax does not behave the way you'd expect for a pattern containing a *. Specifically, * does not respect path segment boundaries.

For example, given the following directory structure:

a/
  b/
    c/
      d.sql
      e.sql
    not_deleted.txt
    y.sql

The following glob pattern in a paths file (paths.txt):

glob:a/b/*.sql

And the following command:

git filter-repo --sensitive-data-removal --invert-paths --paths-from-file paths.txt

With typical glob syntax, one would expect only y.sql to be deleted, because * does not match path separators. However, the entire a/b/c directory is deleted as well.

On further RTFM-ing of the man page, I discovered a note in the examples describing this as expected behavior.

It appears that git-filter-repo may actually be using Unix fnmatch syntax rather than glob syntax. In fnmatch, * matches any character, including path separators.
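To make the difference concrete, here is a minimal Python sketch that applies the pattern from the example to the four file paths in two ways. It assumes the matching behaves like Python's `fnmatch` module, which is my guess from the observed behavior and not something I have confirmed in the git-filter-repo source. The `segment_aware` helper is hypothetical and only illustrates what a separator-respecting glob would do.

```python
import fnmatch

# Files from the example directory tree above.
PATHS = [
    "a/b/c/d.sql",
    "a/b/c/e.sql",
    "a/b/not_deleted.txt",
    "a/b/y.sql",
]
PATTERN = "a/b/*.sql"


def fnmatch_style(path, pattern):
    # fnmatch treats "/" like any other character, so "*" can span directories.
    return fnmatch.fnmatchcase(path, pattern)


def segment_aware(path, pattern):
    # Hypothetical helper: match one path segment at a time so that "*"
    # can never cross a "/" boundary.
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    if len(path_parts) != len(pattern_parts):
        return False
    return all(
        fnmatch.fnmatchcase(part, pat)
        for part, pat in zip(path_parts, pattern_parts)
    )


fnmatch_hits = [p for p in PATHS if fnmatch_style(p, PATTERN)]
segment_hits = [p for p in PATHS if segment_aware(p, PATTERN)]

print("fnmatch-style matches:", fnmatch_hits)
print("segment-aware matches:", segment_hits)
```

Under fnmatch-style matching, both files under a/b/c match along with y.sql, which would leave a/b/c empty. Under segment-aware matching, only a/b/y.sql matches.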

Short of changing the git-filter-repo API by renaming path-glob to path-fnmatch, I would suggest making this fact far more prominent in the docs (for example, by mentioning it in the options section of the man page and in any corresponding --help output), so that people who assume they are dealing with a standard glob matcher are not caught out.

Finally, thanks so much for creating this tool -- it's the best at what it does!

racerpeter, Aug 19 '25 21:08