glob
glob copied to clipboard
Add caret negation (`[^...]`) in addition to exclamation mark
Closes #116.
This feature has some interesting history. Let's start with the source of everything. Here's POSIX:
If an open bracket introduces a bracket expression as in XBD RE Bracket Expression, except that the
character ( '!' ) shall replace the character ( '^' ) in its role in a non-matching list in the regular expression notation, it shall introduce a pattern bracket expression. A bracket expression starting with an unquoted character produces unspecified results . Otherwise, '[' shall match the character itself.
(emphasis mine)
The glob(7)
man page supports this:
Now that regular expressions have bracket expressions where the negation is indicated by a
^
, POSIX has declared the effect of a wildcard pattern[^...]
to be undefined.
This seems to have had the effect that many implementations have opted to support both [!...]
and [^...]
. This leaves us in a situation where this crate is correct, but sometimes leads to unexpected behaviour for users expecting [^...]
to work. This happened in uutils and could become an issue for other projects as well, such as sudo-rs
(see https://github.com/memorysafety/sudo-rs/issues/834). Fixing this issue with a workaround is quite tricky (see e.g. https://github.com/uutils/coreutils/issues/5584).
Now for a fun round of "What do the others do?":
Python: only `[!...]` is negation
from fnmatch import fnmatch
print(fnmatch("b", "[!a]"))
print(fnmatch("a", "[^a]"))
prints
True
False
libc: both work
#include <stdio.h>
#include <fnmatch.h>
int main() {
int res1 = fnmatch("[^a]", "b", 0);
int res2 = fnmatch("[!a]", "b", 0);
printf("Result: %d, %d", res1, res2);
return 0;
}
Result: 0, 0
sudo's custom fnmatch: both work
See this relevant line in the code: https://github.com/sudo-project/sudo/blob/b6175b78ad1c4c9535cad48cb76addf53352a28f/lib/util/fnmatch.c#L174
npm `glob` package: both work
Documented here: https://www.npmjs.com/package/glob#glob-primer
ruby: both work (but `^` is emphasized in the docs)
puts File::fnmatch("[^a]", "b")
puts File::fnmatch("[!a]", "b")
prints
true
true
Java: only `[!...]` is negation
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
public class Main {
public static void main(String[] args) {
PathMatcher x = FileSystems.getDefault().getPathMatcher("glob:[^a]");
System.out.println(x.matches(Paths.get("b")));
PathMatcher y = FileSystems.getDefault().getPathMatcher("glob:[!a]");
System.out.println(y.matches(Paths.get("b")));
}
}
prints
false
true
Downsides and alternatives
The major downside of this is that it is a breaking change and people might be relying on the current behaviour, but that is difficult to tell. It might also be that people have unknowingly written incorrect patterns using ^
, because that's what they expect from regex. Maybe we can estimate that with a GitHub code search.
So, there is a case to be made that this should belong in a fork. The reason that I believe that this change belongs here is that I think it helps all the projects that re-implement existing C code into Rust and removes an edge case.
Another option is to make the parsing of patterns in this crate configurable, where the default behaviour stays the same. This could potentially also be extended to allow for escape sequences and other features. Again, though, that comes with downsides. Mostly because all combinations of options would have to be tested and maintained. Given that this crate is a cornerstone in the Rust ecosystem, that combinatorical explosion might lead to (security) issues in various projects.
Relevant GitHub Code Searches
I did a couple searches for this pattern on GitHub to estimate the impact of this change. All searches are with lang:rust
:
-
/glob\(\"[^"]*\[\^/
: no relevant results, all hits are with custom glob implementations -
/Pattern::new\(\"[^"]*\[\^/
: no relevant results, mostly matches non-glob type of patterns.