busybox-w32
grep misbehaves and only correctly handles asterisks `*` in pipes from the busybox builtin shell
When I use busybox grep in PowerShell or cmd to process piped input and the pattern contains an asterisk, grep expands the asterisk as a glob, with or without the -F or -e flag.
- busybox version
~ busybox
BusyBox v1.36.0-FRP-4621-gf3c5e8bc3 (2022-02-28 07:17:58 GMT)
(mingw64-gcc 11.2.1-3.fc35; mingw64-crt 9.0.0-2.fc35; glob)
If you put a backslash in front of the asterisk, grep tries to traverse the files in the root directory of the current drive; otherwise it traverses the current working directory. If the current working directory is empty, the command runs successfully.
~\test > gc *|busybox grep -v *
1234567
*
890
~\test gc *|busybox grep *
~\test
~\test cat *|busybox grep *
~\test cat *|busybox grep \*
grep: \$WinREAgent: Permission denied
grep: \Documents and Settings: Permission denied
grep: \DumpStack.log.tmp: Permission denied
grep: \hiberfil.sys: Permission denied
grep: \Intel: Permission denied
...
~\test
~\test echo *|busybox grep *
~\test > rm *
~\test echo *|busybox grep *
*
~\test
However, if the pipeline between the two commands is handled by busybox's ash, grep behaves correctly.
~\test busybox sh -c 'cat *'
1234567
*
890
~\test busybox sh -c 'cat *|grep *'
~\test busybox sh -c 'cat *|grep \*'
*
I also tried another standalone grep in PowerShell and cmd, and it handles asterisks in the pattern correctly.
- Below is the working grep version:
~ scoop info grep
Name : grep
Description : Print lines matching a pattern.
Version : 3.7
Bucket : main
Website : https://www.gnu.org/software/grep
License : GPL-3.0-or-later
Updated at : 2021/12/27 15:20:57
Updated by : Rashil Gandhi
Installed : 3.7
Binaries : grep.exe | egrep.exe | fgrep.exe
~\test > busybox cat *|grep *
*
~\test busybox cat *|grep -v *
1234567
890
~\test busybox cat *|grep -v \*
1234567
890
~\test
Here are some differences in their behavior when dealing with asterisks. The busybox version tries to expand the asterisk into a path no matter where the asterisk appears, while the standalone grep doesn't treat the asterisk as a glob at all (of course, globs can still be used when recursively searching files).
~\test grep * *.touch
grep-3.7-x64.exe: *.touch: Invalid argument
~\test grep * test.touch
*
~\test grep * *
grep-3.7-x64.exe: *: Invalid argument
~\test busybox grep * *
~\test busybox grep 123 *
1234567
~\test
~\test busybox grep \* *
grep: \$WinREAgent: Permission denied
grep: \Documents and Settings: Permission denied
grep: \DumpStack.log.tmp: Permission denied
...
The behavior of busybox grep with the -f flag is particularly confusing.
~\test echo * 123|grep *
*
~\test > echo "* 123"|grep *
* 123
~\test > echo * 123|busybox grep *
~\test echo * 123|busybox grep -f *
*
~\test > echo "* 123"|busybox grep -f *
* 123
~\test > cat *|busybox grep -f *
1234567
*
890
~\test cat *|grep *
As a result, both versions produce unexpected behavior, but the behavior of the busybox version has more impact. The PATTERNS parameter of the grep command is required, and the [FILE] path is optional. Obviously the former should not be expanded into a path (unless the -f FILE "Read pattern from file" flag is given), and I don't even know what PATTERNS or FILE is set to in this case.
Thank you for the very detailed report.
The point at issue here is a difference in the way command line arguments are handled in Unix shells and Windows cmd and PowerShell:
- In Unix the shell is responsible for expanding wildcards. The shell also provides mechanisms to allow wildcards to be escaped so globbing doesn't happen.
- Windows cmd and PowerShell don't expand wildcards. Instead they're passed to the application unmodified. Because cmd and PowerShell don't expand wildcards, they also don't provide a way to strip them of their special meaning. Quoting is used to allow spaces within arguments. Backslash is a path separator, not a way to escape wildcards.
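To make the Unix side of this concrete, here is a minimal POSIX sh sketch. It assumes a scratch directory holding a single file named test.touch with the three lines shown in the report; the directory and the variable names are illustrative only.

```shell
# Work in a scratch directory so the glob has exactly one file to match.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf '1234567\n*\n890\n' > test.touch

# Unquoted: the shell replaces * with "test.touch" before grep starts,
# so grep searches stdin for the string "test.touch" and finds nothing.
unquoted=$(cat test.touch | grep -F * || true)

# Quoted: the shell hands grep a literal * as the pattern.
quoted=$(cat test.touch | grep -F '*')

printf 'unquoted: [%s]\nquoted: [%s]\n' "$unquoted" "$quoted"
```

In a Unix shell the choice between the two lives entirely in the quoting, which is the mechanism cmd and PowerShell don't offer.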
Each console application in Windows is responsible for expanding wildcards in its arguments. Mostly they use the standard expansions provided by the C runtime. This is a good thing, as it would be very confusing if applications interpreted wildcards differently.
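As a rough illustration of why quoting at the caller doesn't help once the program itself does the expansion, the following POSIX sh sketch simulates such a program. app_side_glob is a hypothetical stand-in written for this example; it is not code from busybox-w32 or the C runtime, and it only approximates their behaviour.

```shell
# Hypothetical stand-in for a program that globs its own arguments.
# The body runs in a subshell so the IFS change stays local.
app_side_glob() (
    IFS=    # empty IFS: no word splitting, only pathname expansion
    for arg in "$@"; do
        # $arg is deliberately unquoted so that it gets expanded here,
        # inside the "program", not by the caller.
        for match in $arg; do
            printf '%s\n' "$match"
        done
    done
)

dir=$(mktemp -d)
cd "$dir" || exit 1
touch a.txt b.txt

# The caller quotes the asterisk, but it is still expanded.
with_files=$(app_side_glob '*')

# With nothing to match, the pattern is left as a literal asterisk.
mkdir empty
cd empty || exit 1
in_empty=$(app_side_glob '*')

printf 'with files:\n%s\nin empty dir:\n%s\n' "$with_files" "$in_empty"
```

The second case resembles the empty-directory observation in the report, where the asterisk reached grep as a pattern once there was nothing left to expand.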
Some applications seem to make a special effort to allow literal wildcards in certain cases. I expect this is what your other grep is doing. It's also been reported that zip does the same.
busybox-w32 is in an awkward position. It's a standard Windows console application and might be expected to handle wildcards as such. However, there are circumstances where doing that has resulted in incorrect behaviour of the shell.
For several years I provided two binaries: one with Windows' globbing enabled and one without. More recently this was changed such that when busybox.exe is run from cmd or PowerShell it uses Windows globbing but any child processes (such as children of a shell) don't.
This covers the most common use cases but it doesn't handle cases where some wildcards need to be expanded and some don't. In a Unix shell the user can choose which arguments are expanded and which aren't by escaping them appropriately, but this isn't possible from cmd or PowerShell.
The problem isn't limited to grep: it applies to any BusyBox applet which sometimes requires literal wildcard characters. I haven't been able to come up with a good general solution to this problem. What we currently have is an imperfect compromise.
Thank you for your very attentive answer. By reading the documentation link you provided and looking at related topics, I now know that this is a common problem on mingw, and that the environment variable BB_GLOBBING can control the behavior to a certain extent. This is very helpful.
> Each console application in Windows is responsible for expanding wildcards in its arguments. Mostly they use the standard expansions provided by the C runtime. This is a good thing, as it would be very confusing if applications interpreted wildcards differently.
Windows doesn't care whether the user uses backslashes or forward slashes in a path, and a file or directory name cannot contain characters such as \/:*?<>|, so it should be safe to use backslashes to escape wildcards; we could still use forward slashes in any path we want to glob. But if this is to be done, it should be done by the mingw project. I totally agree that it is a good thing for applications not to implement their own globbing logic.
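The idea could be sketched roughly as follows in POSIX sh. expand_arg is a hypothetical helper invented for this example, not an existing mingw or busybox-w32 function, and it makes the simplifying assumption that any argument containing `\*` or `\?` is taken literally as a whole.

```shell
# Hypothetical argument expander: a backslash in front of * or ?
# marks the wildcard as literal; everything else is globbed as usual.
expand_arg() (
    IFS=    # empty IFS: no word splitting, only pathname expansion below
    case $1 in
        *'\*'*|*'\?'*)
            # Escaped wildcard: drop the backslash, keep the argument literal.
            printf '%s\n' "$1" | sed 's/\\\([*?]\)/\1/g' ;;
        *)
            # No escape: $1 is left unquoted so it is expanded here.
            for match in $1; do printf '%s\n' "$match"; done ;;
    esac
)

dir=$(mktemp -d)
cd "$dir" || exit 1
touch a.txt b.txt

expand_arg '\*'      # literal asterisk, usable as a grep pattern
expand_arg '*.txt'   # expanded to the matching file names
```

This ignores arguments that mix escaped and unescaped wildcards and paths that really do use backslash separators before a wildcard, which is where the hard cases would be.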