sh
sh copied to clipboard
cmd/shfmt: support null byte separators in --list and --find
@msanders
https://github.com/mvdan/sh/issues/288#issuecomment-579582229
It might make sense to add a -z or -0 flag for shfmt -f to use safely with xargs -0. (See e.g. find -print0, git ls-files -z, grep --null, etc.)
How can I find and safely handle file names containing newlines, spaces or both?
Also for shfmt -l.
Adding --find0 and --list0 would not lead to ambiguous/erroneous cases like shfmt -w -0 ..
It also mirrors find's -print/-print0 and -fprint /fprint0.
Thanks. How often does this realistically come up?
If we add this, I would rather not add new flags. We can tweak --find and --list so that they also accept an optional zero parameter like --find=0 and --list=0. I adapted an existing boolean flag to take optional string values too in a different project, you're free to copy paste the idea:
https://github.com/mvdan/xurls/blob/c5b6e994e5db07c3f691bbb8c044b34dd20720f8/cmd/xurls/main.go#L36-L44
Thanks. How often does this realistically come up?
Every time one wants to use --find/--list with xargs or any other tool like sort.
-l[=0], --list[=0] list files whose formatting differs from shfmt's;
paths are separated by a newline or a null character if -l=0
(corresponding to the -0 option of xargs)
-f[=0], --find[=0] recursively find all shell files and print the paths;
paths are separated by a newline or a null character if -l=0
(corresponding to the -0 option of xargs)
⇑
Your preference?
I am not sure, if that is more readable/discoverable than:
-l, --list list files whose formatting differs from shfmt's;
paths are separated by a newline
-l0, --list0 list files whose formatting differs from shfmt's;
paths are separated by a null character
corresponding to the -0 option of xargs
-f, --find recursively find all shell files and print the paths;
paths are separated by a newline
-f0, --find0 recursively find all shell files and print the paths;
paths are separated by a null character
corresponding to the -0 option of xargs
-l, --list list files whose formatting differs from shfmt's;
paths are separated by a newline
-f, --find recursively find all shell files and print the paths;
paths are separated by a newline
-z \0 line termination on -l and -f output
Precedent
https://www.man7.org/linux/man-pages/man1/find.1.html
If no expression is given, the expression -print is used (but you should probably consider using -print0 instead, anyway).
https://git-scm.com/docs/git-ls-files#Documentation/git-ls-files.txt--z
-z \0 line termination on output and do not quote filenames. See OUTPUT below for more information.
https://www.gnu.org/software/sed/manual/sed.html
-z Treat the input as a set of lines, each terminated by a zero byte (the ASCII ‘NUL’ character) instead of a newline. This option can be used with commands like ‘sort -z’ and ‘find -print0’ to process arbitrary file names.
Every time one wants to use
--find/--listwithxargsor any other tool likesort.
You only need to null-separate arguments when they could contain newlines though. I have never in my life encountered any filenames containing newlines, so I would assume that shfmt -find . | xargs whatever works just fine with newline separators. I still don't oppose the feature, for the sake of better compatibility with other tools.
I am not sure, if that is more readable/discoverable than:
Yes, I would rather add the optional value to the two flags rather than duplicating the flags or adding a new flag which can only be combined with either of the other two. You could drop the mention of xargs in the help text for brevity.
Note that the -z flag you mention from other tools affects the entire tool. Here, our -z flag would not work on its own, so it would be a rather awkward flag.
You only need to null-separate arguments when they could contain newlines though.
Maybe you wanted to say “spaces”?
I have never in my life encountered any filenames containing newlines,
That is because the file systems I know of do not allow it.
File names with spaces are quite common on the other hand.
macOS: ~/Library/Application Support
Ah of course. For some reason I misremembered xargs to only split on newlines, but it does split on spaces too.
As far as I can tell the flag package does not differentiate between
--flag=x and --flag x.
If you have a String value then you must have a value, i.e. --flag is not allowed anymore.
This is important because if l is String instead of Bool:
shfmt -l . is the same as shfmt -l=.
type stringFlag struct {
set bool
val string
}
func (sf *stringFlag) Set(s string) error {
sf.val = s
// TODO validate s
sf.set = true
return nil
}
func (sf *stringFlag) String() string {
return sf.val
}
list = &multiFlag[stringFlag]{"l", "list", stringFlag{false, ""}}
case *multiFlag[stringFlag]:
if name := f.short; name != "" {
flag.StringVar(&f.val.val, name, f.val.val, "")
}
if name := f.long; name != "" {
flag.StringVar(&f.val.val, name, f.val.val, "")
}
With the above we would have list.val.val = "." for shfmt -l .
See the snippet I shared earlier:
https://github.com/mvdan/xurls/blob/c5b6e994e5db07c3f691bbb8c044b34dd20720f8/cmd/xurls/main.go#L36-L44
In particular the presence of IsBoolFlag.
Adding
func (*stringFlag) IsBoolFlag() bool { return true }
does not change a thing.
shfmt -l . ⇒ list.val.val = "."
Because you're using flag.StringVar rather than flag.Var; that takes Value, which is where IsBoolFlag may be present.
If you get stuck it's not a problem, I'll get to it at some point.
It have it working now, but one side effect is:
./shfmt2 -l=true . is the same as ./shfmt2 -l ..
Should that be documented?
That's already the case with all boolean flags in Go:
$ shfmt --space-redirects <<<'foo >bar'
foo > bar
$ shfmt --space-redirects=true <<<'foo >bar'
foo > bar
It's just not a very well known fact because it's not very useful in general.
Do you want just one boolFlag or one listFlag and one findFlag.
Considering now both have the same allowed values, but that might change in the future.
func (s * boolFlag) Set(val string) error {
if val != "true" && val != "0" {
return fmt.Errorf("only \"0\" supported\n")
}
s.val = val
s.set = true
return nil
}
list = &multiFlag[boolFlag]{"l", "list", boolFlag{false, ""}}
find = &multiFlag[boolFlag]{"f", "find", boolFlag{false, ""}}
vs.
list = &multiFlag[listFlag]{"l", "list", listFlag{false, ""}}
find = &multiFlag[findFlag]{"f", "find", findFlag{false, ""}}
Share the same type. FWIW a single string value is still enough as long as you set the default to "false" - then the only allowed defaults can be "false", "true", and "0". Note that -list calls Set("true") as it's a short alias for -list=true.
I have updated the PR.
Verified:
$ shfmt --version
3.11.0
$ mkdir -p /tmp/test/node_modules && cd $_/..
$ echo 'echo "no space"' >test.sh
$ echo 'echo "with space"' >a\ test.sh
$ chmod -R u+x *.sh
$ shfmt -f=0 . | xargs -0 realpath
/private/tmp/test/a test.sh
/private/tmp/test/test.sh
$ shfmt -f=0 . | xargs -0 cat
echo "with space"
echo "no space"