sh icon indicating copy to clipboard operation
sh copied to clipboard

cmd/shfmt: support null byte separators in --list and --find

Open sdavids opened this issue 1 year ago • 15 comments

@msanders

https://github.com/mvdan/sh/issues/288#issuecomment-579582229

It might make sense to add a -z or -0 flag for shfmt -f to use safely with xargs -0. (See e.g. find -print0, git ls-files -z, grep --null, etc.)

How can I find and safely handle file names containing newlines, spaces or both?

Also for shfmt -l.


Adding --find0 and --list0 would not lead to ambiguous/erroneous cases like shfmt -w -0 ..

It also mirrors find's -print/-print0 and -fprint /fprint0.

sdavids avatar Sep 20 '24 13:09 sdavids

Thanks. How often does this realistically come up?

If we add this, I would rather not add new flags. We can tweak --find and --list so that they also accept an optional zero parameter like --find=0 and --list=0. I adapted an existing boolean flag to take optional string values too in a different project, you're free to copy paste the idea:

https://github.com/mvdan/xurls/blob/c5b6e994e5db07c3f691bbb8c044b34dd20720f8/cmd/xurls/main.go#L36-L44

mvdan avatar Sep 26 '24 10:09 mvdan

Thanks. How often does this realistically come up?

Every time one wants to use --find/--list with xargs or any other tool like sort.

sdavids avatar Sep 26 '24 11:09 sdavids

-l[=0], --list[=0]  list files whose formatting differs from shfmt's;
                    paths are separated by a newline or a null character if -l=0
                    (corresponding to the -0 option of xargs)
-f[=0], --find[=0]  recursively find all shell files and print the paths;
                    paths are separated by a newline or a null character if -l=0
                    (corresponding to the -0 option of xargs)

Your preference?

I am not sure, if that is more readable/discoverable than:

-l,  --list    list files whose formatting differs from shfmt's;
               paths are separated by a newline
-l0, --list0   list files whose formatting differs from shfmt's;
               paths are separated by a null character
               corresponding to the -0 option of xargs
-f, --find     recursively find all shell files and print the paths;
               paths are separated by a newline
-f0, --find0   recursively find all shell files and print the paths;
               paths are separated by a null character
               corresponding to the -0 option of xargs
-l, --list     list files whose formatting differs from shfmt's;
               paths are separated by a newline
-f, --find     recursively find all shell files and print the paths;
               paths are separated by a newline
-z             \0 line termination on -l and -f output

Precedent

https://www.man7.org/linux/man-pages/man1/find.1.html

If no expression is given, the expression -print is used (but you should probably consider using -print0 instead, anyway).

https://git-scm.com/docs/git-ls-files#Documentation/git-ls-files.txt--z

-z \0 line termination on output and do not quote filenames. See OUTPUT below for more information.

https://www.gnu.org/software/sed/manual/sed.html

-z Treat the input as a set of lines, each terminated by a zero byte (the ASCII ‘NUL’ character) instead of a newline. This option can be used with commands like ‘sort -z’ and ‘find -print0’ to process arbitrary file names.

sdavids avatar Sep 26 '24 11:09 sdavids

Every time one wants to use --find/--list with xargs or any other tool like sort.

You only need to null-separate arguments when they could contain newlines though. I have never in my life encountered any filenames containing newlines, so I would assume that shfmt -find . | xargs whatever works just fine with newline separators. I still don't oppose the feature, for the sake of better compatibility with other tools.

I am not sure, if that is more readable/discoverable than:

Yes, I would rather add the optional value to the two flags rather than duplicating the flags or adding a new flag which can only be combined with either of the other two. You could drop the mention of xargs in the help text for brevity.

Note that the -z flag you mention from other tools affects the entire tool. Here, our -z flag would not work on its own, so it would be a rather awkward flag.

mvdan avatar Sep 29 '24 08:09 mvdan

You only need to null-separate arguments when they could contain newlines though.

Maybe you wanted to say “spaces”?

I have never in my life encountered any filenames containing newlines,

That is because the file systems I know of do not allow it.


File names with spaces are quite common on the other hand.

macOS: ~/Library/Application Support

sdavids avatar Sep 29 '24 09:09 sdavids

Ah of course. For some reason I misremembered xargs to only split on newlines, but it does split on spaces too.

mvdan avatar Sep 29 '24 11:09 mvdan

As far as I can tell the flag package does not differentiate between

--flag=x and --flag x.

If you have a String value then you must have a value, i.e. --flag is not allowed anymore.

This is important because if l is String instead of Bool:

shfmt -l . is the same as shfmt -l=.


type stringFlag struct {
	set bool
	val string
}

func (sf *stringFlag) Set(s string) error {
	sf.val = s
        // TODO validate s
	sf.set = true
	return nil
}

func (sf *stringFlag) String() string {
	return sf.val
}

list        = &multiFlag[stringFlag]{"l", "list", stringFlag{false, ""}}


		case *multiFlag[stringFlag]:
			if name := f.short; name != "" {
				flag.StringVar(&f.val.val, name, f.val.val, "")
			}
			if name := f.long; name != "" {
				flag.StringVar(&f.val.val, name, f.val.val, "")
			}

With the above we would have list.val.val = "." for shfmt -l .

sdavids avatar Oct 01 '24 13:10 sdavids

See the snippet I shared earlier:

https://github.com/mvdan/xurls/blob/c5b6e994e5db07c3f691bbb8c044b34dd20720f8/cmd/xurls/main.go#L36-L44

In particular the presence of IsBoolFlag.

mvdan avatar Oct 01 '24 13:10 mvdan

Adding

func (*stringFlag) IsBoolFlag() bool { return true }

does not change a thing.

shfmt -l .list.val.val = "."

sdavids avatar Oct 01 '24 14:10 sdavids

Because you're using flag.StringVar rather than flag.Var; that takes Value, which is where IsBoolFlag may be present.

If you get stuck it's not a problem, I'll get to it at some point.

mvdan avatar Oct 01 '24 14:10 mvdan

It have it working now, but one side effect is:

./shfmt2 -l=true . is the same as ./shfmt2 -l ..

Should that be documented?

sdavids avatar Oct 01 '24 14:10 sdavids

That's already the case with all boolean flags in Go:

$ shfmt --space-redirects <<<'foo >bar'
foo > bar
$ shfmt --space-redirects=true <<<'foo >bar'
foo > bar

It's just not a very well known fact because it's not very useful in general.

mvdan avatar Oct 01 '24 14:10 mvdan

Do you want just one boolFlag or one listFlag and one findFlag.

Considering now both have the same allowed values, but that might change in the future.

func (s * boolFlag) Set(val string) error {
	if val != "true" && val != "0" {
		return fmt.Errorf("only \"0\" supported\n")
	}
	s.val = val
	s.set = true
	return nil
}

	list        = &multiFlag[boolFlag]{"l", "list", boolFlag{false, ""}}
        find        = &multiFlag[boolFlag]{"f", "find", boolFlag{false, ""}}

vs.

	list        = &multiFlag[listFlag]{"l", "list", listFlag{false, ""}}
        find        = &multiFlag[findFlag]{"f", "find", findFlag{false, ""}}

sdavids avatar Oct 01 '24 14:10 sdavids

Share the same type. FWIW a single string value is still enough as long as you set the default to "false" - then the only allowed defaults can be "false", "true", and "0". Note that -list calls Set("true") as it's a short alias for -list=true.

mvdan avatar Oct 01 '24 14:10 mvdan

I have updated the PR.

sdavids avatar Oct 01 '24 15:10 sdavids

Verified:

$  shfmt --version
3.11.0

$ mkdir -p /tmp/test/node_modules && cd $_/..
$ echo 'echo  "no space"' >test.sh
$ echo 'echo  "with space"' >a\ test.sh
$ chmod -R u+x *.sh

$ shfmt -f=0 . | xargs -0 realpath
/private/tmp/test/a test.sh
/private/tmp/test/test.sh
$ shfmt -f=0 . | xargs -0 cat
echo  "with space"
echo  "no space"

sdavids avatar Mar 18 '25 11:03 sdavids