script
script copied to clipboard
Should `ExecForEach` sanitize or escape arguments differently?
After a recent conversation about script
on Lobste.rs, I'm concerned about how ExecForEach
handles arguments (especially filenames?) with tricky characters (e.g., spaces, single quotes, and double quotes).
The current example for ExecForEach
is ListFiles("*").ExecForEach("touch {{.}}").Wait()
. I'll take that as a baseline, and start with the following minimal script:
package main
import "github.com/bitfield/script"
func main() {
script.ListFiles("*").ExecForEach("touch {{.}}").Stdout()
}
Imagine that lives in a folder with the following files:
$ ls -l
total 24
-rw-r--r-- 1 telemachus wheel 0 Aug 21 07:14 Bobby" Tables"
-rw-r--r-- 1 telemachus wheel 0 Aug 21 07:14 Bobby' Tables'
-rw-r--r-- 1 telemachus wheel 210 Aug 21 07:17 go.mod
-rw-r--r-- 1 telemachus wheel 3221 Aug 21 07:17 go.sum
-rw-r--r-- 1 telemachus wheel 0 Aug 21 07:19 hello world
-rw-r--r-- 1 telemachus wheel 125 Aug 21 07:16 main.go
Here's the result of running the script:
$ ls -l
total 24
-rw-r--r-- 1 telemachus wheel 0 Aug 21 09:32 Bobby Tables
-rw-r--r-- 1 telemachus wheel 0 Aug 21 07:14 Bobby" Tables"
-rw-r--r-- 1 telemachus wheel 0 Aug 21 07:14 Bobby' Tables'
-rw-r--r-- 1 telemachus wheel 210 Aug 21 09:32 go.mod
-rw-r--r-- 1 telemachus wheel 3221 Aug 21 09:32 go.sum
-rw-r--r-- 1 telemachus wheel 0 Aug 21 09:32 hello
-rw-r--r-- 1 telemachus wheel 0 Aug 21 07:19 hello world
-rw-r--r-- 1 telemachus wheel 124 Aug 21 09:32 main.go
-rw-r--r-- 1 telemachus wheel 0 Aug 21 09:32 world
I would expect (hope?) that script
would sanitize the filenames, so that touch
ran once with each filename (including the ones with tricky characters).
Instead, here are some things that go wrong (I think).
-
touch
seems to receive the argumentBobby Tables
twice. In a way, this is a good result: a single name with a space! On the other hand, all quotes seem to be stripped away from the two original filenames, one of which has single quotes and one of which has double quotes. If you change the code toscript.ListFiles("*").ExecForEach("ls {{.}}").Stdout()
, you'll seels: Bobby Tables: No such file or directory
(without quotes of any kind) twice in the output. These results suggest that somethingscript
does (the templating, perhaps?) removes quotations from filenames. -
ExecForEach
seems to split the filenamehello world
into two arguments so that the program runstouch hello
andtouch world
.
Normally, in actual shell scripting, you could fix problems like these with proper quoting.
for file in *
do
touch "$file"
done
But I can't see any way to do the equivalent using script
. Ideally, script
itself would handle this for the user. (E.g., to protect users who don't know about such dangers from themselves.)
I noodled around with this a bit last night using https://bitbucket.org/creachadair/shell and https://github.com/mvdan/sh, but nothing I did helped. I'd be interested in working on this, but at the moment I can't see a way forward.
Thanks for putting so much thought into this, @telemachus—it's most helpful.
I think I understand what's going on here now. Let's take the spaces issue first. Something like this:
script.Echo("my file").ExecForEach("touch {{.}}").Wait()
doesn't work as expected, because we end up with two files: my
and file
. Well, this wouldn't work in the shell either, as we've discussed, so the solution would be to quote the filename part of the command like this:
script.Echo("my file").ExecForEach("touch \"{{.}}\"").Wait()
That takes care of the spaces issue, but what about files with quotes in their names?
script.Echo("my \"quoted\" file").ExecForEach("touch \"{{.}}\"").Wait()
This creates a file named my quoted file
, which isn't quite right: it should be literally my "quoted" file
. The Go template parser is eating the quotes. Again, the solution is to escape the quotes before they get interpolated:
script.Echo("my \\\"quoted\\\" file").ExecForEach("touch \"{{.}}\"").Wait()
But when the list of filenames comes from some arbitrary source, what can we do? This must be a problem whenever filenames are interpolated into a Go template, mustn't it?
Honestly, I'll confess this to you, but keep it to yourself: I don't much care for Go templates. I try not to use them if at all possible. However, if we can figure out the exact right way to rewrite arbitrary data in script
pipes such that it expands to the correct values in templates, I'd certainly consider adding it to the package.
So, to coin a phrase, PRs are welcome!
This problem has the same shape as the classic problem of HTML escaping: how do you make sure 1 < 2
turns into exactly 1 < 2
and not 1 &lt; 2
etc. The thorough solution would be to do what the Go html/template engine does: have a separate type for "safe" strings and turn raw strings into safe strings before final output. That seems like too much effort though. A hacky but easy solution is to add a function to the template that looks like this:
type SafeString string
func quote(s any) SafeString {
switch v := s.(type) {
case SafeString:
return v
case string:
return SafeString(shell.Quote(v))
default:
panic("bad type!")
}
}
Then you use it on the client side like .ExecForEach(`touch {{.|quote}}`
.
You mean this one ?
https://github.com/google/safehtml
It sounds like we have a plan!
is the plan for someone to write up a PR with (something like?) Carl's quote helper?
That certainly seems like it would be a good value-add, doesn't it? You've constructed the failing test case, @carlmjohnson has pointed out what the solution should look like, and @gedw99 has identified that the necessary code already exists. Between us, I think we've got this!
If so, should the docs recommend that people always use that when they feed arguments to external commands? (I think that means that people should always use the helper when they use ExecForEach, but I am not 100% sure about that.)
Yes, at first I thought we should simply always quote-sanitise input data to the template, but that's not quite good enough, because we don't know that the data contains filenames, for example. Therefore, it needs to be up to the user to call this function in their template if they're interpolating filenames.
Adding this to the documentation should also appease the wrath of @matklad:
What I do find objectionable is not mentioning this in the docs for a function!
Some gopherology by an ex-crustacean:
It feels like there should be a way to configure default escaping rule for Template, because that's what you'll just need for more or less any use-case. And, like, there is html escaping, it should work somehow?
And, look, indeed, Templae exports the underlying parse tree:
https://pkg.go.dev/text/template#Template
type Template struct {
*parse.Tree
// contains filtered or unexported fields
}
So I guess that's the way! You could walk the parse tree manually, applying your custom escaping rules.
Now, how do I actually use that?
Template is the representation of a parsed template. The *parse.Tree field is exported only for use by html/template and should be treated as unexported by all other clients.
Oh...
:-)
In the current version of Go, if you put a package inside a directory called internal, it becomes unimportable by external packages. The Go team has said the parse tree for templates would have been in internal if internal had existed when it was created, but they don't want to move it now. Cf. https://github.com/golang/go/issues/25357
Two things, and please just ignore me if I'm going overboard with the terseness here, but in this use case of "replacing shell" instead of "writing a rather large program" I'm left wondering why {{.}}
(no matter how default to what templating it may be) feels so verbose, and isn't just {}
or %s
. I mean, this is not supposed to be verbatim Go, you're putting the input "strings-as-in-cli-args" through your own functions.
Second thing, yes find ... -print0
and xargs -0
is kinda clunky, but see above, if you're wrapping this up in the lib, wouldn't that be a possibility to implicitly use something like that?
@winks, I agree that Go template syntax isn't easy to love. But my thinking is that it's better to use something standard, perfect or not, than to invent my own way of doing this.
Second thing, yes find ... -print0 and xargs -0 is kinda clunky, but see above, if you're wrapping this up in the lib, wouldn't that be a possibility to implicitly use something like that?
Can you elaborate on this point a little bit? I'm not sure I understand what you're referring to.