composure
composure copied to clipboard
metafor's "sed-fu" seems overly complicated. Triggers bug in busybox with musl. Can it be simplified?
The buggy behavior, which only happens in a busybox sed which uses musl's libc (take a look in alpine:latest
, with apk add sed
), is that the leading quotation marks are not removed (this is counter to the POSIX regex spec):
→ keyword=hello
→ LINE="$keyword 'foobar'" # a trivial use with quotes
→ echo $LINE | busybox sed -n "/$keyword / s/['\";]*\$//;s/^[ ]*\(: _\)*$keyword ['\"]*\([^([].*\)*\$/\2/p" # musl bug
'foobar
→ echo $LINE | sed -n "/$keyword / s/['\";]*\$//;s/^[ ]*\(: _\)*$keyword ['\"]*\([^([].*\)*\$/\2/p"
foobar
I'd like to suggest an edit to this sed-fu, but I'm not certain which parts of this functionality are intentional, and which are accidental. The guiding principle seems to be "this should work a little like echo, but we definitely do not want to eval
"
The behavior of the current sed-fu appears to be:
- Search for the keyword (we will only print lines that match it)
- The
keyword
must be the first word on the line, but it can be: _keyword
because of Allow functions with metadata to run before load- The keyword and the whitespace preceding it are removed
- All leading and trailing quotes on the argument are removed (even if they are of different lengths [0], or are not symmetrical [1], though we'd expect the outer pair to match in normal usage)
- A trailing semicolon is removed, whether or not [2] it is within the quotes
([^([].*\)*
looks like it's trying to do something like exclude open braces, but all that does is start the match earlier [3]- Attempting to quote two different arguments is interpreted as allowed internal quotes [4]
- The character immediately following the keyword must be exactly a space. Tabs are disallowed. (
typeset
probably normalizes this so it wouldn't matter)
→ LINE="$keyword 'foobar;\"''\""; !echo # Allowed: uneven[0], mismatched[1] quotes and a semicolon within the quotes [2]
foobar
→ LINE="$keyword '(foobar;'"; !echo # Odd: A leading paren in the quotes [3], semicolon within the quotes [2]
'(foobar
→ LINE="$keyword 'foo' 'bar'"; !echo # Unsupported: two different quoted arguments [4]
foo' 'bar
I'm treading on Chesterton's fence, which prohibits me from removing these unless I know what they're for. But several of these don't look like they do anything for us. If they don't, I'd propose a shorter sed script:
sed -En "s/['\"]?;?\$//g; s/^\s*(:\s+_)?${keyword}\s+['\"]?//p"
This sed invocation does not break in musl, but it also makes several opinionated changes. I don't want to pass over them without calling them out:
- Only one quote is removed from the start and end - quotes are still allowed to mismatch in kind and number, but that number is only zero or one.
- The trailing semicolon is not removed if it's within the quotation marks
- Whatever
([^([].*\)*
is trying to do; it doesn't - Whitespace is more flexible - no constraints on spaces vs. tabs or how many of them there are (beyond syntax requirements). This can harm readability where tabs are never expected,
→ keyword=hello
→ LINE="$keyword 'foobar'" # a trivial use with quotes
→ echo $LINE | busybox sed -n "/$keyword / s/['\";]*\$//;s/^[ ]*\(: _\)*$keyword ['\"]*\([^([].*\)*\$/\2/p" # musl bug
'foobar
→ echo $LINE | busybox sed -En "s/['\"]?;?\$//g; s/^\s*(:\s+_)?${keyword}\s+['\"]?//p" # worked around
foobar
Do these changes seem like improvements, or do they miss the mark?