sh icon indicating copy to clipboard operation
sh copied to clipboard

syntax: ambiguity when parsing globs in indexed array elements

Open averms opened this issue 5 years ago • 9 comments

When trying to format the following file:

#!/bin/bash
files=([^_]*.txt)

shfmt fails with test.sh:2:9: ^ must follow an expression. I'm not sure what that means. The manual describes the syntax for pattern matching (globbing) here.

averms avatar May 03 '20 03:05 averms

Thanks for the report! This is probably an edge case in the lexer that we missed.

mvdan avatar May 03 '20 21:05 mvdan

Ah, the parser is getting confused because it sees name=([ and it thinks it's parsing name=([index]=value).

Unfortunately, arrays is one of the big areas where Bash syntax is ambiguous, and it can't be parsed statically without backtracking. See the mention of array indexes in https://github.com/mvdan/sh#caveats, for example.

I'm not sure what to do here. I definitely don't want to add backtracking. Parsing arrays is just a nightmare in general.

mvdan avatar May 03 '20 21:05 mvdan

I have a similar issue and simplified it to this:

P=(*.tx[t])

gives an array of files (*.txt), but shfmt says:

"[x]" must be followed by =

Here, this isn't an index, but a file pattern.

sotho avatar Jun 29 '20 17:06 sotho

Maybe the "solution" sections from https://www.oilshell.org/blog/2016/10/20.html can give a solution? It says

Fortunately, there's a simple solution: always quote string literals used as array keys. The parser will then know that unquoted strings are variables. That is:

array[mystring]=x    # BAD: is it a string or a variable?
array['mystring']=x  # GOOD: it's a string

We can extend that to defining an array: if the characters inside the brackets aren't quoted they are a glob, otherwise they are treated as an index. Does this work?

averms avatar Oct 03 '20 19:10 averms

https://github.com/mvdan/sh#caveats already makes the same suggestion that oilshell makes, and I think it's generally sensible to write new shell code in a way that isn't ambiguous.

When it comes to files=([^_]*.txt), I'm not sure there's a way to truly make it non-ambiguous. A glob expression can contain quote characters just fine. I think this is one of the cases that will only be fixed by the parser supporting ambiguous input. Other people have brought up similar issues due to the lack of handling ambiguity, so I'm currently leaning towards supporting it. I'll let you know if/when that issue with a proposed implementation is created.

mvdan avatar Oct 04 '20 19:10 mvdan

Other people have brought up similar issues due to the lack of handling ambiguity, so I'm currently leaning towards supporting it. I'll let you know if/when that issue with a proposed implementation is created.

I collected my thoughts into a new issue here: https://github.com/mvdan/sh/issues/686

We can leave this issue open as it's the problem statement, not the proposed solution. But it can't be fixed until the new issue is implemented, I think.

mvdan avatar Mar 17 '21 15:03 mvdan

I'm running in to what I think is the same issue:

lib/helpers.bash: lib/helpers.bash:615:34: [ must follow a name

EDIT: the note on your GitHub sponsorship page says suggestions are welcome, I'd like to contribute towards this refactor for using [ in arrays! (I'm trying to use file globbing to populate an array.) ♥

gaelicWizard avatar Jan 01 '22 04:01 gaelicWizard

@gaelicWizard thanks for the nudge and for the support :) You actually made me notice that we had a silly regression when printing errors in shfmt. I've sent https://github.com/mvdan/sh/pull/786 if you would like to give a review.

I can't seem to find what file is giving you that error. Could you please post a perma-link to one such file, or paste the bit of shell here?

mvdan avatar Jan 01 '22 20:01 mvdan

@mvdan, here's the line it's throwing errors on:

plugins=("${BASH_IT}/enabled"/[[:digit:]][[:digit:]][[:digit:]]"${BASH_IT_LOAD_PRIORITY_SEPARATOR}${file_entity}.${suffix}.bash" "${BASH_IT}/$subdirectory/enabled/"{[[:digit:]][[:digit:]][[:digit:]]"${BASH_IT_LOAD_PRIORITY_SEPARATOR}${file_entity}.${suffix}.bash","${file_entity}.${suffix}.bash"})

Alsö throws errors on this variation:

plugins=("${BASH_IT}/enabled"/[0-9][0-9][0-9]"${BASH_IT_LOAD_PRIORITY_SEPARATOR}${file_entity}.${suffix}.bash" "${BASH_IT}/$subdirectory/enabled/"{[0-9][0-9][0-9]"${BASH_IT_LOAD_PRIORITY_SEPARATOR}${file_entity}.${suffix}.bash","${file_entity}.${suffix}.bash"})

Currently, I'm just using * in bash-it/bash-it#1934 to work around it.

Thanks!!

gaelicWizard avatar Jan 01 '22 21:01 gaelicWizard