bash-parser Status report

I have completed all the features discussed so far. From this point on, I think I'll only do bug fixes, source code improvements and refactors, and add more tests.

Please open an issue if you find that some feature is missing; otherwise, I plan to add bash-specific extensions to the grammar in a future release.

I wrote a document describing the implemented options and another one documenting the AST.

I think I would refactor the AST node type names to be CamelCased, so they will become:

CompleteCommand
Pipeline
AndOr
SimpleCommand
Function
Name
CompoundList
Subshell
Case
CaseItem
If
While
Until
Word
AssignmentWord
ArithmeticExpansion
CommandExpansion
ParameterExpansion
IoRedirect

This should be in line with how typical js AST nodes are named.

Also, @wooorm suggested the following changes:

and_or > LogicalOperator
simple_command > BuiltIn
io_redirect > Redirect
complete_command > Root

If anyone has other suggestions, please let me know...

I also wrote an AST traverser to help implement the visitor pattern, and I think I'll write a web page to parse source code online and show the resulting AST.
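A minimal TypeScript sketch of how such a traverser can support the visitor pattern. It assumes every node is a plain object with a `type` string and that child nodes sit in ordinary properties or arrays; `AstNode`, `Visitor`, `traverse` and the sample node shapes are hypothetical, not bash-parser's actual API.

```typescript
// Hypothetical node shape: every node has a `type` string; other
// properties may hold child nodes, arrays of nodes, or plain values.
interface AstNode {
  type: string;
  [key: string]: unknown;
}

// A visitor maps node type names (e.g. "Command", "Word") to callbacks.
type Visitor = {
  [nodeType: string]: (node: AstNode, parent: AstNode | null) => void;
};

function isNode(value: unknown): value is AstNode {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { type?: unknown }).type === "string"
  );
}

// Depth-first, pre-order walk: call the callback registered for this
// node's type, then recurse into every property that holds nodes.
function traverse(node: AstNode, visitor: Visitor, parent: AstNode | null = null): void {
  const callback = visitor[node.type];
  if (callback) callback(node, parent);
  for (const key of Object.keys(node)) {
    if (key === "type") continue;
    const value = node[key];
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      if (isNode(child)) traverse(child, visitor, node);
    }
  }
}

// Usage: collect command names from a tiny hand-built AST for `ls /home`.
const ast: AstNode = {
  type: "Root",
  commands: [
    {
      type: "Command",
      name: { type: "Word", text: "ls" },
      suffix: [{ type: "Word", text: "/home" }],
    },
  ],
};

const names: string[] = [];
traverse(ast, {
  Command: (node) => {
    names.push((node.name as AstNode).text as string);
  },
});
// names is ["ls"]
```

Dispatching on `type` keeps visitors small: a consumer only registers callbacks for the node types it cares about.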

Sep 05 '16 19:09 parro-it

any reason why all node types are upper-cased camelCase? E.g., why ParameterExpansion over parameterExpansion? I don’t really care but I was wondering if there’s a reason;
IoRedirect or IORedirect? Both are fine IMO, but don’t go funky like XMLHttpRequest;
and_or > LogicalOperator, maybe, if this is the only operator, drop that and go for Logical? Not sure though.

Other than that, 👍

Sep 05 '16 19:09 wooorm

any reason why all node types are upper-cased camelCase? E.g., why ParameterExpansion over parameterExpansion? I don’t really care but I was wondering if there’s a reason;

It fits better with what seems to be the de facto standard for JS ASTs. Since I'm parsing arithmetic expressions with the Babylon parser, part of the AST would be PascalCase anyway.
It also sidesteps the problem of reserved words: If seems better to me than ifStatement, etc.

IoRedirect or IORedirect? Both are fine IMO, but don’t go funky like XMLHttpRequest;

IORedirect seems better, but since it's the only kind of redirection, I would prefer Redirect, as you suggested.

and_or > LogicalOperator, maybe, if this is the only operator, drop that and go for Logical? Not sure though.

and_or does not represent an operator but the whole logical expression, so maybe LogicalExpression would be better?

complete_command > Root

That would be fine with me.

simple_command > BuiltIn

simple_command does not necessarily represent a builtin; it can be any kind of command to invoke. E.g., ls /home would be represented by a simple_command node. So I think Command would be better.

Sep 06 '16 07:09 parro-it

OK, looks good, I agree with all your points!

Sep 06 '16 07:09 wooorm

I think I would refactor the AST node type names to be CamelCased

I find that good, since they are the types (classes) of the AST node objects.

If anyone has other suggestions, please let me know...

Syntactic sugar :-) Currently the parser is strictly POSIX, and that's really cool, but the prefix and suffix attributes are a pain when you want to build a shell interpreter (like nsh). I don't know whether to add attributes like argv, env and similar automatically in all cases, to enable them with a flag, or to provide a function that adds them to a given AST or generates a totally new one (in that case, maybe it should also be an independent project...).

Since I'm parsing arithmetic expressions with the Babylon parser

We should split the Babylon arithmetic expression parser out into an independent module; maybe the Babylon project would use it as a dependency too.

Sep 08 '16 08:09 piranna

Syntactic sugar :-) Currently the parser is strictly POSIX, and that's really cool, but the prefix and suffix attributes are a pain when you want to build a shell interpreter (like nsh). I don't know whether to add attributes like argv, env and similar automatically in all cases, to enable them with a flag, or to provide a function that adds them to a given AST or generates a totally new one (in that case, maybe it should also be an independent project...).

That's fine. Please open individual issues so we can discuss each suggestion, if you have many ideas.

Regarding the two you cited: all nodes in the suffix property are either arguments (type Word) or redirections (type IoRedirect), and all nodes in the prefix property are either environment variable assignments (type AssignmentWord) or redirections (type IoRedirect). I think it could be useful to collect all redirections in a new redirections property, so that the remaining suffix entries are all arguments and the remaining prefix entries are all environment assignments (and we could rename the two properties accordingly).
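A minimal sketch of that normalization, assuming simplified node shapes: a `SimpleCommand` with `name`, `prefix` and `suffix`, and an `IoRedirect` with hypothetical `op` and `file` fields. `normalize` and the `env`/`args`/`redirections` property names are illustrative only, not an existing bash-parser API.

```typescript
// Hypothetical, simplified node shapes.
interface Word { type: "Word"; text: string }
interface AssignmentWord { type: "AssignmentWord"; text: string }
interface IoRedirect { type: "IoRedirect"; op: string; file: Word }

interface SimpleCommand {
  type: "SimpleCommand";
  name?: Word;
  prefix: Array<AssignmentWord | IoRedirect>;
  suffix: Array<Word | IoRedirect>;
}

// Proposed shape: redirections live in one list, so what remains of the
// prefix is only env assignments and what remains of the suffix is only args.
interface NormalizedCommand {
  type: "SimpleCommand";
  name?: Word;
  env: AssignmentWord[];
  args: Word[];
  redirections: IoRedirect[];
}

function normalize(cmd: SimpleCommand): NormalizedCommand {
  const env: AssignmentWord[] = [];
  const args: Word[] = [];
  const redirections: IoRedirect[] = [];
  // Prefix is scanned before suffix, so redirections keep their source
  // order (the shell applies redirections left to right).
  for (const item of cmd.prefix) {
    if (item.type === "IoRedirect") redirections.push(item);
    else env.push(item);
  }
  for (const item of cmd.suffix) {
    if (item.type === "IoRedirect") redirections.push(item);
    else args.push(item);
  }
  return { type: "SimpleCommand", name: cmd.name, env, args, redirections };
}

// Usage: `> prueba.txt LANG=C ls -l`
const cmd: SimpleCommand = {
  type: "SimpleCommand",
  name: { type: "Word", text: "ls" },
  prefix: [
    { type: "IoRedirect", op: ">", file: { type: "Word", text: "prueba.txt" } },
    { type: "AssignmentWord", text: "LANG=C" },
  ],
  suffix: [{ type: "Word", text: "-l" }],
};
const n = normalize(cmd);
```

Here `n.env` holds only `LANG=C`, `n.args` only `-l`, and `n.redirections` only the `> prueba.txt` redirection.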

We should split the Babylon arithmetic expression parser out into an independent module; maybe the Babylon project would use it as a dependency too.

I think that's a very difficult step to accomplish. Anyway, some days ago @forivall told me on Twitter that he could port the arithmetic expression parser he wrote months ago for js-shell-parse and make a PR here.

Sep 08 '16 08:09 parro-it

Regarding the two you cited: all nodes in the suffix property are either arguments (type Word) or redirections (type IoRedirect), and all nodes in the prefix property are either environment variable assignments (type AssignmentWord) or redirections (type IoRedirect). I think it could be useful to collect all redirections in a new redirections property, so that the remaining suffix entries are all arguments and the remaining prefix entries are all environment assignments (and we could rename the two properties accordingly).

AFAIK, there can be environment variables inside suffixes and redirections inside prefixes, or at least bash allows it:

[piranna@HBP:/tmp]
 > ls
config-err-drtg5X  orbit-piranna  ssh-mIWCOg7TodBm  systemd-private-f2eccf6bb8e94182af5639cee407a65c-rtkit-daemon.service-NF893W

[piranna@HBP:/tmp]
 > > prueba.txt ls

[piranna@HBP:/tmp]
 > ls
config-err-drtg5X  orbit-piranna  prueba.txt  ssh-mIWCOg7TodBm  systemd-private-f2eccf6bb8e94182af5639cee407a65c-rtkit-daemon.service-NF893W

[piranna@HBP:/tmp]
 > cat prueba.txt 
config-err-drtg5X
orbit-piranna
prueba.txt
ssh-mIWCOg7TodBm
systemd-private-f2eccf6bb8e94182af5639cee407a65c-rtkit-daemon.service-NF893W

But yes, the idea is to add env, argv and redirections entries.

I think that's a very difficult step to accomplish. Anyway, some days ago @forivall told me on Twitter that he could port the arithmetic expression parser he wrote months ago for js-shell-parse and make a PR here.

That would be a good thing too :-) But I still consider it a DSL, so I would put it in an independent, reusable module.

Sep 08 '16 08:09 piranna

redirections inside prefix

Yes, I cited that

environment variables inside suffixes

If you mean environment variable references, yes, they can appear there... but they must be part of the normal args property, because they can appear inside normal words; they are resolved by parameter expansion:

echo first_arg second${sep}arg

In this example, the echo command receives two arguments, the second one having ${sep} replaced by the variable's value.
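To illustrate why such references belong with ordinary arguments, here is a minimal sketch of expanding `$name` and `${name}` inside a single word. It assumes variables are looked up in a plain `Map` and that unset variables expand to the empty string; it deliberately ignores the other `${...}` operators (defaults, substrings, etc.), quoting, field splitting and empty-field removal. `expandWord` is a hypothetical helper.

```typescript
// Matches ${name} (group 1) or $name (group 2); names follow the shell rule:
// a letter or underscore, then letters, digits or underscores.
const PARAM = /\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)/g;

// Expand variable references inside one word; unset variables become "".
function expandWord(word: string, vars: Map<string, string>): string {
  return word.replace(PARAM, (_match: string, braced?: string, bare?: string) => {
    const name = braced ?? bare ?? "";
    return vars.get(name) ?? "";
  });
}

// Usage: the two words of `echo first_arg second${sep}arg` with sep=_
const words = ["first_arg", "second${sep}arg"];
const argv = words.map((w) => expandWord(w, new Map([["sep", "_"]])));
// argv is ["first_arg", "second_arg"]: still two arguments
```

The reference is resolved inside the word it belongs to, so it never becomes a separate prefix or suffix entry.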

If you instead mean environment variable assignments, I don't think they are allowed in suffixes, only in prefixes. Please provide an example if you think it works in bash.

Furthermore, it could be useful to distinguish environment assignments that only apply to the command's process from assignments that apply to the shell process:

In this example, only the environment of the echo process has a set to 42:

bash-3.2$ a=42 echo
bash-3.2$ $a

In this other example, a persists in the shell process:

bash-3.2$ a=42
bash-3.2$ echo $a
42

Currently, the last example is parsed as a SimpleCommand node with an empty command name. It could be useful to parse it as a new node type.
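A minimal sketch of that distinction, assuming a simplified command shape whose prefix holds only assignments (redirections are left out for brevity). The `VariableAssignment` node type and the `classify` function are hypothetical names, not something bash-parser defines.

```typescript
interface Word { type: "Word"; text: string }
interface AssignmentWord { type: "AssignmentWord"; text: string }

interface SimpleCommand {
  type: "SimpleCommand";
  name?: Word;
  prefix: AssignmentWord[];
  suffix: Word[];
}

// Hypothetical node for assignments that persist in the shell process.
interface VariableAssignment {
  type: "VariableAssignment";
  assignments: AssignmentWord[];
}

// `a=42 echo`: the assignment only reaches echo's environment, so the node
// stays a SimpleCommand. `a=42` alone has no command name, so the assignment
// applies to the shell itself and becomes a VariableAssignment node.
function classify(cmd: SimpleCommand): SimpleCommand | VariableAssignment {
  if (cmd.name === undefined && cmd.prefix.length > 0) {
    return { type: "VariableAssignment", assignments: cmd.prefix };
  }
  return cmd;
}

const shellLevel = classify({
  type: "SimpleCommand",
  prefix: [{ type: "AssignmentWord", text: "a=42" }],
  suffix: [],
});
const commandLevel = classify({
  type: "SimpleCommand",
  name: { type: "Word", text: "echo" },
  prefix: [{ type: "AssignmentWord", text: "a=42" }],
  suffix: [],
});
// shellLevel.type is "VariableAssignment", commandLevel.type is "SimpleCommand"
```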

Sep 08 '16 09:09 parro-it

redirections inside prefix

Yes, I cited that

Sorry, I missed that :-(

If you instead mean environment variable assignments, I don't think they are allowed in suffixes, only in prefixes. Please provide an example if you think it works in bash.

[piranna@HBP:~/HBP/imagextractor]
 > echo $BLAH


[piranna@HBP:~/HBP/imagextractor]
 > echo $BLAH BLAH=hola
BLAH=hola

[piranna@HBP:~/HBP/imagextractor]
 > bash --version
GNU bash, versión 4.3.46(1)-release (x86_64-pc-linux-gnu)

Currently, the last example is parsed as a SimpleCommand node with an empty command name. It could be useful to parse it as a new node type.

They are different, in fact: the first one defines an environment variable for the echo command, and the second one assigns a variable in the current shell process. Does the POSIX spec say something about this difference? Maybe this could be just one example of syntactic sugar...

Sep 08 '16 13:09 piranna

Sorry, I missed that :-(

No worries 😄

Does the POSIX spec say something about this difference? Maybe this could be just one example of syntactic sugar...

Yes, the relevant part of the grammar is:

cmd_prefix       :            io_redirect
                 | cmd_prefix io_redirect
                 |            ASSIGNMENT_WORD
                 | cmd_prefix ASSIGNMENT_WORD
                 ;
cmd_suffix       :            io_redirect
                 | cmd_suffix io_redirect
                 |            WORD
                 | cmd_suffix WORD
                 ;
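To make the grammar concrete, here is a minimal sketch of splitting a simple command's tokens according to cmd_prefix and cmd_suffix. It assumes redirection operators arrive as separate tokens followed by their target word, and it ignores IO_NUMBER prefixes such as `2>` and here-document bodies. Only words before the command name are checked for the NAME=value form; after the name, every word is a plain WORD. `splitSimpleCommand` and its token format are hypothetical.

```typescript
type Item =
  | { kind: "assignment"; text: string }
  | { kind: "redirect"; op: string; target: string }
  | { kind: "word"; text: string };

const REDIRECT_OPS = new Set(["<", ">", ">>", "<<", "<<-", "<&", ">&", "<>", ">|"]);
// NAME= at the start of a word: a letter or underscore, then letters, digits, underscores.
const ASSIGNMENT = /^[A-Za-z_][A-Za-z0-9_]*=/;

function splitSimpleCommand(tokens: string[]) {
  const prefix: Item[] = [];
  const suffix: Item[] = [];
  let name: string | undefined;
  for (let i = 0; i < tokens.length; i++) {
    const tok = tokens[i];
    // Before the command name we are in cmd_prefix, afterwards in cmd_suffix.
    const target = name === undefined ? prefix : suffix;
    if (REDIRECT_OPS.has(tok)) {
      // io_redirect: the operator consumes the next token as its target
      // (a missing target would be a syntax error; "" is used here).
      target.push({ kind: "redirect", op: tok, target: tokens[++i] ?? "" });
    } else if (name === undefined && ASSIGNMENT.test(tok)) {
      prefix.push({ kind: "assignment", text: tok }); // ASSIGNMENT_WORD
    } else if (name === undefined) {
      name = tok; // the first plain word is the command name
    } else {
      suffix.push({ kind: "word", text: tok }); // WORD, even if it contains "="
    }
  }
  return { prefix, name, suffix };
}

// Usage: `echo $BLAH BLAH=hola` has no assignment; BLAH=hola is just a WORD.
const parsed = splitSimpleCommand(["echo", "$BLAH", "BLAH=hola"]);
```

With this split, `parsed.prefix` is empty and `parsed.suffix` holds two plain words, while `a=42 echo` would put `a=42` in the prefix as an assignment.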

Beware, this is what your examples actually do:

echo $BLAH

You are reading the value of the BLAH variable. Since it is not defined, it expands to nothing (so echo prints only an empty line).

 > echo $BLAH BLAH=hola
BLAH=hola

You are calling echo with $BLAH and the word BLAH=hola. Again, since BLAH is undefined, the unquoted $BLAH expands to nothing and is dropped entirely, so echo receives a single argument: the word BLAH=hola. It is interpreted as a normal WORD, so you get it verbatim in the output. Look:

bash-3.2$ BLAH=ciao #BLAH is now set to ciao
bash-3.2$ echo $BLAH BLAH=hola
ciao BLAH=hola
bash-3.2$

Do you get the same output?

Sep 08 '16 13:09 parro-it

@piranna after more reflection on this matter, I think you are right: there are commands that accept assignments in the suffix, e.g.:

export name=value;
readonly name=value;

Anyway, I think the parser should identify them as Word nodes (normal arguments), as it does now; it's up to the builtin utility to parse them as assignments...

We could eventually build a visitor module that enhances the current AST and parses the arguments of builtin utilities. Argument parsing for builtin utilities should follow another part of the POSIX standard.
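A minimal sketch of the builtin side of that division of labour: the parser hands `export` or `readonly` plain words, and the builtin itself recognizes the NAME=value form. `parseAssignmentArg` is hypothetical, and it assumes the word has already gone through expansion and quote removal.

```typescript
interface Assignment {
  name: string;
  value: string | undefined; // undefined for a bare NAME such as `export PATH`
}

// NAME=value sets a value; a bare NAME only marks the variable.
// Returns null for arguments that are not valid shell names (e.g. options like -p).
function parseAssignmentArg(arg: string): Assignment | null {
  const match = /^([A-Za-z_][A-Za-z0-9_]*)(?:=([\s\S]*))?$/.exec(arg);
  if (match === null) return null;
  return { name: match[1], value: match[2] };
}

// Usage: arguments of `export name=value PATH 1abc`
const parsedArgs = ["name=value", "PATH", "1abc"].map((arg) => parseAssignmentArg(arg));
// [{ name: "name", value: "value" }, { name: "PATH", value: undefined }, null]
```

Only the first `=` separates name from value, so `a=b=c` yields the value `b=c`.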

Sep 09 '16 19:09 parro-it

This should be the standard for argument parsing of builtin utilities: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html#tag_12_01
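For reference, a minimal getopt-style sketch of the core conventions from that chapter: options are single characters introduced by `-`, options without arguments may be grouped (`-ab`), an option-argument may follow its option in the same or the next argument (as C's `getopt()` accepts), `--` ends option processing, and so does the first operand (a lone `-` counts as an operand). The `optstring` format is borrowed from `getopt()` (a trailing `:` marks an option that takes an argument); `parseOptions` itself is hypothetical.

```typescript
interface ParsedArgs {
  options: Array<{ option: string; value?: string }>;
  operands: string[];
}

function parseOptions(args: string[], optstring: string): ParsedArgs {
  const options: ParsedArgs["options"] = [];
  let i = 0;
  while (i < args.length) {
    const arg = args[i];
    if (arg === "--") { i++; break; }               // explicit end of options
    if (!arg.startsWith("-") || arg === "-") break; // first operand ends options
    // Walk grouped options such as -ab one character at a time.
    for (let j = 1; j < arg.length; j++) {
      const opt = arg[j];
      const pos = optstring.indexOf(opt);
      if (pos === -1 || opt === ":") throw new Error(`illegal option -- ${opt}`);
      if (optstring[pos + 1] === ":") {
        // The option-argument is the rest of this arg, or else the next arg.
        let value = arg.slice(j + 1);
        if (value === "") {
          i++;
          if (i >= args.length) throw new Error(`option requires an argument -- ${opt}`);
          value = args[i];
        }
        options.push({ option: opt, value });
        break;
      }
      options.push({ option: opt });
    }
    i++;
  }
  return { options, operands: args.slice(i) };
}

// Usage: -a and -b grouped, -c takes "file", everything after -- is an operand.
const result = parseOptions(["-ab", "-c", "file", "--", "-x", "y"], "abc:");
// options: a, b, c="file"; operands: ["-x", "y"]
```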

Sep 09 '16 19:09 parro-it