Status report
I completed all features discussed so far. From this point on, I think I will only do bug fixes, source code improvements and refactors, and add more tests.
Please open an issue if you find that some feature is missing. Otherwise, I plan to add specific bash extensions to the grammar in a future release.
I wrote a document describing the implemented options and another one that documents the AST.
I think I would refactor the AST node type names to be CamelCased, so they will become:
- CompleteCommand
- Pipeline
- AndOr
- SimpleCommand
- Function
- Name
- CompoundList
- Subshell
- Case
- CaseItem
- If
- While
- Until
- Word
- AssignmentWord
- ArithmeticExpansion
- CommandExpansion
- ParameterExpansion
- IoRedirect
This should be in line with how typical js AST nodes are named.
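To illustrate the renaming, here is a minimal sketch that derives the CamelCased names from the grammar-style ones. The helper name `toPascalCase` is made up for this example and is not part of the parser.

```javascript
// Hypothetical helper: convert a grammar-style name such as
// "simple_command" to the CamelCased form "SimpleCommand".
function toPascalCase(snakeName) {
  return snakeName
    .split('_')
    .filter(part => part.length > 0)
    .map(part => part[0].toUpperCase() + part.slice(1))
    .join('');
}

// A few of the current node type names mentioned in this thread.
const oldNames = ['complete_command', 'and_or', 'simple_command', 'io_redirect'];
for (const name of oldNames) {
  console.log(`${name} -> ${toPascalCase(name)}`);
}
```

Note that a mechanical conversion gives `IoRedirect`, not `IORedirect`, which matches the list above.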
Also, @wooorm suggested the following changes:
- and_or > LogicalOperator
- simple_command > BuiltIn
- io_redirect > Redirect
- complete_command > Root
If anyone has other suggestions, please let me know...
I also wrote an AST tree traverser to help implement the visitor pattern, and I think I'll write a web page to parse source code online and show the resulting AST tree.
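As a rough idea of what such a traverser could look like, here is a minimal sketch. It is not the actual implementation: the function name `traverse`, the visitor shape (one method per node type) and the hand-written AST below are all assumptions made for illustration.

```javascript
// A value counts as an AST node if it is an object with a string `type`.
function isNode(value) {
  return value !== null && typeof value === 'object' && typeof value.type === 'string';
}

// Hypothetical traverser: calls visitor[node.type](node, parent) when such a
// method exists, then walks every property that holds a node or an array of nodes.
function traverse(node, visitor, parent = null) {
  const handler = visitor[node.type];
  if (typeof handler === 'function') {
    handler(node, parent);
  }
  for (const value of Object.values(node)) {
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      if (isNode(child)) {
        traverse(child, visitor, node);
      }
    }
  }
}

// Hand-written AST for `ls /home` (assumed shape, using the proposed names).
const ast = {
  type: 'CompleteCommand',
  commands: [{
    type: 'SimpleCommand',
    name: {type: 'Word', text: 'ls'},
    suffix: [{type: 'Word', text: '/home'}]
  }]
};

// Collect the text of every Word node, in traversal order.
const words = [];
traverse(ast, {
  Word(node) {
    words.push(node.text);
  }
});
console.log(words);
```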
- any reason why all node types are upper-cased camelCase? E.g., why ParameterExpansion over parameterExpansion? I don’t really care but I was wondering if there’s a reason;
- IoRedirect or IORedirect? Both are fine IMO, but don’t go funky like XMLHttpRequest;
- and_or > LogicalOperator, maybe, if this is the only operator, drop that and go for Logical? Not sure though.
Other than that, 👍
any reason why all node types are upper-cased camelCase? E.g., why ParameterExpansion over parameterExpansion? I don’t really care but I was wondering if there’s a reason;
- It fits better with what seems to be a de facto standard for JS ASTs. Since I'm parsing arithmetic expressions with the Babylon parser, we would have a part of the AST that is PascalCase anyway.
- We skip the problem of reserved words: If seems better to me than ifStatement, etc.
IoRedirect or IORedirect? Both are fine IMO, but don’t go funky like XMLHttpRequest;
IORedirect seems better, but since it's the only kind of redirection, I would prefer Redirect as you suggested
and_or > LogicalOperator, maybe, if this is the only operator, drop that and go for Logical? Not sure though.
and_or does not represent an operator but the whole logical expression, so would LogicalExpression be better?
complete_command > Root
That would be fine with me
simple_command > BuiltIn
simple_command does not necessarily represent a builtin; it can be any kind of command to invoke. E.g. ls /home would be represented by a simple_command node.
So Command could be better, in my opinion
OK, looks good, I agree with all your points!
I think I would refactor the AST node type names to be CamelCased
I find that good, since they are objects (classes) on the AST node.
If anyone has other suggestions, please let me know...
Syntactic sugar :-) Currently the parser is strictly POSIX shell, which is really cool, but the prefix and suffix attributes are a pain when building a shell interpreter (like nsh). I don't know whether attributes like argv or env should be added automatically in all cases, enabled with a flag, or provided by a function that adds them to a given AST tree or generates a totally new one (or maybe whether it should be an independent project in that case...).
Since I'm parsing arithmetic expression with Babylon parser
We should cut out the Babylon arithmetic expression parser and put it in an independent module; maybe the Babylon project would use it as a dependency too.
Syntactic sugar :-) Currently the parser is strictly POSIX shell, which is really cool, but the prefix and suffix attributes are a pain when building a shell interpreter (like nsh). I don't know whether attributes like argv or env should be added automatically in all cases, enabled with a flag, or provided by a function that adds them to a given AST tree or generates a totally new one (or maybe whether it should be an independent project in that case...).
That's fine. Please open individual issues so we can discuss each suggestion, if you have many ideas.
Regarding the two you cited, all nodes in the suffix property are either args (type Word) or redirections (type IoRedirect), and all nodes in the prefix property are either env variable assignments (type AssignmentWord) or redirections (type IoRedirect). I think it could be useful to put all redirections together in a new redirections property, so that the remaining suffixes are all args and the remaining prefixes are all env assignments (and we could rename the two props accordingly).
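A minimal sketch of that regrouping, assuming a SimpleCommand node carries `prefix` and `suffix` arrays of typed nodes as described. The function name `splitCommandParts`, the result keys `env`, `argv` and `redirections`, and the `text` field on the redirect nodes are illustrative assumptions, not the current API.

```javascript
// Hypothetical helper: regroup prefix/suffix nodes of a command into
// env assignments, plain arguments, and redirections.
function splitCommandParts(command) {
  const prefix = command.prefix || [];
  const suffix = command.suffix || [];
  return {
    env: prefix.filter(node => node.type === 'AssignmentWord'),
    argv: suffix.filter(node => node.type === 'Word'),
    // Redirections may appear on either side of the command name.
    redirections: [...prefix, ...suffix].filter(node => node.type === 'IoRedirect')
  };
}

// Hand-written node for `a=42 > out.txt ls /home 2> err.txt` (assumed shape).
const command = {
  type: 'SimpleCommand',
  name: {type: 'Word', text: 'ls'},
  prefix: [
    {type: 'AssignmentWord', text: 'a=42'},
    {type: 'IoRedirect', text: '> out.txt'}
  ],
  suffix: [
    {type: 'Word', text: '/home'},
    {type: 'IoRedirect', text: '2> err.txt'}
  ]
};

const parts = splitCommandParts(command);
console.log(parts.env.map(node => node.text));
console.log(parts.argv.map(node => node.text));
console.log(parts.redirections.map(node => node.text));
```

Keeping this as a separate function, without changing the parser output, is only one of the options mentioned in this thread.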
We should cut out the Babylon arithmetic expression parser and put it in an independent module; maybe the Babylon project would use it as a dependency too.
I think that's a very difficult step to accomplish. Anyway, some days ago @forivall told me on Twitter he could port the arithmetic expression parser he wrote months ago for js-shell-parse and make a PR here.
Regarding the two you cited, all nodes in the suffix property are either args (type Word) or redirections (type IoRedirect), and all nodes in the prefix property are either env variable assignments (type AssignmentWord) or redirections (type IoRedirect). I think it could be useful to put all redirections together in a new redirections property, so that the remaining suffixes are all args and the remaining prefixes are all env assignments (and we could rename the two props accordingly).
AFAIK, there can be environment variables inside suffixes and redirections inside prefixes, or at least bash allows it:
[piranna@HBP:/tmp]
> ls
config-err-drtg5X orbit-piranna ssh-mIWCOg7TodBm systemd-private-f2eccf6bb8e94182af5639cee407a65c-rtkit-daemon.service-NF893W
[piranna@HBP:/tmp]
> > prueba.txt ls
[piranna@HBP:/tmp]
> ls
config-err-drtg5X orbit-piranna prueba.txt ssh-mIWCOg7TodBm systemd-private-f2eccf6bb8e94182af5639cee407a65c-rtkit-daemon.service-NF893W
[piranna@HBP:/tmp]
> cat prueba.txt
config-err-drtg5X
orbit-piranna
prueba.txt
ssh-mIWCOg7TodBm
systemd-private-f2eccf6bb8e94182af5639cee407a65c-rtkit-daemon.service-NF893W
But yes, the idea is to add env, argv and redirections entries.
I think that's a very difficult step to accomplish. Anyway, some days ago @forivall told me on Twitter he could port the arithmetic expression parser he wrote months ago for js-shell-parse and make a PR here.
That would be a good thing too :-) But I still consider it as a DSL, so I would have it in an independent reusable module.
redirections inside prefix
Yes, I cited that
environment variables inside suffix
- If you mean environment variable references, yes, they could... but they must be part of the normal args property, because they can appear inside normal words and are resolved by parameter expansion:
echo first_arg second${sep}arg
In this example, the echo command receives two arguments, the second one having ${sep} resolved to the env var value.
- If you instead mean env variable assignments, I don't think they are allowed in suffixes, only in prefixes. Please provide an example if you think it works in bash.
Furthermore, it could be useful to distinguish env assignments that should only occur in the command process from env assignments that should occur in the shell process:
In this example, only the echo process environment has $a == 42:
bash-3.2$ a=42 echo
bash-3.2$ $a
In this other one, $a persists in the shell process:
bash-3.2$ a=42
bash-3.2$ echo $a
42
Actually, the last example is parsed as a SimpleCommand node with an empty command name. It could be useful to parse it as a new node type.
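Until such a node type exists, the two cases could be told apart by looking at the command name. The sketch below is only an illustration: the function name `assignmentScope`, its return values, and the assumption that an empty command name shows up as a missing `name` or an empty `name.text` are all hypothetical.

```javascript
// Hypothetical helper: decide where the assignments of a SimpleCommand apply.
// Returns 'command' for `a=42 echo` (only the command's environment),
// 'shell' for a bare `a=42` (the current shell process),
// and 'none' when the node has no assignments at all.
function assignmentScope(command) {
  const assignments = (command.prefix || [])
    .filter(node => node.type === 'AssignmentWord');
  if (assignments.length === 0) {
    return 'none';
  }
  // Assumption: an empty command name is a missing `name` or an empty text.
  const hasName = Boolean(command.name && command.name.text);
  return hasName ? 'command' : 'shell';
}

const withCommand = {
  type: 'SimpleCommand',
  name: {type: 'Word', text: 'echo'},
  prefix: [{type: 'AssignmentWord', text: 'a=42'}]
};
const assignmentOnly = {
  type: 'SimpleCommand',
  name: {type: 'Word', text: ''},
  prefix: [{type: 'AssignmentWord', text: 'a=42'}]
};

console.log(assignmentScope(withCommand));
console.log(assignmentScope(assignmentOnly));
```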
redirections inside prefix Yes, I cited that
Sorry, I missed that :-(
If you instead mean env variable assignments, I don't think they are allowed in suffixes, only in prefixes. Please provide an example if you think it works in bash.
[piranna@HBP:~/HBP/imagextractor]
> echo $BLAH
[piranna@HBP:~/HBP/imagextractor]
> echo $BLAH BLAH=hola
BLAH=hola
[piranna@HBP:~/HBP/imagextractor]
> bash --version
bash --version
GNU bash, versión 4.3.46(1)-release (x86_64-pc-linux-gnu)
Actually, the last example is parsed as a SimpleCommand node with an empty command name. It could be useful to parse it as a new node type.
They are, in fact: the first one defines an environment variable for the echo command and the second one assigns a variable in the current shell process. Does the POSIX spec say something about this difference? Maybe this could be just one example of syntactic sugar...
Sorry, I missed that :-(
No worries :smile:
Does the POSIX spec say something about this difference? Maybe this could be just one example of syntactic sugar...
Yes, the relevant part of the grammar is:
cmd_prefix : io_redirect
           | cmd_prefix io_redirect
           | ASSIGNMENT_WORD
           | cmd_prefix ASSIGNMENT_WORD
           ;
cmd_suffix : io_redirect
           | cmd_suffix io_redirect
           | WORD
           | cmd_suffix WORD
           ;
Beware, this is what your examples did:
echo $BLAH
You are reading the value of the BLAH variable. Since it is not defined, you get an empty string (so no output).
> echo $BLAH BLAH=hola
BLAH=hola
You are calling echo with two arguments. The first is the value of the BLAH variable; again, since it is undefined, it resolves to an empty string. The second is the word BLAH=hola. It is interpreted as a normal WORD, so you get it verbatim in the output. Look:
bash-3.2$ BLAH=ciao #BLAH is now set to ciao
bash-3.2$ echo $BLAH BLAH=hola
ciao BLAH=hola
bash-3.2$
Do you get the same output?
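The behaviour above follows from the grammar: an ASSIGNMENT_WORD can only appear in cmd_prefix, so anything of the form name=value after the command name is a plain WORD. Here is a deliberately simplified sketch of that rule. It ignores redirections, quoting and reserved words, and the function name `classifyTokens` is made up for the example.

```javascript
// Matches tokens that start with a valid variable name followed by "=".
const ASSIGNMENT = /^[A-Za-z_][A-Za-z0-9_]*=/;

// Simplified classifier: name=value tokens are assignments only while no
// command name has been seen yet; afterwards every token is a plain Word.
function classifyTokens(tokens) {
  let seenCommandName = false;
  return tokens.map(text => {
    if (!seenCommandName && ASSIGNMENT.test(text)) {
      return {type: 'AssignmentWord', text};
    }
    seenCommandName = true;
    return {type: 'Word', text};
  });
}

// `BLAH=hola echo $BLAH`: the first token is an assignment.
console.log(classifyTokens(['BLAH=hola', 'echo', '$BLAH']).map(t => t.type));
// `echo $BLAH BLAH=hola`: everything is a Word, including BLAH=hola.
console.log(classifyTokens(['echo', '$BLAH', 'BLAH=hola']).map(t => t.type));
```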
@piranna after more reflection on this matter, I think you are right: there are commands that accept assignments in the suffix, e.g.
export name=value;
readonly name=value;
Anyway, I think the parser should identify them as Word nodes (normal arguments) as it does now; it's up to the builtin utility to parse them as assignments...
We could eventually build a visitor module that enhances the current AST and parses the arguments of builtin utilities. The argument parsing of builtin utilities should follow another part of the POSIX standard.
This should be the standard for argument parsing of builtin utilities: http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html#tag_12_01
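As a rough illustration of such an enhancement step, the sketch below reads name=value arguments of export and readonly from the Word nodes in the suffix. It is only a sketch: the function name `parseDeclarationArgs` and the node shape are assumptions, and it does not handle options (such as -p) or the utility syntax rules linked above.

```javascript
// Builtins whose plain Word arguments may be name=value assignments.
const DECLARATION_BUILTINS = new Set(['export', 'readonly']);

// Hypothetical helper: for `export`/`readonly` commands, return the
// name=value pairs found among the Word arguments; otherwise return [].
function parseDeclarationArgs(command) {
  if (!command.name || !DECLARATION_BUILTINS.has(command.name.text)) {
    return [];
  }
  return (command.suffix || [])
    .filter(node => node.type === 'Word')
    .map(node => /^([A-Za-z_][A-Za-z0-9_]*)=(.*)$/.exec(node.text))
    .filter(match => match !== null)
    .map(match => ({name: match[1], value: match[2]}));
}

// Hand-written node for `export name=value` (assumed shape).
const exportCommand = {
  type: 'SimpleCommand',
  name: {type: 'Word', text: 'export'},
  suffix: [{type: 'Word', text: 'name=value'}]
};

console.log(parseDeclarationArgs(exportCommand));
```

The parser output stays untouched here: the arguments remain Word nodes and the interpretation happens in a separate pass.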