bashlex
bashlex copied to clipboard
parse empty lines and comments
Allows to parse whole files.
The idea is that I force-reduce on inputunit -> NEWLINE rule and accept by creating a dummy newline node that only retains position (needed for further parsing). These nodes are filtered in the end to prevent output such as: NewlineNode NewlineNode CommandNode ...
I handpicked the relevant changes, so I hope I didn't miss anything.
Thanks for sending this. I'm curious how GNU bash handles new lines, ideally we should do the same so we stay close to the source. Though if this works, I guess I'm fine with taking it.
Can you add some tests? On Feb 20, 2016 02:12, "blurrymoi" [email protected] wrote:
Allows to parse whole files.
The idea is that I force-reduce on inputunit -> NEWLINE rule and accept by creating a dummy newline node that only retains position (needed for further parsing). These nodes are filtered in the end to prevent output such as: NewlineNode NewlineNode CommandNode ...
I handpicked the relevant changes, so I hope I didn't miss anything.
You can view, comment on, or merge this pull request online at:
https://github.com/idank/bashlex/pull/8 Commit Summary
- parse empty lines and comments
File Changes
- M bashlex/ast.py https://github.com/idank/bashlex/pull/8/files#diff-0 (4)
- M bashlex/parser.py https://github.com/idank/bashlex/pull/8/files#diff-1 (5)
Patch Links:
- https://github.com/idank/bashlex/pull/8.patch
- https://github.com/idank/bashlex/pull/8.diff
— Reply to this email directly or view it on GitHub https://github.com/idank/bashlex/pull/8.
I did some thinking prior to making this and on line 388 they do create a NULL command and yacc accept: http://git.savannah.gnu.org/cgit/bash.git/tree/parse.y?id=df2c55de9c87c2ee8904280d26e80f5c48dd6434#n388
The "None" approach would not really be viable, because a pointer can be of specific type, but None is of no type, so I assumed this to be error-prone. Also, in parser.py around lines 600-620, the index is being set and should there be None, the position would have to be remembered differently (not to mention testing for None-ness).
Perhaps this has to do with actually executing the code?
Maybe. I tried a more naive approach:
@@ -593,6 +597,9 @@ def parse(s, strictmode=True, expansionlimit=None, convertpos=False):
ef.visit(parts[-1])
index = max(parts[-1].pos[1], ef.end) + 1
while index < len(s):
+ if s[index].isspace():
+ index += 1
+ continue
part = _parser(s[index:], strictmode=strictmode).parse()
if not isinstance(part, ast.node):
But then started thinking about recursive parsing and indeed this fails when there's a newline in a command substitution, try: a $(b\nc)
. I also tried your fix and it fails as well. I'll try debugging this for a bit and see if I can find how the C parser deals with this.
I'll still merge this, but let's see if we can fix this case too.
Ok, I think I got it working. It deviates from bash a little because I got tired of debugging it. Can you try https://github.com/idank/bashlex/tree/newlines
I played with it a bit and one assert ( assert p[2].kind == 'list'
from parser.py/215 (while or until) ) fails for example on:
while [ sth ]
do
sth
done
It works without the assert, though.
The two issues are barely related, so you could still combine the commits and both parse newlines and comments and fix the newline in substitution problem. I am not trying to enforce my changes, on the contrary, I'd love to see comment-parsing being handled non-trivially, it's just that each and every file that I have to parse contains comments and having a whole different version of the parser is not in my interest.
Thanks for the advice, I would not dare alter your former test cases.
What do you mean have a whole different version of the parser?
I added my proposition for the combination of the two solutions. It should be able to parse all of the aforementioned problems: newlines, comments and newlines in command substitution.
What I meant is that if you choose to go with your newlines version and disregard comments altogether I'd have to use my approach for comment-parsing and digress from the mainstream bashlex and I would like to avoid this.
Squashed into a single commit for your convenience~
My changes shouldn't ignore comments. I made them in haste, I'm sure it's fixable.
Does your original patch work with new lines in substitutions? Changing the grammar the way you did is a little scary and it's hard to see what else might be affected by it.
No, my original patch does not regard new lines in substitutions as that was in no way related to what I was trying to accomplish. And I didn't know such problem existed. Neither did I touch the structure of the grammar. Although I admit that it cannot be said what else the changes may affect.
Your proposed version by itself also seems not to parse \n a
.
That's easy to fix.
What I meant by touching the grammar is this line yaccparser.action[8][tt.name] = -2
. I can't really tell if that's harmless or not outside of what you were trying to fix. I think my approach is safer but happy to discuss any concerns you might have.
If you want to refine my patch, feel free. Otherwise I'll work on it when I have the time.
FYI I tried this PR with moderate complexity multiline bash files and it doesn't work.
Did you debug further? What was giving it trouble?
Can you provide an example?
I analysed the testsuite of the abrt project (some 100+ files) and only a handful of tests failed to parse for reasons unrelated to newline parsing. Can you maybe try with my devel branch and let me know if the problem persists? ( https://github.com/blurrymoi/bashlex/tree/devel )
I'll try again, gimme a min
btw can you merge master to your branch? You're 18 commits behind, and have conflicts here
so it fails on this script:
#!/bin/bash -xe
docker build --pull --tag s2-build .
docker run -it --rm -v `pwd`:/s2 s2-build
with error:
File "/Users/amohr/dev/third_party/bashlex/bashlex/parser.py", line 583, in parse
parts = [p.parse()]
File "/Users/amohr/dev/third_party/bashlex/bashlex/parser.py", line 642, in parse
tree = theparser.parse(lexer=self.tok, context=self)
File "/Users/amohr/dev/third_party/bashlex/bashlex/yacc.py", line 277, in parse
return self.parseopt_notrack(input,lexer,debug,tracking,tokenfunc,context)
File "/Users/amohr/dev/third_party/bashlex/bashlex/yacc.py", line 1079, in parseopt_notrack
tok = self.errorfunc(errtoken)
File "/Users/amohr/dev/third_party/bashlex/bashlex/parser.py", line 539, in p_error
p.lexer.source, p.lexpos)
bashlex.errors.ParsingError: unexpected token '\n' (position 16)
fixed by ea0c135b88615c81a5a7adfc5873b7670b4449ad