bashlex icon indicating copy to clipboard operation
bashlex copied to clipboard

parse empty lines and comments

Open blurrymoi opened this issue 8 years ago • 17 comments

Allows to parse whole files.

The idea is that I force-reduce on inputunit -> NEWLINE rule and accept by creating a dummy newline node that only retains position (needed for further parsing). These nodes are filtered in the end to prevent output such as: NewlineNode NewlineNode CommandNode ...

I handpicked the relevant changes, so I hope I didn't miss anything.

blurrymoi avatar Feb 20 '16 10:02 blurrymoi

Thanks for sending this. I'm curious how GNU bash handles new lines, ideally we should do the same so we stay close to the source. Though if this works, I guess I'm fine with taking it.

Can you add some tests? On Feb 20, 2016 02:12, "blurrymoi" [email protected] wrote:

Allows to parse whole files.

The idea is that I force-reduce on inputunit -> NEWLINE rule and accept by creating a dummy newline node that only retains position (needed for further parsing). These nodes are filtered in the end to prevent output such as: NewlineNode NewlineNode CommandNode ...

I handpicked the relevant changes, so I hope I didn't miss anything.

You can view, comment on, or merge this pull request online at:

https://github.com/idank/bashlex/pull/8 Commit Summary

  • parse empty lines and comments

File Changes

  • M bashlex/ast.py https://github.com/idank/bashlex/pull/8/files#diff-0 (4)
  • M bashlex/parser.py https://github.com/idank/bashlex/pull/8/files#diff-1 (5)

Patch Links:

  • https://github.com/idank/bashlex/pull/8.patch
  • https://github.com/idank/bashlex/pull/8.diff

— Reply to this email directly or view it on GitHub https://github.com/idank/bashlex/pull/8.

idank avatar Feb 20 '16 18:02 idank

I did some thinking prior to making this and on line 388 they do create a NULL command and yacc accept: http://git.savannah.gnu.org/cgit/bash.git/tree/parse.y?id=df2c55de9c87c2ee8904280d26e80f5c48dd6434#n388

The "None" approach would not really be viable, because a pointer can be of specific type, but None is of no type, so I assumed this to be error-prone. Also, in parser.py around lines 600-620, the index is being set and should there be None, the position would have to be remembered differently (not to mention testing for None-ness).

Perhaps this has to do with actually executing the code?

blurrymoi avatar Feb 21 '16 16:02 blurrymoi

Maybe. I tried a more naive approach:

@@ -593,6 +597,9 @@ def parse(s, strictmode=True, expansionlimit=None, convertpos=False):
     ef.visit(parts[-1])
     index = max(parts[-1].pos[1], ef.end) + 1
     while index < len(s):
+        if s[index].isspace():
+            index += 1
+            continue
         part = _parser(s[index:], strictmode=strictmode).parse()

         if not isinstance(part, ast.node):

But then started thinking about recursive parsing and indeed this fails when there's a newline in a command substitution, try: a $(b\nc). I also tried your fix and it fails as well. I'll try debugging this for a bit and see if I can find how the C parser deals with this.

I'll still merge this, but let's see if we can fix this case too.

idank avatar Feb 21 '16 21:02 idank

Ok, I think I got it working. It deviates from bash a little because I got tired of debugging it. Can you try https://github.com/idank/bashlex/tree/newlines

idank avatar Feb 22 '16 06:02 idank

I played with it a bit and one assert ( assert p[2].kind == 'list' from parser.py/215 (while or until) ) fails for example on:

while [ sth ]
do
   sth
done

It works without the assert, though.

The two issues are barely related, so you could still combine the commits and both parse newlines and comments and fix the newline in substitution problem. I am not trying to enforce my changes, on the contrary, I'd love to see comment-parsing being handled non-trivially, it's just that each and every file that I have to parse contains comments and having a whole different version of the parser is not in my interest.

Thanks for the advice, I would not dare alter your former test cases.

blurrymoi avatar Feb 27 '16 20:02 blurrymoi

What do you mean have a whole different version of the parser?

idank avatar Feb 28 '16 21:02 idank

I added my proposition for the combination of the two solutions. It should be able to parse all of the aforementioned problems: newlines, comments and newlines in command substitution.

What I meant is that if you choose to go with your newlines version and disregard comments altogether I'd have to use my approach for comment-parsing and digress from the mainstream bashlex and I would like to avoid this.

blurrymoi avatar Mar 01 '16 10:03 blurrymoi

Squashed into a single commit for your convenience~

blurrymoi avatar Mar 01 '16 10:03 blurrymoi

My changes shouldn't ignore comments. I made them in haste, I'm sure it's fixable.

Does your original patch work with new lines in substitutions? Changing the grammar the way you did is a little scary and it's hard to see what else might be affected by it.

idank avatar Mar 01 '16 12:03 idank

No, my original patch does not regard new lines in substitutions as that was in no way related to what I was trying to accomplish. And I didn't know such problem existed. Neither did I touch the structure of the grammar. Although I admit that it cannot be said what else the changes may affect.

Your proposed version by itself also seems not to parse \n a.

blurrymoi avatar Mar 01 '16 14:03 blurrymoi

That's easy to fix.

What I meant by touching the grammar is this line yaccparser.action[8][tt.name] = -2. I can't really tell if that's harmless or not outside of what you were trying to fix. I think my approach is safer but happy to discuss any concerns you might have.

If you want to refine my patch, feel free. Otherwise I'll work on it when I have the time.

idank avatar Mar 07 '16 02:03 idank

FYI I tried this PR with moderate complexity multiline bash files and it doesn't work.

thehesiod avatar Sep 20 '17 19:09 thehesiod

Did you debug further? What was giving it trouble?

idank avatar Sep 22 '17 09:09 idank

Can you provide an example?

I analysed the testsuite of the abrt project (some 100+ files) and only a handful of tests failed to parse for reasons unrelated to newline parsing. Can you maybe try with my devel branch and let me know if the problem persists? ( https://github.com/blurrymoi/bashlex/tree/devel )

blurrymoi avatar Sep 22 '17 19:09 blurrymoi

I'll try again, gimme a min

thehesiod avatar Sep 22 '17 22:09 thehesiod

btw can you merge master to your branch? You're 18 commits behind, and have conflicts here

thehesiod avatar Sep 22 '17 22:09 thehesiod

so it fails on this script:

#!/bin/bash -xe

docker build --pull --tag s2-build .
docker run -it --rm -v `pwd`:/s2 s2-build

with error:

  File "/Users/amohr/dev/third_party/bashlex/bashlex/parser.py", line 583, in parse
    parts = [p.parse()]
  File "/Users/amohr/dev/third_party/bashlex/bashlex/parser.py", line 642, in parse
    tree = theparser.parse(lexer=self.tok, context=self)
  File "/Users/amohr/dev/third_party/bashlex/bashlex/yacc.py", line 277, in parse
    return self.parseopt_notrack(input,lexer,debug,tracking,tokenfunc,context)
  File "/Users/amohr/dev/third_party/bashlex/bashlex/yacc.py", line 1079, in parseopt_notrack
    tok = self.errorfunc(errtoken)
  File "/Users/amohr/dev/third_party/bashlex/bashlex/parser.py", line 539, in p_error
    p.lexer.source, p.lexpos)
bashlex.errors.ParsingError: unexpected token '\n' (position 16)

thehesiod avatar Sep 22 '17 22:09 thehesiod

fixed by ea0c135b88615c81a5a7adfc5873b7670b4449ad

idank avatar Jan 17 '24 17:01 idank