bashlex
Unexpected token \n when parsing bash *script*
I am trying to parse the following bash script (simplified example) and print the produced AST as JSON using bashlex 0.12:
```bash
function a {
    a;
}
# Comment
```
But it fails:
```
  File "/usr/local/lib/python2.7/dist-packages/bashlex/parser.py", line 614, in parse
    part = _parser(s[index:], strictmode=strictmode).parse()
  File "/usr/local/lib/python2.7/dist-packages/bashlex/parser.py", line 682, in parse
    tree = theparser.parse(lexer=self.tok, context=self)
  File "/usr/local/lib/python2.7/dist-packages/bashlex/yacc.py", line 277, in parse
    return self.parseopt_notrack(input,lexer,debug,tracking,tokenfunc,context)
  File "/usr/local/lib/python2.7/dist-packages/bashlex/yacc.py", line 1079, in parseopt_notrack
    tok = self.errorfunc(errtoken)
  File "/usr/local/lib/python2.7/dist-packages/bashlex/parser.py", line 539, in p_error
    p.lexer.source, p.lexpos)
bashlex.errors.ParsingError: unexpected token '\n' (position 10)
```
A trivial workaround is to wrap the code in any other construct, the simplest being a set of curly braces. Then everything works just fine:
```bash
{
function a {
    a;
}
# Comment
}
```
Of course I can live with the workaround, but I think it would be great if you took a look at it.
Thanks a lot for the great job you've done!
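The brace-group workaround can be applied mechanically before parsing. The helper below is a hypothetical sketch (`wrap_in_brace_group` and `WRAP_OFFSET` are illustrative names, not part of bashlex). It mirrors the wrapped script above: the closing `}` goes on its own line so that a trailing comment cannot swallow it.

```python
def wrap_in_brace_group(script: str) -> str:
    """Hypothetical helper: wrap a bash script in a { ... } group.

    '{' is followed by a newline so it stays a separate reserved word, and
    '}' is placed on its own line so a trailing '# comment' cannot absorb it.
    """
    body = script if script.endswith("\n") else script + "\n"
    return "{\n" + body + "}"


# Number of characters the wrapper adds before the original script; any
# character offsets reported for the wrapped text are shifted by this amount.
WRAP_OFFSET = len("{\n")

original = "function a {\n    a;\n}\n# Comment"
wrapped = wrap_in_brace_group(original)
print(wrapped)
```

With bashlex installed, the wrapped text would then be passed to `bashlex.parse(wrapped)`. Subtracting `WRAP_OFFSET` from a node's `pos` should map it back to the original script, and the outer `{ ... }` compound node would have to be unwrapped by the caller. This is an untested sketch. Note that an empty script would produce an empty group, which bash itself rejects.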
Yeah, it's silly that the library doesn't allow newlines here. There's an old bug report somewhere around here with a fix for it; I just never got around to merging it in properly.
@idank Is there a fix for this somewhere on the horizon? I'd like to use this library for some transpilers.
There's some prior work at https://github.com/idank/bashlex/pull/8 that doesn't cover all cases. It might work for your use case.
In a Python interactive session with the following setup:
```
$ python
Python 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bashlex
```
I then tried the original example:
```
>>> bashlex.parse('function a {\
...     a;\
... }\
... \
... # Comment')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 610, in parse
    parts = [p.parse()]
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 691, in parse
    tree = theparser.parse(lexer=self.tok, context=self)
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/yacc.py", line 537, in parse
    tok = self.errorfunc(errtoken)
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 544, in p_error
    raise errors.ParsingError('unexpected EOF',
bashlex.errors.ParsingError: unexpected EOF (position 28)
```
The error doesn't match the original one, but it is an error nonetheless. The workaround provided still parses without raising an error:
```
>>> bashlex.parse('{\
... function a {\
...     a;\
... }\
... \
... # Comment\
... }')
[ListNode(parts=[CommandNode(parts=[WordNode(parts=[] pos=(0, 9) word='{function'), WordNode(parts=[] pos=(10, 11) word='a'), WordNode(parts=[] pos=(12, 13) word='{'), WordNode(parts=[] pos=(17, 18) word='a')] pos=(0, 18)), OperatorNode(op=';' pos=(18, 19)), CommandNode(parts=[WordNode(parts=[] pos=(19, 21) word='}#'), WordNode(parts=[] pos=(22, 30) word='Comment}')] pos=(19, 30))] pos=(0, 30))]
```
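A likely explanation for the different error, and for the odd `'{function'` and `'}#'` words in the AST above: in an ordinary Python string literal, a backslash at the end of a line is a line continuation, so the backslash and the newline are both dropped. The strings passed to `bashlex.parse` in these sessions therefore contained no newlines at all. The sketch below (plain Python, no bashlex needed) shows this for the first session's input; the resulting string is 28 characters long, consistent with the `position 28` in the `unexpected EOF` error. The 4-space indentation of `a;` is an assumption inferred from the node positions in the output.

```python
# In an ordinary Python string literal, a backslash at the very end of a line
# is a line continuation: the backslash and the newline are both discarded.
continued = 'function a {\
    a;\
}\
\
# Comment'

# The result is a single line, so bash sees "}#" (one word) instead of a
# closing "}" followed by a comment.
print(repr(continued))  # 'function a {    a;}# Comment'
print(len(continued))   # 28

# Two ways to pass real newline characters instead:
explicit = "function a {\n    a;\n}\n\n# Comment"
triple = """function a {
    a;
}

# Comment"""
print(explicit == triple)  # True
```

Passing `explicit` or `triple` to `bashlex.parse` is what would actually exercise the newline handling this issue is about; the result of doing so has not been verified here.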
The error seems to be caused by the comment; without it, the parse succeeds:
```
>>> bashlex.parse('function a {\
...     a;\
... }\
... ')
[FunctionNode(body=CompoundNode(list=[ReservedwordNode(pos=(11, 12) word='{'), ListNode(parts=[CommandNode(parts=[WordNode(parts=[] pos=(16, 17) word='a')] pos=(16, 17)), OperatorNode(op=';' pos=(17, 18))] pos=(16, 18)), ReservedwordNode(pos=(18, 19) word='}')] pos=(11, 19) redirects=[]) name=WordNode(parts=[] pos=(9, 10) word='a') parts=[ReservedwordNode(pos=(0, 8) word='function'), WordNode(parts=[] pos=(9, 10) word='a'), CompoundNode(list=[ReservedwordNode(pos=(11, 12) word='{'), ListNode(parts=[CommandNode(parts=[WordNode(parts=[] pos=(16, 17) word='a')] pos=(16, 17)), OperatorNode(op=';' pos=(17, 18))] pos=(16, 18)), ReservedwordNode(pos=(18, 19) word='}')] pos=(11, 19) redirects=[])] pos=(0, 19))]
```
The error persists even if further newlines are added:
```
>>> bashlex.parse('function a {\
...     a;\
... }\
... \
... # Comment\
... \
... ')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 610, in parse
    parts = [p.parse()]
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 691, in parse
    tree = theparser.parse(lexer=self.tok, context=self)
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/yacc.py", line 537, in parse
    tok = self.errorfunc(errtoken)
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 544, in p_error
    raise errors.ParsingError('unexpected EOF',
bashlex.errors.ParsingError: unexpected EOF (position 28)
```
Adding a statement after the comment doesn't seem to resolve this error:
```
>>> bashlex.parse('function a {\
...     a;\
... }\
... \
... # Comment\
... \
... a')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 610, in parse
    parts = [p.parse()]
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 691, in parse
    tree = theparser.parse(lexer=self.tok, context=self)
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/yacc.py", line 537, in parse
    tok = self.errorfunc(errtoken)
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 544, in p_error
    raise errors.ParsingError('unexpected EOF',
bashlex.errors.ParsingError: unexpected EOF (position 28)
```
Moving the comment to the beginning doesn't prevent an error; it just produces a different one:
```
>>> bashlex.parse('# Comment\
... function a {\
...     a;\
... }')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/parser.py", line 620, in parse
    ef.visit(parts[-1])
  File "/home/user/.local/lib/python3.10/site-packages/bashlex/ast.py", line 38, in visit
    k = n.kind
AttributeError: 'str' object has no attribute 'kind'. Did you mean: 'find'?
```
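The same Python line-continuation rule appears to explain this case too. Because every line but the last ends in a backslash, the input collapses into a single line that begins with `#`, and in bash a `#` at the start of a word comments out the rest of the line. The whole input is therefore one comment, which fits the traceback showing the parser ending up with a plain `str` instead of a node (an inference from the traceback, not checked against the bashlex source). The 4-space indentation of `a;` is again assumed.

```python
# Same input as the session above: the trailing backslashes remove every
# newline, so the text after the leading "#" is all part of one comment.
moved = '# Comment\
function a {\
    a;\
}'
print(repr(moved))            # '# Commentfunction a {    a;}'
print(moved.startswith("#"))  # True
print("\n" in moved)          # False

# With real newlines, only the first line is a comment.
intended = "# Comment\nfunction a {\n    a;\n}"
print(intended.splitlines()[1])  # function a {
```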
Apart from the workaround, removing the comment altogether seems to be the only way to avoid the error.