A crash in Python 3.5: lib2to3.pgen2.parse.ParseError: bad input: type=16, value='*'
This crash will only happen in Python 3.5.
The Test file
def func(iterable, *args, **kwargs):
    other(*iterable, *args, **kwargs)

def other(*args, **kwargs):
    print(args)
    print(kwargs)
func([1, 2], 'arg0', 'arg1', arg2=2, arg3=3)
Its runtime result is just as expected.
$ python3.5 test.py
(1, 2, 'arg0', 'arg1')
{'arg3': 3, 'arg2': 2}
Crash
$ yapf test.py
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/yapf/yapflib/pytree_utils.py", line 115, in ParseCodeToTree
    tree = parser_driver.parse_string(code, debug=False)
  File "/usr/lib/python3.5/lib2to3/pgen2/driver.py", line 106, in parse_string
    return self.parse_tokens(tokens, debug)
  File "/usr/lib/python3.5/lib2to3/pgen2/driver.py", line 71, in parse_tokens
    if p.addtoken(type, value, (prefix, start)):
  File "/usr/lib/python3.5/lib2to3/pgen2/parse.py", line 159, in addtoken
    raise ParseError("bad input", type, value, context)
lib2to3.pgen2.parse.ParseError: bad input: type=16, value='*', context=(' ', (2, 21))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/bin/yapf", line 8, in <module>
    sys.exit(run_main())
  File "/usr/local/lib/python3.5/dist-packages/yapf/__init__.py", line 344, in run_main
    sys.exit(main(sys.argv))
  File "/usr/local/lib/python3.5/dist-packages/yapf/__init__.py", line 226, in main
    verbose=args.verbose)
  File "/usr/local/lib/python3.5/dist-packages/yapf/__init__.py", line 278, in FormatFiles
    in_place, print_diff, verify, quiet, verbose)
  File "/usr/local/lib/python3.5/dist-packages/yapf/__init__.py", line 305, in _FormatFile
    logger=logging.warning)
  File "/usr/local/lib/python3.5/dist-packages/yapf/yapflib/yapf_api.py", line 91, in FormatFile
    verify=verify)
  File "/usr/local/lib/python3.5/dist-packages/yapf/yapflib/yapf_api.py", line 129, in FormatCode
    tree = pytree_utils.ParseCodeToTree(unformatted_source)
  File "/usr/local/lib/python3.5/dist-packages/yapf/yapflib/pytree_utils.py", line 121, in ParseCodeToTree
    tree = parser_driver.parse_string(code, debug=False)
  File "/usr/lib/python3.5/lib2to3/pgen2/driver.py", line 106, in parse_string
    return self.parse_tokens(tokens, debug)
  File "/usr/lib/python3.5/lib2to3/pgen2/driver.py", line 71, in parse_tokens
    if p.addtoken(type, value, (prefix, start)):
  File "/usr/lib/python3.5/lib2to3/pgen2/parse.py", line 159, in addtoken
    raise ParseError("bad input", type, value, context)
lib2to3.pgen2.parse.ParseError: bad input: type=16, value='*', context=(' ', (2, 21))
It will crash at other(*iterable, *args, **kwargs).
Environment
$ python3.5 --version
Python 3.5.2
$ yapf --version
yapf 0.29.0
Does this happen with Python 3.7 or 3.8?
No, only with Python 3.5 (or maybe below).
Probably you're using a syntax feature that was introduced in Python 3.6.
I think that Yapf uses the Python grammar that comes with the runtime, so if you use Yapf with Python 3.5, you're restricted to a Python 3.5 grammar.
As I wrote in the report, "Its runtime result is just as expected." Python 3.5 itself runs the code, so this is not syntax that only Python 3.6 accepts.
Python 3.5 doesn't seem to be available for Ubuntu 18.04, so I can't reproduce this. (Python 3.5 was released Sept 2015.)
The error message is typical of what happens when there's a syntax error according to the grammar. For the fun of it, I tried your code with Python 2.7; it failed at line 2 column 21 (at *args), so it's possible that Python 3.5's lib2to3 grammar isn't quite in sync with the grammar that Python 3.5 will accept.
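One way to test the "not quite in sync" theory is to feed the same source to both parsers on the interpreter in question. The following is a minimal sketch, not yapf's own code: it assumes the stock lib2to3 grammar `pygram.python_grammar_no_print_statement` (yapf's grammar objects may differ), and the helper names are made up for illustration. The verdicts depend on the Python version, so no expected output is given.

```python
import ast
import warnings

# The first two lines of the test file from the report.
SOURCE = (
    "def func(iterable, *args, **kwargs):\n"
    "    other(*iterable, *args, **kwargs)\n"
)


def accepted_by_runtime(source):
    """Return True if the interpreter's own parser accepts the source."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def accepted_by_lib2to3(source):
    """Return lib2to3's verdict, or None if lib2to3 is not installed."""
    try:
        with warnings.catch_warnings():
            # Later 3.x releases warn about lib2to3's deprecation on import.
            warnings.simplefilter("ignore")
            from lib2to3 import pygram, pytree
            from lib2to3.pgen2 import driver, parse
    except ImportError:
        return None  # lib2to3 was removed in Python 3.13
    # Assumption: the stock "no print statement" grammar stands in for
    # whatever grammar the formatter actually uses.
    parser_driver = driver.Driver(
        pygram.python_grammar_no_print_statement, convert=pytree.convert)
    try:
        parser_driver.parse_string(source)
    except parse.ParseError:
        return False
    return True


if __name__ == "__main__":
    print("runtime accepts:", accepted_by_runtime(SOURCE))
    print("lib2to3 accepts:", accepted_by_lib2to3(SOURCE))
```

If the runtime accepts the source and lib2to3 rejects it on the same interpreter, the two grammars disagree and the problem is in lib2to3 rather than in the code being formatted.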
You can use docker to reproduce it.
docker pull python:3.5-buster
docker run --rm -it python:3.5-buster bash
...
In the crash code (yapf/yapflib/pytree_utils.py):
def ParseCodeToTree(code):
  """Parse the given code to a lib2to3 pytree.

  Arguments:
    code: a string with the code to parse.

  Raises:
    SyntaxError if the code is invalid syntax.
    parse.ParseError if some other parsing failure.

  Returns:
    The root node of the parsed tree.
  """
  # This function is tiny, but the incantation for invoking the parser correctly
  # is sufficiently magical to be worth abstracting away.
  try:
    # Try to parse using a Python 3 grammar, which is more permissive (print and
    # exec are not keywords).
    parser_driver = driver.Driver(_GRAMMAR_FOR_PY3, convert=pytree.convert)
    tree = parser_driver.parse_string(code, debug=False)
  except parse.ParseError:
    # Now try to parse using a Python 2 grammar; If this fails, then
    # there's something else wrong with the code.
    try:
      parser_driver = driver.Driver(_GRAMMAR_FOR_PY2, convert=pytree.convert)
      tree = parser_driver.parse_string(code, debug=False)
    except parse.ParseError:
      # Raise a syntax error if the code is invalid python syntax.
      try:
        ast.parse(code)
      except SyntaxError as e:
        raise e
      else:
        raise
  return _WrapEndMarker(tree)
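yapf first tried to parse the code with _GRAMMAR_FOR_PY3 (line 115 in the traceback). After that failed, it fell back to _GRAMMAR_FOR_PY2 (line 121), which failed as well. Since ast.parse accepted the code, the second ParseError was re-raised, and that is the crash.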
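The fallback order can be reproduced without lib2to3. The sketch below is a simplified stand-in, not yapf's implementation: `ParseError`, `always_reject`, and `surfaced_error` are hypothetical names, and both parsers are stubs that reject every input. It shows which exception reaches the caller when both grammars fail.

```python
import ast


class ParseError(Exception):
    """Stand-in for lib2to3.pgen2.parse.ParseError (illustration only)."""


def parse_with_fallback(code, parse_py3, parse_py2):
    """Try two grammar-specific parsers in order."""
    try:
        return parse_py3(code)
    except ParseError:
        try:
            return parse_py2(code)
        except ParseError:
            # Both grammars rejected the code. If the interpreter rejects it
            # too, SyntaxError propagates from ast.parse; otherwise the bare
            # raise re-raises the ParseError from the second parser.
            ast.parse(code)
            raise


def always_reject(code):
    """Hypothetical parser whose grammar cannot handle the input."""
    raise ParseError("bad input")


def surfaced_error(code):
    """Name of the exception that escapes when both stub parsers fail."""
    try:
        parse_with_fallback(code, always_reject, always_reject)
    except (ParseError, SyntaxError) as exc:
        return type(exc).__name__
    return None


if __name__ == "__main__":
    print(surfaced_error("other(*a, *b)\n"))  # valid Python, grammars reject it
    print(surfaced_error("other(\n"))         # invalid Python
```

Valid code that neither grammar can parse ends in a ParseError, while genuinely invalid code ends in a SyntaxError. A ParseError therefore points at the grammar, not at the file being formatted.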
I agree with you. It seems to be a bug of lib2to3.
But my code targets Python 3 only, so why does yapf use lib2to3?
I've hit similar crashes using Python3.5, which also seem to be related to the syntax used for unpacking generalizations (https://www.python.org/dev/peps/pep-0448/).
Based on this bug (https://bugs.python.org/issue25969), lib2to3 should support this syntax, so maybe there's a bug in lib2to3.
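For reference, these are the PEP 448 forms in question. This is a small sketch reusing the argument values from the test file in the report; the variable and function names are illustrative. It runs on Python 3.5 and later.

```python
def collect(*args, **kwargs):
    """Return the positional and keyword arguments it received."""
    return args, kwargs


iterable = [1, 2]
extra = ("arg0", "arg1")
options = {"arg2": 2}
more = {"arg3": 3}

# More than one * and more than one ** unpacking in a single call.
args, kwargs = collect(*iterable, *extra, **options, **more)

# The same generalization applies to list and dict displays.
merged_list = [*iterable, *extra]
merged_dict = {**options, **more}

if __name__ == "__main__":
    print(args)
    print(kwargs)
    print(merged_list)
    print(merged_dict)
```

The first form, two `*` unpackings in one call, is the one at line 2, column 21 of the traceback.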
Is there any workaround for this? Yapf seems to be broken in all recent Python versions. I get this error in Python3.8.
Lib2to3 is on its way to deprecation because future Pythons will use a different parsing technology.
I proposed doing some work to make a lib2to3-like interface to the new Python parser, but nobody seemed interested, and it's a fair bit of work.
There are some alternative parsers, but they'd require some work to integrate into yapf, and it's not clear how well they'll work in future either. I've looked at another parser for "leo-editor" but it seems overly complicated for what it does ... I might be persuaded to make a simpler version of that if enough people are interested.
Hi, how did you solve the problem? My environment is Python 3.8 and I got the same problem.
As I said a few days ago, it's a significant amount of work to switch from lib2to3 parser to the new parser. I might work on it, one of these days, but don't have an immediate need and there are more interesting things that I'd rather do first.
@kamahen You are right though that if an alternate parser isn't eventually used, yapf will unfortunately end up broken. Have you looked at parso? That's what the Python docs recommend as a replacement.
Also, you asked if people were interested. Please put me down as interested 😄
Does parso provide access to the "white space" and comments in the source, and does it map the tree to the source? I didn't see anything about that in the API, but I might have missed it.
(I don't think that libCST has this either)
In principle, it should be straightforward to wrap the existing Python parser (library ast) to do what lib2to3 does for yapf (ast maps from the parse tree to the source, and dealing with the whitespace should be straightforward). But it's a moderate amount of work ...
(There's also leoAst.py, but it seems to be a rather complicated way of doing what should be a fairly simple thing)
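As a rough illustration of that idea, the standard library already exposes both halves: ast nodes carry source positions, and tokenize yields the comments that ast drops. This is only a sketch of the building blocks, not a lib2to3 replacement. The helper names are made up, and `ast.get_source_segment` needs Python 3.8 or later.

```python
import ast
import io
import tokenize

SOURCE = "x = other(*a, *b)  # unpack twice\n"


def comments(source):
    """Return (start_position, text) for every comment token in the source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(tok.start, tok.string)
            for tok in tokens if tok.type == tokenize.COMMENT]


def call_segments(source):
    """Return the exact source text of every call expression."""
    tree = ast.parse(source)
    return [ast.get_source_segment(source, node)  # Python 3.8+
            for node in ast.walk(tree) if isinstance(node, ast.Call)]


if __name__ == "__main__":
    print(comments(SOURCE))
    print(call_segments(SOURCE))
```

A formatter would still have to merge the two streams and track the whitespace between tokens, which is where the "moderate amount of work" comes in.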
It appears that the "black" formatter uses a slightly modified lib2to3, so it would likely have a similar problem. https://github.com/psf/black/tree/main/src/blib2to3
But it might be worthwhile tracking black's version, as it could handle some things that have been reported in this thread.
See also https://github.com/kamahen/pykythe/issues/27
Why not just use black's blib2to3 in yapf?
AFAICT black's blib2to3 uses the same compiler technology that lib2to3 uses, and therefore won't work in the future -- the PEG parsers can handle things that the lib2to3 parsers can't.
I'm going to contact one of the black developers about ... hopefully, I'll report back soon. (I need to read up on a few things first ...)
@kamahen Good point, and good idea.
It appears that asttokens could be used with the new PEG parser. I'll try converting some of my code to use asttokens and see how it goes. Don't expect a quick response ... I'm going to be out of town for a while.
It seems that there's now a PEG parser (implemented in Rust) that's aimed towards ASTs. Somebody might want to investigate whether it'll suffice for yapf. https://github.com/Instagram/LibCST/pull/566
Same problem in Python 3.9
Closed with 7c408b9d7750292760ed255f744211d1ef535668.