LibCST
Parsing Code with Syntax Errors
Is it possible to parse code that has syntax errors in part of it? I'd like to refactor code that might still be incomplete (and thus might have syntax errors).
For example:
print(x)
def f(x,y,z):
is not valid code by itself (because the function needs a body). When I call libcst.parse_expression('print(x)\ndef :'), it throws an error. That is an understandable response, but I was wondering whether it would be possible to not throw the baby out with the bathwater: that is, can the print(x) be recovered in the output instead of just getting an error?
Why I want this: when editing a Python file, I usually have invalid syntax in between edits (as when writing the example above, before adding a function body). But I'd like to be able to run refactorings anyway, like PyCharm does. Is this possible with this library?
print(x) is an expression, but def is a statement, so you might want to try parse_module instead of parse_expression. I don't think this idea works very well in the general case, and I suspect PyCharm is just doing regex if it works for cases like these. Still, here's some untested code that splits the source into a leading part that parses to a valid CST and everything after it, which you can merge back together later.
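import libcst as cst

def lenient_parse(data):
    """Return (module, rest): the parsed leading lines and the unparsed tail."""
    try:
        return cst.parse_module(data), ""
    except cst.ParserSyntaxError as e:
        lines = data.splitlines(True)
        # Walk backwards from the reported error line until a prefix parses.
        for n in range(e.raw_line, -1, -1):
            try:
                mod = cst.parse_module("".join(lines[:n]))
            except cst.ParserSyntaxError:
                continue
            return mod, "".join(lines[n:])
        raise

mod, rest = lenient_parse(...)
# do_refactor is a placeholder for your own transform. LibCST trees are
# immutable, so it has to return the new module.
mod = do_refactor(mod)
new_source = mod.code + rest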
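To make the splitting step easy to try without LibCST installed, here is a minimal sketch of the same idea with the parser passed in as a parameter. The helper name split_parsable_prefix is made up for this illustration, and the standard library's ast.parse with SyntaxError is only a stand-in here. With LibCST you would presumably pass cst.parse_module and cst.ParserSyntaxError instead. Unlike the snippet above, it scans from the last line rather than from the reported error line, which is simpler but can cost more parse attempts.

```python
import ast


def split_parsable_prefix(source, parse, error=SyntaxError):
    """Split source into the longest leading run of lines that parses, and the rest.

    `parse` is any callable taking source text; `error` is the exception type
    (or tuple of types) it raises on invalid input. Both are parameters of this
    sketch, not part of any library API.
    """
    lines = source.splitlines(keepends=True)
    # Try the whole source first, then drop one trailing line at a time.
    for n in range(len(lines), -1, -1):
        prefix = "".join(lines[:n])
        try:
            tree = parse(prefix)
        except error:
            continue
        return tree, prefix, "".join(lines[n:])
    raise ValueError("even the empty prefix failed to parse")


# The example from the question: a call followed by a function with no body.
source = "print(x)\ndef f(x,y,z):\n"
tree, good, bad = split_parsable_prefix(source, ast.parse)
```

Because the split happens on line boundaries and every line ends up in exactly one of the two parts, good + bad always reproduces the original source, so the unparsed tail can be appended unchanged after refactoring the parsed part.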
I think forgiving parsing would be a reasonable feature for LibCST. It's a common feature for parsers used in IDE contexts, and Parso supports it, which might make it easier to implement in LibCST.
It's really just a question of someone having sufficient motivation to contribute it, IMO.