LibCST
LibCST copied to clipboard
[BUG] Generated Code not equivalent to AST
When doing replacements in a node, it's easy to end up with a tree, which won't get correctly converted to source code. The issue is that libcst
is not adding the necessary parentheses.
For example, the code below
import libcst.matchers as m
import libcst
module = libcst.parse_module('x * y')
matcher = m.Name(value='x')
new_module = m.replace(module, matcher, libcst.parse_expression("1 + 2"))
print(new_module.code)
will output
1 + 2 * y
whereas the expected code is:
(1 + 2) * y
I think it's reasonable to expect that a module
is equivalent to parse_module(module.code)
.
Good catch! The parentheses are added by attributes lpar
/rpar
and their values are a sequence.
For parentheses, we didn't introduce something like MaybeSentinel
used on other whilespace attributes.
To fix this issue, we can automatically add the required parentheses in _visit_and_replace_children
of BinaryOperation
by checking the operator precedence. https://docs.python.org/3/reference/expressions.html#operator-precedence
That way, when any of the children of BinaryOperation
is updated, the missing parentheses will be automatically added if needed.
Feel free to submit a PR to fix it and I'm more than happy to review it.
@jimmylai I'm not sure _visit_and_replace_children
is the right place to add this logic, as one could programmatically create a Module
without using the visitors and then calling module.code
, which won't trigger a visit, so the nodes won't be updated.
Maybe an option would be to change _codegen_impl
and _BaseParenthesizedNode._parenthesize
to receive the parent node.
Then _parerenthesize
could be something like:
@contextmanager
def _parenthesize(self, parent: CSTNode, state: CodegenState) -> Generator[None, None, None]:
need_pars = need_parentheses(self, parent) # new code
if need_pars: # new code
LeftParen()._codegen(state) # new code
for lpar in self.lpar:
lpar._codegen(state)
with state.record_syntactic_position(self):
yield
for rpar in self.rpar:
rpar._codegen(state)
if need_pars: # new code
RightParen()._codegen(state) # new code
Note that this does not only affect BinaryOperation
, but also UnaryOperation
, Comparison
, etc.
There are also two special cases:
- generator expressions, which need to be parenthesized when they are in a call with 2 or more arguments
- tuples when they become part of an expression (addition, call argument, comparison, etc).
I do have a custom implementation of replace
which checks the precedence and the edge cases above when replacing the node, but it requires the ParentNodeProvider
. This of course has its limitations as metadata is not recomputed. I'm missing though the same logic with a transformer. Would happily contribute it.
@jimmylai I'm not sure
_visit_and_replace_children
is the right place to add this logic, as one could programmatically create aModule
without using the visitors and then callingmodule.code
, which won't trigger a visit, so the nodes won't be updated.
In our design, every LibCST node is an immutable dataclass instance and thus we only provide transformer patter for modifying the tree.
E.g. https://github.com/Instagram/LibCST/blob/master/libcst/_nodes/module.py#L33-L35
The matcher replace()
API also uses a transformer to replace target nodes.
Do you still think adding the logic in _visit_and_replace_children
won't work?
Hi @jimmylai, I still think that _visit_and_replace_children
won't work, as one can generate an AST without using the visitor pattern.
For example, the code below also triggers the bug:
import libcst
module = libcst.Module(body=[
libcst.BinaryOperation(
left = libcst.parse_expression("1 + 2"),
operator = libcst.Multiply(),
right = libcst.Integer("3")
)
])
print(module.code)
producing the output 1 + 2 * 3
, but it should have been (1 + 2) * 3
.
In this case _visit_and_replace_children
won't be called. I checked it by adding print
statements to the method definition in both Module
and BinaryOperation
@jimmylai WDYT?
Yeah, the example tree you provided is created from bottom up and it's possible to create a tree like that.
Use ParentProvider is not ideal. We don't use metadata in codegen and tree validation so far.
Maybe we can run the transformer you mentioned when user calls code_for_node
? That is the entry point of code generation.
https://github.com/Instagram/LibCST/blob/c023fa7c4caff3fd2b3946080f9a58b539b10363/libcst/_nodes/module.py#L128
I'm not suggesting to use the metadata in codegen, just to change _codegen_impl
and _parenthesize
to receive the parent, which is already available as the traversal is top to bottom.
With your proposal you will be traversing the tree twice, once to fix the parentheses and a second time to generate the code.
@jimmylai Any chance to revisit this? It'd be great if we could discuss and agree on a proposal to fix this.
After looking deeper into how other syntactic elements are treated, I think we should also discuss whether parens
should accept a MaybeSentinel
as a default, or whether an empty sequence would always mean MaybeSentinel
.
Also related to how some other elements are generated, another alternative is to check in the parent whether a child needs parentheses.
I just ran into this bug as well.
It may not be pretty but a simple work around is:
class FixParens(cst.CSTTransformer):
def _wrap_with_parens(self, node):
return node.with_changes(
lpar=[cst.LeftParen()],
rpar=[cst.RightParen()],
)
def leave_BinaryOperation(self,
original_node: cst.BinaryOperation,
update_node: cst.BinaryOperation
) -> cst.BinaryOperation:
if not update_node.lpar:
assert not update_node.rpar
return self._wrap_with_parens(update_node)
return update_node
# similar for leave_UnaryOperation
Now this has the downside that it inserts parens where they are not needed but does guarantee hand built expressions are properly grouped.
That solution would make it impracticable to use in a code refactoring tool, as lots of unneeded parentheses would be added and unfortunately code formatters do not remove extra parens.
I do have a more comprehensive transformer that only adds the parenthesis when needed but the problem is that it requires the parent and ParentNodeProvider
does not recompute the metadata, that's why I'm asking for a baked in solution.
@sk- agreed. I actually use an additional guard (updated_node is not original_node
) to minimize the "damage" it does. Certainly not ideal though. Definitely something that should be handled in the code generator.