cpython icon indicating copy to clipboard operation
cpython copied to clipboard

Provide a way to compare AST nodes for equality recursively

Open 8a0c69e7-f5ea-4ea3-8880-4903ed9740d7 opened this issue 13 years ago • 18 comments

BPO 15987
Nosy @benjaminp, @Julian, @serhiy-storchaka, @ilevkivskyi, @mlouielu, @tonybaloney, @tonybaloney, @isidentical
PRs
  • python/cpython#1368
  • python/cpython#1375
  • python/cpython#14970
  • python/cpython#19211
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2012-09-20.18:43:56.747>
    labels = ['3.8', 'type-feature', 'library']
    title = 'Provide a way to compare AST nodes for equality recursively'
    updated_at = <Date 2020-03-28.18:12:54.152>
    user = 'https://github.com/Julian'
    

    bugs.python.org fields:

    activity = <Date 2020-03-28.18:12:54.152>
    actor = 'BTaskaya'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2012-09-20.18:43:56.747>
    creator = 'Julian'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 15987
    keywords = ['patch']
    message_count = 17.0
    messages = ['170826', '170907', '170909', '171000', '185288', '185289', '292656', '292665', '292717', '341541', '341546', '341555', '342118', '348545', '349185', '365213', '365215']
    nosy_count = 9.0
    nosy_names = ['benjamin.peterson', 'Julian', 'serhiy.storchaka', 'levkivskyi', 'louielu', 'anthonypjshaw', 'anthony shaw', 'BTaskaya', 'Philip Dye']
    pr_nums = ['1368', '1375', '14970', '19211']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue15987'
    versions = ['Python 3.8']
    

    As is, as far as I can tell, there's no way to easily compare two AST nodes to see if they have the same children and same fields (recursively).

    I'm writing some unit tests for a NodeTransformers, so I've settled for comparing ast.dump()s of each, which is kind of dirty, but 1) works and 2) produces reasonable failure messages. (As a side note of what else I've tried, comparing, say, a list of the iter_child_nodes is not a good alternative, since the tests I'm writing are making assertions that a given node was not modified, which means they deepcopy the node and then want to assert that the "transformed" node is unchanged.)

    I don't know the global implications of changing ast.AST.__eq__ to know if that's feasible (hopefully someone will comment on that), but if it isn't, another provided way would be nice.

    This is a reasonable request. Should comparison include lineno/col_offset or not? If you implement, __eq__, you should also implement __hash__. Maybe, it would be best to start with a helper comparison function in the ast module.

    benjaminp avatar Sep 21 '12 17:09 benjaminp

    I'd say yes (to both lineno/col_offset). And yeah that sounds like what I had in mind (a helper function).

    If I'm specific for a moment about implementation, perhaps something like ast.diff, which yielded tuples of differing nodes (in say, breadth first order down the tree) from two given nodes, and took args for configuration, compare_lineno and compare_col_offset (both defaulted to True), and then eq was just next(ast.diff(self, other, compare_lineno=True, compare_col_offset=True), None) is None.

    Sound good to you?

    Yes, though some things like what to return if one has an entire subtree that the other doesn't have will have to be worked out.

    benjaminp avatar Sep 22 '12 14:09 benjaminp

    I have a use for this as well, but w/o the lineno/col_offset comparison and all I need is a True/False result.

    brettcannon avatar Mar 26 '13 18:03 brettcannon

    IOW I think a function that uses ast.walk() and has flags for specifying whether _attributes should also be checked and then uses a class check and then uses _fields to do all other checking.

    brettcannon avatar Mar 26 '13 18:03 brettcannon

    If we only need a binary True/False result, could we just return a compare of dump(a) == dump(b)?

    This can also add the include_attributes flags for need.

    Provide a recursive way to compare AST nodes, it will compare with fields, but no attributes.

    The performance compare with ast.dump methods in unittest

    # Recursive compare ./python -m unittest test.test_ast.ASTCompareTest ...... ---------------------------------------------------------------------- Ran 6 tests in 0.669s

    OK

    # ast.dump compare ./python -m unittest test.test_ast.ASTCompareTest ...... ---------------------------------------------------------------------- Ran 6 tests in 2.192s

    Update to AST base type richcompare.

    the unittest finished time was better than python version:

    Ran 7 tests in 0.278s
    

    This discussion is inconclusive and targets an old version of CPython, can this issue be closed?

    Closing issue, PR branch has since been removed and targets Python 3.4

    Please don't close an issue too fast.

    The PR 1368 is still open and contains valuable code.

    Moreover, I don't see anyone here saying that the feature is a bad idea. The feature has not been implemented, so the issue should remain open, even if PR 1368 is outdated.

    ...issue... targets Python 3.4

    Well, it's just that the issue has been created a long time ago, but the feature request remain value for Python 3.8.

    vstinner avatar May 06 '19 16:05 vstinner

    Btw, I am +1 on this feature (preferably with an option to check line, column, end line, and end column). I always wanted this, but never had time to actually implement this.

    ilevkivskyi avatar May 10 '19 18:05 ilevkivskyi

    If consensus has been reached on this, I am willing to do the work.

    If consensus has been reached on this, I am willing to do the work.

    It looks like there is already an active PR https://github.com/python/cpython/pull/14970, there are some non-implemented comments from a core review.

    ilevkivskyi avatar Aug 07 '19 17:08 ilevkivskyi

    I am not sure that implementing a rich comparison of AST nodes is the right way. There are too much options (compare attributes or not, compare recursively or not, compare types or not) and __eq__ does not support options. In addition, it requires implementing compatible __hash__, but how do you implement a hash of mutable object? In addition, recursive comparison can introduce a regression in existing code -- instead of fast returning False the comparison will spent a nontrivial amount of time.

    If implement a comparison of AST nodes, it should be a separate function which support multiple options. Would be nice also to have examples where this feature can be used before implementing it.

    See also bpo-37792.

    serhiy-storchaka avatar Mar 28 '20 14:03 serhiy-storchaka

    The solution to hashing used in PR 14970 was just using the default hashing function which is kind of a workaround to the real problem. I now concur with your comments. Will try to draft out an ast.compare function.

    isidentical avatar Mar 28 '20 15:03 isidentical

    So, before implementing this, I would like to create a discussion on discuss.python.org to gather user feedback on what this comparison should do.

    Is it a full comparison that includes comparing each field, or is it a structural comparison nodes, where's these nodes are treated as equal BinOp(left=Constant(5), op=Mul(), right=Constant(2)) and BinOp(left= Constant(1), op = Mul(), right = Constant(1))?

    The second thing is about hashing: our last nodes are mutable, so should we make them immutable and provide a ast.replace() function?

    Eclips4 avatar May 11 '24 18:05 Eclips4

    @Eclips4 I missed this discussion when I was working on pr19211 at the sprints. Happy to continue discussion on the topic, as we have a little time before 3.14 is released.

    As you see in the PR, this is a separate compare() function that totally side steps things like eq and hash. It also focuses on full-field-level equality. Seems hard to say that two ASTs should be equal if they are "1 + 2" and "2 + 3" since they are literally different (literally). A comparison function that says "some differences in the code are allowed" might be framed as more of a pattern-matching problem that could answer questions like "Are these both arithmetic expressions that have the same structure except for constants?" Or "Are these both functions of two-positional arguments that don't call other functions?"

    jeremyhylton avatar May 22 '24 17:05 jeremyhylton

    @Eclips4 I missed this discussion when I was working on pr19211 at the sprints. Happy to continue discussion on the topic, as we have a little time before 3.14 is released.

    As you see in the PR, this is a separate compare() function that totally side steps things like eq and hash. It also focuses on full-field-level equality. Seems hard to say that two ASTs should be equal if they are "1 + 2" and "2 + 3" since they are literally different (literally). A comparison function that says "some differences in the code are allowed" might be framed as more of a pattern-matching problem that could answer questions like "Are these both arithmetic expressions that have the same structure except for constants?" Or "Are these both functions of two-positional arguments that don't call other functions?"

    I think the current implementation is good enough. Actually, if we have compare maybe it's good to have something like a contains?

    Eclips4 avatar Jun 23 '24 07:06 Eclips4