parsimonious icon indicating copy to clipboard operation
parsimonious copied to clipboard

Inconsistent and undocumented production of Node vs list for children

Open alexchandel opened this issue 1 year ago • 3 comments

Parsimonious seems to arbitrarily produce a Node vs a list of Nodes for some children of a rule via some undocumented rule. Consider the grammar fragment:

r"""
definition_list   = ((comment ws)? definition ws)+
"""

First, in visit_definition_list(self, node, visited_children), visited_children will be a list. This is not documented anywhere, but ok.

It is also a list of lists; again, not explicitly documented anywhere whether or why this is the case, but ok.

But the real inconsistency is visited_children[0][0]. If the optional match doesn't match, then this is a <Node matching ""> (like Node(<Quantifier (comment ws)?>, s, 26, 26)). But if the optional does match, then it's a 1-element list. Why? This is documented nowhere.

And this 1-element list itself contains a list (presumably of the parenthesized group elements) (meaning type(visited_children[0][0][0]) == list). Again, why? This isn't documented.

Even worse, the list elements visited_children[0][0][0][0] and visited_children[0][0][0][1] that correspond to comment and ws (which one might naively expect to be Nodes, given that the empty optional match is a Node) are instead also lists. Perhaps this is the default behavior for rules, but again, this is not documented anywhere, and the documentation strongly implies that a rule match becomes a Node.

Please document the precise behavior for how the types of the intermediate children of a node are determined.

alexchandel avatar Sep 09 '24 02:09 alexchandel

This unpredictable alternation between lists and Nodes for optional matches makes visiting annoying, as instead of either testing for list length (if it were always a list) or testing for children (if it were always a node), instead we must do something less obvious and efficient.

alexchandel avatar Sep 09 '24 02:09 alexchandel

This behavior is controlled by you, not parsimonious. It isn't documented because it can't be documented, your code and your code alone is 100% responsible for this behavior.

If you parse something and there is no visitor method for a given rule, parsimonious throws and error and stops. So every single behavior is completely determined by your visitor methods and has nothing to do with parsimonious itself.

Based on your description, it sounds like you're using the catch-all visitor method shown in one of the examples:

    def generic_visit(self, node, visited_children):
        """The generic visit method."""
        return visited_children or node

There is no inconsistency there. visited_children is always a list of what was returned from each child node. Your code returns the visited children for every rule except terminal rules, and nodes themselves for terminal rules (rules without children).

So I would expect to see exactly what you're seeing: you're going to have nested lists with the nesting depth corresponding to tree depth. You get a node whenever there is a terminal node. Optional matches that don't match obviously will not have any visited children, so the node object is returned, exactly as you and the code you wrote says should happen. If it does match, and it isn't a terminal node (like in your grammar, it's an implicit rule with only one visited child, the thing in parentheses, which in turn has two visited children, comment and ws), then it will return the list of visited children. Again, exactly as you've written your code to do.

If you dislike the behavior you've chosen, then I suggest choosing different behavior. If you want optional rules that don't match to always return an empty list instead of a node, then just tell your code to do that:

    def generic_visit(self, node, visited_children):
        """The generic visit method."""
        if not visited_children and node.text == "":
            return []

        return visited_children or node

Regardless, this is not an issue with parsimonious or it's documentation.

metacollin avatar Sep 11 '25 08:09 metacollin

Regardless, this is not an issue with parsimonious or it's documentation.

It's not an issue with the code of parsimonious, whose generic_visit method is has a fairly clear docstring:

    def generic_visit(self, node, visited_children):
        """Default visitor method

        :arg node: The node we're visiting
        :arg visited_children: The results of visiting the children of that
            node, in a list

        I'm not sure there's an implementation of this that makes sense across
        all (or even most) use cases, so we leave it to subclasses to implement
        for now.

        """
        raise NotImplementedError('No visitor method was defined for this expression: %s' %
                                  node.expr.as_rule())

However, the example in the README does use the return visited_children or node, so perhaps that example could be improved, especially the docstring on the example generic_visit method.

lucaswiman avatar Sep 11 '25 17:09 lucaswiman