lark Support efficient InteractiveParser cloning

When using the InteractiveParser for things like backtracking, or checkpoint-parsing, it is necessary to make a copy of it. To do so, it makes a copy of ParserState, which in turn makes a deepcopy of the value_stack.

However, deepcopying the value_stack is only necessary because of a left-recursion optimization that I implemented in the parse-tree-builder, which modifies trees in-place. Otherwise, making a shallow copy would be enough. While this optimization is still important for performance of a straightforward parser (i.e. parses the text once without making copies of itself), it is a hindrance for more complicated use-cases.

I believe it is therefor desirable for Lark to support both modes - optimized for regular parsing, and for interactive parsing.

Switching between those modes can be done simply when initializing Lark, say with modify_trees_in_place=True/False.

Another, slightly more complicated option, but nicer for the users, is that when a copy of an interactive parser is made, it automatically switches to this mode. It's more complicated, because such a change requires re-building the ParseTreeBuilder instance, which otherwise the InteractiveParser shouldn't be aware of.

I'm not sure it's worth to complicate the code this much for an uncommon use-case that is meant for advanced users. Maybe it's okay to put the burden of choosing the right optimization on them. But maybe it is?

@MegaIng Do you have an opinion?

Apr 30 '22 09:04 erezsh

I don't think setting the option automatically when using parse_interactive is required, it can just be in all examples and mentioned in that functions documentation that it's suggested to have that option active.

Also, you might want to do use InteractiveParser without creating copies, i.e. if you only gradually get the inputs, but don''t care about backtracking.

Apr 30 '22 11:04 MegaIng

Yes, I meant to only switch mode when a copy is made, not when using the InteractiveParser.

But yeah, it's a bit of an overkill.

Apr 30 '22 11:04 erezsh

I subclassed the InteractiveParser. These changes resulted in ~10x faster accepts(). Pretty hacky, but might help.

class FastParserState(ParserState):
    copy_memo = {}
    def __copy__(self):
        new_value_stack = []
        for value in self.value_stack:
            key = f"{id(self)}_{id(value)}"
            if key not in self.copy_memo:
                self.copy_memo[key] = deepcopy(value, self.copy_memo)
            new_value_stack.append(self.copy_memo[key])

        new_instance = type(self)(
            self.parse_conf,
            self.lexer, # XXX copy
            copy(self.state_stack),
            new_value_stack,
        )

        self.copy_memo[id(self)] = new_instance
        return new_instance


class FastInteractiveParser(InteractiveParser):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.parser_state = FastParserState(
            self.parser_state.parse_conf,
            self.parser_state.lexer,
            self.parser_state.state_stack,
            self.parser_state.value_stack,
        )

    def __copy__(self):
        return type(self)(
            self.parser,
            copy(self.parser_state),
            copy(self.lexer_thread),
        )

Dec 19 '23 17:12 lapp0

lark lark copied to clipboard

Support efficient InteractiveParser cloning

lark
lark copied to clipboard