lark
lark copied to clipboard
Support efficient InteractiveParser cloning
When using the InteractiveParser for things like backtracking, or checkpoint-parsing, it is necessary to make a copy of it. To do so, it makes a copy of ParserState, which in turn makes a deepcopy of the value_stack.
However, deepcopying the value_stack is only necessary because of a left-recursion optimization that I implemented in the parse-tree-builder, which modifies trees in-place. Otherwise, making a shallow copy would be enough. While this optimization is still important for performance of a straightforward parser (i.e. parses the text once without making copies of itself), it is a hindrance for more complicated use-cases.
I believe it is therefor desirable for Lark to support both modes - optimized for regular parsing, and for interactive parsing.
Switching between those modes can be done simply when initializing Lark, say with modify_trees_in_place=True/False.
Another, slightly more complicated option, but nicer for the users, is that when a copy of an interactive parser is made, it automatically switches to this mode. It's more complicated, because such a change requires re-building the ParseTreeBuilder instance, which otherwise the InteractiveParser shouldn't be aware of.
I'm not sure it's worth to complicate the code this much for an uncommon use-case that is meant for advanced users. Maybe it's okay to put the burden of choosing the right optimization on them. But maybe it is?
@MegaIng Do you have an opinion?
I don't think setting the option automatically when using parse_interactive is required, it can just be in all examples and mentioned in that functions documentation that it's suggested to have that option active.
Also, you might want to do use InteractiveParser without creating copies, i.e. if you only gradually get the inputs, but don''t care about backtracking.
Yes, I meant to only switch mode when a copy is made, not when using the InteractiveParser.
But yeah, it's a bit of an overkill.
I subclassed the InteractiveParser. These changes resulted in ~10x faster accepts(). Pretty hacky, but might help.
class FastParserState(ParserState):
copy_memo = {}
def __copy__(self):
new_value_stack = []
for value in self.value_stack:
key = f"{id(self)}_{id(value)}"
if key not in self.copy_memo:
self.copy_memo[key] = deepcopy(value, self.copy_memo)
new_value_stack.append(self.copy_memo[key])
new_instance = type(self)(
self.parse_conf,
self.lexer, # XXX copy
copy(self.state_stack),
new_value_stack,
)
self.copy_memo[id(self)] = new_instance
return new_instance
class FastInteractiveParser(InteractiveParser):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self.parser_state = FastParserState(
self.parser_state.parse_conf,
self.parser_state.lexer,
self.parser_state.state_stack,
self.parser_state.value_stack,
)
def __copy__(self):
return type(self)(
self.parser,
copy(self.parser_state),
copy(self.lexer_thread),
)