lark
Contextual Lexer Leaking "Spam" Terminals
Describe the bug
In a contextual lexing setup, where a separate BasicLexer instance is spawned for each context, many of these instances are being created with toxic "spam" terminals. In other words, their terminal lists include terminals that cannot possibly contribute to the fulfilment of any rule in their context.
As a result, text is misclassified in places: the lexer emits tokens that are illegal in the current context, which in turn crashes the parse.
To Reproduce
At this stage I would struggle to distil my large proprietary parsing codebase into a simple example. With this bug report, I'm asking if there are any general troubleshooting tips for finding out why invalid terminals are leaking into the context, and how to prevent this, apart from writing some very aggressive introspection into my BasicLexer subclass.
I've tried juggling terminal priorities, but this just turns into an 'arms race' that never settles: a priority change that fixes one context breaks others, and vice versa.
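For reference, the introspection mentioned above could look roughly like the sketch below. It compares, per parser state, the terminals each context's lexer knows about against the terminals the parse table accepts in that state. The helper names `find_unexpected_terminals` and `snapshot_from_lark` are hypothetical, and the attribute paths in `snapshot_from_lark` (`parser.parser`, `.lexer.lexers`, `.parser._parse_table.states`) are assumptions based on my reading of lark's private internals, so they may differ between versions or when a postlexer wraps the lexer.

```python
from typing import Dict, Iterable, Set


def find_unexpected_terminals(
    lexer_terminals: Dict[int, Iterable[str]],
    parser_accepts: Dict[int, Iterable[str]],
    ignored: Iterable[str] = (),
) -> Dict[int, Set[str]]:
    """Per state, return terminals the lexer knows but the parser does not accept.

    Terminals listed in `ignored` are skipped, since they are expected
    to be present in every context.
    """
    ignored_set = set(ignored)
    report: Dict[int, Set[str]] = {}
    for state, names in lexer_terminals.items():
        extra = set(names) - set(parser_accepts.get(state, ())) - ignored_set
        if extra:
            report[state] = extra
    return report


def snapshot_from_lark(parser):
    """Hypothetical adapter for a Lark instance built with
    parser='lalr', lexer='contextual'.

    ASSUMPTION: every attribute below is private and version-dependent.
    """
    frontend = parser.parser              # assumed: the parsing frontend
    ctx_lexer = frontend.lexer            # assumed: ContextualLexer (no postlex wrapper)
    table = frontend.parser._parse_table  # assumed: the LALR parse table
    lexer_terminals = {
        state: [t.name for t in lexer.terminals]
        for state, lexer in ctx_lexer.lexers.items()
    }
    parser_accepts = {
        state: list(actions) for state, actions in table.states.items()
    }
    return lexer_terminals, parser_accepts


# Hand-made data standing in for a real snapshot.
example_lexer_terminals = {0: ["NAME", "WS"], 1: ["NAME", "NUMBER", "KEYWORD", "WS"]}
example_parser_accepts = {0: ["NAME"], 1: ["NUMBER", "$END"]}
example_report = find_unexpected_terminals(
    example_lexer_terminals, example_parser_accepts, ignored=["WS"]
)
for state, extra in sorted(example_report.items()):
    print(f"state {state}: unexpected terminals {sorted(extra)}")
```

As far as I can tell, `%ignore`d terminals are added to every context on purpose, and I believe a postlexer's `always_accept` terminals are too, so the sketch lets me exclude those via `ignored`. Anything still reported would be a genuine leak, whereas an empty report would suggest the parse table itself considers the "spam" terminals legal in that state.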