lark Contextual Lexer Leaking "Spam" Terminals

Open davidmcnabnz opened this issue 10 months ago • 11 comments

Describe the bug

In a contextual lexing setup, where numerous BasicLexer instances get spawned for the various contexts, many of these instances are being created with toxic "spam" terminals. In other words, their terminal lists include terminals that cannot contribute to the fulfilment of any rule in their context.

This is causing text to be mis-classified in places, resulting in creation of tokens which are illegal in the context, which in turn crashes the parse.

To Reproduce

At this stage I would struggle to distil my large proprietary parsing codebase into a simple example. With this bug report, I'm asking if there are any general troubleshooting tips for finding out why invalid terminals are leaking into the context, and how to prevent this, apart from writing some very aggressive introspection into my BasicLexer subclass.
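As a starting point for that kind of introspection, here is a minimal sketch that dumps the terminal names each context's lexer can emit and flags any that fall outside a hand-written allowlist. It makes assumptions about Lark internals that are not confirmed here: that the contextual lexer exposes a `lexers` dict keyed by parser state, that each per-state lexer has a `terminals` list, and that each terminal has a `name`. The helper names `terminals_by_context` and `unexpected_terminals` are hypothetical, and the stand-in objects exist only so the sketch runs without a grammar.

```python
from types import SimpleNamespace


def terminals_by_context(contextual_lexer):
    """Map each parser state to the sorted terminal names its lexer can emit.

    Assumption (not a documented API): `contextual_lexer.lexers` is a dict of
    state -> lexer, and each lexer has a `terminals` list of objects that
    carry a `name` attribute.
    """
    return {
        state: sorted(t.name for t in lexer.terminals)
        for state, lexer in contextual_lexer.lexers.items()
    }


def unexpected_terminals(by_context, allowed):
    """Return {state: [names]} for terminals outside the caller's allowlist."""
    report = {}
    for state, names in by_context.items():
        extra = [n for n in names if n not in allowed.get(state, set())]
        if extra:
            report[state] = extra
    return report


def term(name):
    """Build a stand-in terminal object (hypothetical, for this sketch only)."""
    return SimpleNamespace(name=name)


# Stand-in for a contextual lexer with two parser states. With a real grammar,
# the object would be the parser's own contextual lexer instance instead.
fake_lexer = SimpleNamespace(lexers={
    0: SimpleNamespace(terminals=[term("NAME"), term("WS")]),
    1: SimpleNamespace(terminals=[term("NUMBER"), term("NAME"), term("WS")]),
})

# Hand-written expectation of what each state should be able to lex.
allowed = {0: {"NAME", "WS"}, 1: {"NUMBER", "WS"}}

report = unexpected_terminals(terminals_by_context(fake_lexer), allowed)
print(report)  # state 1 is flagged for carrying NAME
```

Diffing the real per-state lists against expectations like this narrows the search to the specific parser states that pick up the unwanted terminals, which is usually easier to reason about than adjusting priorities globally.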

I've tried juggling terminal priorities, but this is just an 'arms race' that doesn't resolve. If a terminal priority change fixes one context, it breaks others, and vice versa.

Aug 30 '23 04:08 davidmcnabnz
