lex - When an error happens, how can I display all tokens matched so far?

Open mbBRCM opened this issue 2 years ago • 5 comments

When an error happens (lark.exceptions.UnexpectedCharacters), there is usually some "Previous tokens" information such as this:

Previous tokens: Token('__ANON_0', 'CL79')

That only seems to contain the token immediately preceding the error, but not the ones before. Am I doing something wrong, or is there a way to display all tokens matched so far?

mbBRCM • Apr 13 '22 06:04

It's possible to collect all the tokens by writing a postlexer.

Another way is to parse using the interactive parser.

Mind if I ask what you need it for?

erezsh • Apr 13 '22 06:04

@erezsh I was trying to see what tokens were being matched so I could debug the lexer rules

mbBRCM • Apr 13 '22 07:04

Can I still use the postlexer in my case, where an exception is thrown (so the lexing process isn't yet complete)?

mbBRCM • Apr 13 '22 07:04

Yes, the postlexer gets the tokens one by one, so if you save them somewhere (like in a global list, or inside the postlexer instance), you will have the latest list.

Lark doesn't save those tokens, because we want to support memory-efficient streaming. But perhaps we could do it when debug=True.

erezsh • Apr 13 '22 08:04

> Lark doesn't save those tokens, because we want to support memory-efficient streaming. But perhaps we could do it when debug=True.

That would be wonderful for ease of development

mbBRCM • Apr 13 '22 13:04