Is it possible to parse a list of terminals?

Open Daniel63656 opened this issue 1 year ago • 2 comments

I already have my tokens as a list of terminals, like this in a toy grammar:

    start: A B C
    A: "a"
    B: "b"
    C: "c"

    tokens = ["a", "b", "c"]

Can I use a parser that accepts this list directly, to avoid the unnecessary scanning step? All lexers throw TypeError: expected string or bytes-like object, got 'list'

Daniel63656 avatar Nov 16 '23 09:11 Daniel63656

Yes. See this example: https://github.com/lark-parser/lark/blob/master/examples/advanced/custom_lexer.py

erezsh avatar Nov 16 '23 10:11 erezsh

You need to attach token types to your list of strings by constructing lark.Token instances; otherwise lark has no idea what to do with your strings. This is the primary job of the lexers. In your case, each token type is just the upper-cased string (str.upper), so Token(c.upper(), c) constructs the correct token. For your actual use case, you will probably need to do something more complex.

MegaIng avatar Nov 16 '23 14:11 MegaIng