[PoC] Support Token Categories

Open ydah opened this issue 10 months ago • 0 comments

I considered introducing Token Categories into Lrama, inspired by Chevrotain's Token Categories. This mechanism allows multiple token types to be logically grouped together so they can be matched collectively in grammar rules.

Reference: Chevrotain Token Categories

For example, in parse.y, we currently define grammar rules like this:

p_cases : opt_else
        | p_case_body
        ;

This rule matches two token types: opt_else and p_case_body. If we introduce a Token Category named p_cases to group these token types, we can define the rule more concisely:

%token-categories <node> p_cases: opt_else p_case_body

Once multiple token types are grouped into a single category, grammar rules only need to consume the category, which simplifies rule definitions.
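To make the intended equivalence concrete, here is a minimal Python sketch of how such a directive could be expanded back into the ordinary rule shown earlier. It is not Lrama's implementation: the `DIRECTIVE` pattern and the `expand_category` helper are hypothetical, and the directive syntax is assumed only from the single example in this issue.

```python
import re

# Hypothetical pattern for the directive form shown in this issue:
#   %token-categories <tag> name: member member ...
# The <tag> part is treated as optional here, which is an assumption.
DIRECTIVE = re.compile(
    r"^%token-categories\s+(?:<(?P<tag>\w+)>\s+)?"
    r"(?P<name>\w+)\s*:\s*(?P<members>.+)$"
)


def expand_category(line):
    """Return (tag, name, members, rule_text) for one directive line."""
    m = DIRECTIVE.match(line.strip())
    if m is None:
        raise ValueError(f"not a %token-categories directive: {line!r}")
    name = m.group("name")
    members = m.group("members").split()
    # Align '|' and ';' under the ':' of the first line.
    indent = " " * (len(name) + 1)
    alternatives = f"\n{indent}| ".join(members)
    rule_text = f"{name} : {alternatives}\n{indent};"
    return m.group("tag"), name, members, rule_text


tag, name, members, rule_text = expand_category(
    "%token-categories <node> p_cases: opt_else p_case_body"
)
print(rule_text)
```

Under these assumptions, the printed text has the same shape as the hand-written p_cases rule, which is the sense in which the category is only a more concise spelling of the alternatives.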

In Chevrotain, tokens can belong to multiple categories by specifying categories in their definitions. However, this approach makes it difficult to see at a glance which tokens belong to a given category. To address this, the Lrama syntax proposed here lists a category's member tokens explicitly in the category definition.
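The difference between the two styles can be sketched with plain dictionaries in Python. This only illustrates the readability argument and is neither Chevrotain's nor Lrama's data model; `hypothetical_group` is a made-up second category, and `members_by_category` is a hypothetical helper.

```python
from collections import defaultdict

# Token-centric style: each token names the categories it belongs to.
# "hypothetical_group" is invented here purely for illustration.
token_to_categories = {
    "opt_else": ["p_cases"],
    "p_case_body": ["p_cases", "hypothetical_group"],
}


def members_by_category(token_to_categories):
    """Invert the mapping so each category lists its member tokens."""
    grouped = defaultdict(list)
    for token, categories in token_to_categories.items():
        for category in categories:
            grouped[category].append(token)
    return dict(grouped)


# Category-centric style: the view the proposed directive writes down directly.
category_to_tokens = members_by_category(token_to_categories)
print(category_to_tokens)
```

In the token-centric form, finding the members of p_cases means scanning every token definition. The category-centric form states the member list in one place, at the cost of having to scan every category to see which ones a given token belongs to.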

Feb 21 '25 17:02 ydah