lexgen

A fully-featured lexer generator, implemented as a proc macro

Results 18 lexgen issues

I need to match a token which contains unicode scalar values in the categories L, M, N, P, S and Cf. I see three different ways to solve this: 1....

Recently I debugged a lexer with this rule:

```
("0b" | "0o" | "0x")? ($digit | '_')* $id? = ...,
```

This regex accepts the empty string, so the lexgen state...
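The rule above matches the empty string because every part of the sequence is optional or starred. A minimal sketch of a nullability check that would flag such a rule follows. The `Regex` type, `nullable`, and `number_rule` are hypothetical names for illustration, not lexgen's internal representation, and `$digit` and `$id` are modelled as opaque named sets assumed never to match the empty string.

```rust
/// Hypothetical, simplified regex AST (not lexgen's own data structure).
enum Regex {
    /// A string literal such as "0x".
    Str(&'static str),
    /// A named set such as `$digit`; assumed to consume at least one character.
    Named(&'static str),
    /// A single character literal such as '_'.
    Char(char),
    Alt(Vec<Regex>),
    Seq(Vec<Regex>),
    Star(Box<Regex>),
    Opt(Box<Regex>),
}

/// Returns true if the regex can match the empty string.
fn nullable(re: &Regex) -> bool {
    match re {
        Regex::Str(s) => s.is_empty(),
        Regex::Named(_) | Regex::Char(_) => false,
        // An alternation is nullable if any branch is.
        Regex::Alt(alts) => alts.iter().any(nullable),
        // A sequence is nullable only if every part is.
        Regex::Seq(parts) => parts.iter().all(nullable),
        // `r*` and `r?` always accept the empty string.
        Regex::Star(_) | Regex::Opt(_) => true,
    }
}

/// The rule from the issue text: ("0b" | "0o" | "0x")? ($digit | '_')* $id?
fn number_rule() -> Regex {
    Regex::Seq(vec![
        Regex::Opt(Box::new(Regex::Alt(vec![
            Regex::Str("0b"),
            Regex::Str("0o"),
            Regex::Str("0x"),
        ]))),
        Regex::Star(Box::new(Regex::Alt(vec![
            Regex::Named("digit"),
            Regex::Char('_'),
        ]))),
        Regex::Opt(Box::new(Regex::Named("id"))),
    ])
}
```

Calling `nullable(&number_rule())` returns `true`, which is the condition a generator could warn about before emitting code.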

For the algorithm, in addition to the Dragon Book, there's a paper "Fast brief practical DFA minimization" which is paywalled but available on sci-hub. (doi:10.1016/j.ipl.2011.12.004) (edit: also available here https://www.cs.cmu.edu/~cdm/papers/Valmari12.pdf) A...

perf
code size

If we generate the DFA directly without going through an NFA:

- Should be more efficient
- Should generate a slightly better DFA

perf

This would improve performance, as no UTF-8 decoding is necessary. This is what RE2 does too.

perf
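The snippet does not say which change is proposed. Assuming it refers to matching over raw bytes instead of decoded `char`s, the idea can be sketched as follows: a character literal is turned into its UTF-8 byte sequence once, and the input is then compared byte by byte. The function names `utf8_path` and `match_char_bytes` are hypothetical and only illustrate the technique.

```rust
/// The byte-level "path" for one character: its UTF-8 encoding.
/// A byte-based automaton would have one transition per byte in this path.
fn utf8_path(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

/// If `input` starts with the encoding of `c`, returns the number of bytes
/// consumed. The input is never decoded into `char`s.
fn match_char_bytes(input: &[u8], c: char) -> Option<usize> {
    let path = utf8_path(c);
    if input.starts_with(&path) {
        Some(path.len())
    } else {
        None
    }
}
```

In a real generator the encoding step would happen at compile time, so only the byte comparison remains on the hot path.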

Suppose I'm trying to lex this invalid Rust code: `b"\xa"`. The problem here is `\x` needs to be followed by two hex digits, not one. If I run this with...

feature
design

Sometimes I want a lexer rule to be able to return multiple tokens, e.g. to emit a dummy token so the parser can use it as an end-marker for some syntax....

feature
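One common way to let a single rule produce several tokens is a pending-token queue that is drained before more input is read. The sketch below is not lexgen's API: `QueueLexer`, `Token`, and the rule "a word ending in `;` also emits an end marker" are all made up for illustration.

```rust
use std::collections::VecDeque;

#[derive(Debug, PartialEq)]
enum Token {
    Word(String),
    /// Dummy token a parser could use as an end-marker.
    EndMarker,
}

/// Hypothetical lexer that splits on whitespace and buffers extra tokens.
struct QueueLexer<'a> {
    words: std::str::SplitWhitespace<'a>,
    pending: VecDeque<Token>,
}

impl<'a> QueueLexer<'a> {
    fn new(input: &'a str) -> Self {
        QueueLexer {
            words: input.split_whitespace(),
            pending: VecDeque::new(),
        }
    }
}

impl<'a> Iterator for QueueLexer<'a> {
    type Item = Token;

    fn next(&mut self) -> Option<Token> {
        // Tokens queued by an earlier rule are returned first.
        if let Some(tok) = self.pending.pop_front() {
            return Some(tok);
        }
        let word = self.words.next()?;
        // Illustrative rule: "foo;" yields Word("foo") now and EndMarker next.
        if let Some(stripped) = word.strip_suffix(';') {
            self.pending.push_back(Token::EndMarker);
            Some(Token::Word(stripped.to_string()))
        } else {
            Some(Token::Word(word.to_string()))
        }
    }
}
```

The caller still sees an ordinary iterator of tokens, so a parser consuming it would not need to know that one rule produced two of them.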

Currently we have these transitions in NFAs:

```rust
struct State {
    char_transitions: Map,
    range_transitions: RangeMap,
    empty_transitions: Set,
    any_transitions: Set,
    end_of_input_transitions: Set,
    ...
}
```

(I was confused for a few...

refactoring

In the Lua lexer I see code like

```rust
'>' => {
    self.0.set_accepting_state(Lexer_ACTION_13); // 2
    match self.0.next() {
        None => {
            self.0.__done = true;
            match self.0.backtrack() { // 6
                ......
```

perf

Some of the search tables for built-in unicode regular expressions are quite large, but I think 99.9999% of the time they will match ASCII characters, so we should implement a...

perf
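The issue text is cut off before it names the proposed mechanism. One possible reading is an ASCII fast path in front of the range-table search, sketched below. The table `RANGES` is a tiny made-up stand-in for a real Unicode table, and `build_ascii_bitmap` and `in_table` are hypothetical helper names.

```rust
use std::cmp::Ordering;

/// Hypothetical sorted table of inclusive code point ranges (decimal digits
/// in ASCII, Arabic-Indic, and Extended Arabic-Indic). Real tables are far larger.
const RANGES: &[(u32, u32)] = &[(0x30, 0x39), (0x660, 0x669), (0x6F0, 0x6F9)];

/// Precomputes one bit per ASCII code point that is covered by the table.
fn build_ascii_bitmap(ranges: &[(u32, u32)]) -> u128 {
    let mut bits = 0u128;
    for &(lo, hi) in ranges {
        // The range is empty when `lo` is above 127.
        for cp in lo..=hi.min(127) {
            bits |= 1u128 << cp;
        }
    }
    bits
}

/// Membership test: a single bit test for ASCII, binary search otherwise.
fn in_table(c: char, ranges: &[(u32, u32)], ascii: u128) -> bool {
    let cp = c as u32;
    if cp < 128 {
        return (ascii & (1u128 << cp)) != 0;
    }
    ranges
        .binary_search_by(|&(lo, hi)| {
            if hi < cp {
                Ordering::Less
            } else if lo > cp {
                Ordering::Greater
            } else {
                Ordering::Equal
            }
        })
        .is_ok()
}
```

With this layout an ASCII lookup costs one shift and one mask, and the larger table is only touched for non-ASCII input.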