gccrs icon indicating copy to clipboard operation
gccrs copied to clipboard

`DelimTokenTree`-based parsing

Open powerboat9 opened this issue 7 months ago • 3 comments

For a future project, it would be nice to convert the parser to run on DelimTokenTree, rather than a stream of tokens. It would make parsing simpler and help catch edge cases like #3790.

Implementation wise, we'd probably want to do a first pass through the list of tokens, verifying that there aren't any mismatched delimiters. During that pass we could also cache distances between delimiter token pairs, sorted by left delimiter location. For example, [ ( { a } b ) c d ] could produce the distance cache [8, 4, 1]. Then we could have something like a ManagedTokenSource, say, TreeSource, that gives out either tokens or delimited token trees (via new instances of TreeSource). It would only have to store an iterator into the list of tokens, an ending iterator into the list of tokens, and an iterator into the distance cache.

Producing an initial TreeSource would be simple. Additionally, spinning off TreeSource instances for nested delimited pairs would be cheap, and the distance cache would make advancing the original TreeSource instance past the nested delimited pair cheap as well.

powerboat9 avatar May 10 '25 03:05 powerboat9

It looks like this would require storing a file's worth of lexed tokens in a vector, instead of single-passing them, but that doesn't seem like too much of a drawback.

powerboat9 avatar May 10 '25 04:05 powerboat9

It could also allow us to move away from having Parser as a template, by providing some indirection away from Lex and ProcMacroInvocLexer.

powerboat9 avatar May 10 '25 04:05 powerboat9

Not going to work on it right now though, since I'm more focused on getting nr2.0 through

powerboat9 avatar May 10 '25 04:05 powerboat9