design-patterns-for-parser-combinators

In-Place Lexing

Open j-mie6 opened this issue 3 years ago • 2 comments

Although it's often not specified by the grammar, parser writers need to consider carefully how whitespace is consumed and how adjacent tokens are separated lexically. This can get really messy though, and in the worst case a parser writer might be tempted to insert whitespace handling logic all over the parser!

Dealing with tokens while parsing is intrusive to the overall structure of the parser and introduces clutter.

Most of the time, parsers handle this by separating lexing and parsing into two stages. The lexer handles whitespace and distinguishes tokens, and then the parser finds structure in the token stream. With parser combinators, however, this isn't always the case. Often, we want to combine the two stages into a single parser. This allows token selection to depend on which part of the grammar is being explored: this is really helpful for distinguishing SUBTRACT, NEGATE, and negative integer literals, for instance!

Surely there has to be a way of leveraging the reusable higher-order combinators to abstract lexical parts of the parser nicely and uniformly?
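
To make the question concrete, here is a minimal sketch of one possible shape for such an abstraction, written in plain Python with no parser library. All the names (`regex`, `lexeme`, `symbol`, `fmap`, `atom`, `expr`, `parse`) are hypothetical helpers made up for illustration, not the API of any particular library. The assumption is that every token is wrapped in a single `lexeme` combinator that consumes trailing whitespace, so whitespace handling lives in one place. The same `-` token is then read as NEGATE in operand position and as SUBTRACT after an operand; negative integer literals are not modelled separately here.

```python
import re
from typing import Any, Callable, Optional, Tuple

# A parser takes the input and a position and returns (value, next_position),
# or None on failure. This representation is an assumption of the sketch.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]

def regex(pattern: str) -> Parser:
    """Match a regular expression at the current position (no whitespace handling)."""
    compiled = re.compile(pattern)
    def run(text: str, pos: int):
        m = compiled.match(text, pos)
        return (m.group(), m.end()) if m else None
    return run

# Always succeeds, possibly consuming nothing.
whitespace = regex(r"\s*")

def lexeme(p: Parser) -> Parser:
    """Run p, then skip trailing whitespace: the only place whitespace is consumed."""
    def run(text: str, pos: int):
        result = p(text, pos)
        if result is None:
            return None
        value, pos = result
        _, pos = whitespace(text, pos)
        return value, pos
    return run

def fmap(f: Callable[[Any], Any], p: Parser) -> Parser:
    """Apply f to the result of p."""
    def run(text: str, pos: int):
        result = p(text, pos)
        return None if result is None else (f(result[0]), result[1])
    return run

def symbol(s: str) -> Parser:
    """A fixed piece of syntax, treated as a token."""
    return lexeme(regex(re.escape(s)))

number = lexeme(fmap(int, regex(r"[0-9]+")))
minus = symbol("-")

def atom(text: str, pos: int):
    # In operand position, '-' is read as NEGATE.
    op = minus(text, pos)
    if op is not None:
        inner = atom(text, op[1])
        return None if inner is None else (-inner[0], inner[1])
    return number(text, pos)

def expr(text: str, pos: int):
    # After an operand, '-' is read as SUBTRACT (left-associative).
    first = atom(text, pos)
    if first is None:
        return None
    total, pos = first
    while True:
        op = minus(text, pos)
        if op is None:
            return total, pos
        rhs = atom(text, op[1])
        if rhs is None:
            return None
        total, pos = total - rhs[0], rhs[1]

def parse(text: str) -> int:
    # Leading whitespace is skipped once, at the very start.
    _, pos = whitespace(text, 0)
    result = expr(text, pos)
    if result is None or result[1] != len(text):
        raise ValueError("parse error")
    return result[0]
```

Under these assumptions, `parse("1 - -2")` reads the first `-` as subtraction and the second as negation, and neither `atom` nor `expr` mentions whitespace at all.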

j-mie6, Aug 23 '21 14:08