Agda-style Lexing

Open Ericson2314 opened this issue 3 years ago • 3 comments

I have been mulling this for a while, but the difficulties in fixing #197 made it feel more urgent.

As a (rare) user of Agda, I have been very fond of its lexing, which seems very simple, and more concerned with the boundaries between tokens than with the contents of the tokens themselves. (You can see me singing its praises in, e.g. https://github.com/ghc-proposals/ghc-proposals/discussions/444#discussioncomment-1509256).

I have a few questions on this.

  1. Do the people implementing Agda agree with this premise, that lexing in Agda is significantly different and/or simpler than that in other languages? Or am I reading too much into it as a user guessing how it works?

  2. If the premise is valid (per question 1), is there anything Alex might do to make this easier / a more obvious way to do things? I suppose I should study https://github.com/agda/agda/blob/master/src/full/Agda/Syntax/Parser/Lexer.x

  3. Should we transition Alex itself to lex more in this style, basically requiring more things to be space-separated?

CC @andreasabel, who conveniently works on both Alex and Agda, and @int-index, who spearheaded the similar left/right lexing context rules for Haskell.

Ericson2314 avatar Jan 23 '22 22:01 Ericson2314

In Agda, identifiers and operators need to be white-space separated.
The only tokens that need not be white-space separated are (, ), {, }, ; (maybe I am forgetting one). However, if you are suggesting that Agda is doing some post-processing on tokens to e.g. split 2+3 into 2 + 3, this is not the case, so 2+3 is simply an identifier and has nothing to do with numbers or summation whatsoever.
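
To make the rule concrete, here is a minimal sketch of a tokenizer that follows only the description above. It is a simplified model, not Agda's actual lexer: the names `isSpecial` and `tokens` are made up for illustration, the set of stand-alone characters is just the five listed here, and comments, literals and layout are ignored.

```haskell
import Data.Char (isSpace)

-- Characters assumed to be tokens on their own (only the five listed above).
isSpecial :: Char -> Bool
isSpecial c = c `elem` "(){};"

-- Whitespace separates tokens, special characters stand alone, and every
-- other maximal run of characters is a single token.
tokens :: String -> [String]
tokens [] = []
tokens s@(c:cs)
  | isSpace c   = tokens cs
  | isSpecial c = [c] : tokens cs
  | otherwise   =
      let (tok, rest) = break (\x -> isSpace x || isSpecial x) s
      in tok : tokens rest

main :: IO ()
main = do
  print (tokens "f(2+3)")  -- ["f","(","2+3",")"]
  print (tokens "2 + 3")   -- ["2","+","3"]
```

In this model `2+3` comes out as one token, while `2 + 3` yields three; nothing inspects the contents of a token to decide where it ends.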

Frankly, I do not understand what you are intending here, or how Alex should be changed. At its core Alex implements traditional 1960s style lexing (classic "formal languages and automata" stuff).

andreasabel avatar Jan 25 '22 09:01 andreasabel

@andreasabel Well, for example, does the Agda lexer use copious right contexts to find those whitespace boundaries? The current Alex docs warn that right contexts can make things slow, but I suspect either the warning is overly pessimistic, or the situation can be improved.
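
For readers unfamiliar with the term: a right context is a condition on the input that follows a token, which is checked but not consumed. A rough sketch of the idea in plain Haskell is below. It is not how Alex implements it; `withRightContext` and `atBoundary` are hypothetical names, and the sketch only tries the longest candidate prefix rather than exploring shorter matches.

```haskell
import Data.Char (isAlpha, isSpace)

-- Hypothetical helper: take the longest non-empty prefix whose characters
-- satisfy `tok`, and accept it only if the remaining input satisfies `ctx`.
-- The remaining input is inspected but returned untouched.
withRightContext
  :: (Char -> Bool) -> (String -> Bool) -> String -> Maybe (String, String)
withRightContext tok ctx input =
  case span tok input of
    ([], _) -> Nothing
    (lexeme, rest)
      | ctx rest  -> Just (lexeme, rest)
      | otherwise -> Nothing

-- Example right context: end of input, or whitespace coming next.
atBoundary :: String -> Bool
atBoundary []    = True
atBoundary (c:_) = isSpace c

main :: IO ()
main = do
  print (withRightContext isAlpha atBoundary "foo bar")  -- Just ("foo"," bar")
  print (withRightContext isAlpha atBoundary "foo+bar")  -- Nothing
```

A lexer built around whitespace boundaries would need a check of this kind on most rules, which is why the cost of right contexts matters here.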

Ericson2314 avatar Jan 28 '22 23:01 Ericson2314

Dunno. Right contexts are used in several places:

For comments: https://github.com/agda/agda/blob/798be60d51a56a8c74cfd309c1498b070240e686/src/full/Agda/Syntax/Parser/Lexer.x#L110-L125
For layout: https://github.com/agda/agda/blob/798be60d51a56a8c74cfd309c1498b070240e686/src/full/Agda/Syntax/Parser/Lexer.x#L135-L140
Also for {{: https://github.com/agda/agda/blob/798be60d51a56a8c74cfd309c1498b070240e686/src/full/Agda/Syntax/Parser/Lexer.x#L224-L226

andreasabel avatar Jan 30 '22 16:01 andreasabel