parsec
`decimal`, `hexadecimal`, and `octal` parsers should consume trailing whitespace
Since these are lexemes (or at least they should be), they should behave like lexemes: consume trailing whitespace (but not leading). Currently they don't, and I find that confusing. If you decide to fix this, be careful: other parsers are defined in terms of these, so it's easy to introduce new bugs.
The descriptions of these parsers carefully avoid calling them lexemes, but that is not enough. If you want to preserve the current behavior, the documentation should state explicitly that these are not lexemes and that, unlike all other members of `GenTokenParser`, they do not consume trailing whitespace.
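A minimal sketch of the difference described above, assuming a token parser built from the stock `haskellDef`. The helper `consumesAll` and the three `Bool` names are hypothetical, introduced only for illustration:

```haskell
import Text.Parsec (parse, eof)
import Text.Parsec.String (Parser)
import Text.Parsec.Language (haskellDef)
import qualified Text.Parsec.Token as Tok

-- Token parser built from the stock Haskell language definition.
tok :: Tok.TokenParser ()
tok = Tok.makeTokenParser haskellDef

-- Hypothetical helper: True when the parser succeeds and reaches end of input.
consumesAll :: Parser a -> String -> Bool
consumesAll p input =
  either (const False) (const True) (parse (p <* eof) "" input)

-- 'natural' is a lexeme: it skips the trailing space, so 'eof' succeeds.
naturalEatsSpace :: Bool
naturalEatsSpace = consumesAll (Tok.natural tok) "42 "

-- 'decimal' is not a lexeme: the trailing space is left over, so 'eof' fails.
decimalEatsSpace :: Bool
decimalEatsSpace = consumesAll (Tok.decimal tok) "42 "

-- Wrapping 'decimal' in 'lexeme' gives it the lexeme behavior.
wrappedDecimalEatsSpace :: Bool
wrappedDecimalEatsSpace = consumesAll (Tok.lexeme tok (Tok.decimal tok)) "42 "
```

Loaded into GHCi, `naturalEatsSpace` and `wrappedDecimalEatsSpace` should be `True` and `decimalEatsSpace` should be `False`.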
In many languages they are not lexemes on their own, because those languages require a type-size suffix with no whitespace in between; only the number together with its suffix forms a lexeme:
lexeme $ integer <* (char 'F' <|> char 'U' <|> char 'L') -- a numeric literal lexeme that's a bit C-like.
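A runnable variation on that idea, as a sketch only: it uses the non-lexeme `decimal` so that the suffix must follow the digits directly, and it keeps the suffix instead of discarding it. The names `suffixedLiteral` and `parseLiteral` are hypothetical, and `emptyDef` is just the simplest language definition to build a token parser from:

```haskell
import Text.Parsec (parse, eof, oneOf)
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- Token parser built from the empty language definition (an assumption).
tok :: Tok.TokenParser ()
tok = Tok.makeTokenParser emptyDef

-- Hypothetical C-like literal: digits, then a required suffix with no
-- whitespace in between. Only the whole thing is treated as a lexeme.
suffixedLiteral :: Parser (Integer, Char)
suffixedLiteral = Tok.lexeme tok $ do
  n <- Tok.decimal tok      -- does not skip trailing whitespace
  s <- oneOf "FUL"          -- must follow the digits immediately
  pure (n, s)

-- Hypothetical helper: parse the whole input or return Nothing.
parseLiteral :: String -> Maybe (Integer, Char)
parseLiteral = either (const Nothing) Just . parse (suffixedLiteral <* eof) ""
```

With this sketch, `parseLiteral "42L  "` gives `Just (42,'L')`, while `parseLiteral "42 L"` gives `Nothing`, because the space between the digits and the suffix is not skipped.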
I think the documentation could be improved by describing that use case. It would also be more intuitive if each of these parsers had a non-lexeme variant, for the cases where it makes sense to prefix or suffix them to build a language's "literal" features. The documentation should say that this is deliberate, because that helps people stop searching for a better way and get on with their work, confident that they are using the best approach available.
That's exactly what we've had in Megaparsec for about two years now, e.g.: https://hackage.haskell.org/package/megaparsec-6.2.0/docs/Text-Megaparsec-Char-Lexer.html#v:octal. The documentation has been improved, and none of these parsers consume white space unless you wrap them in the `lexeme` combinator; once wrapped, they do.
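For comparison, a small sketch of the Megaparsec behavior described here, assuming Megaparsec 6 or later and `space` from `Text.Megaparsec.Char` as the white space consumer. The names `bareDecimal` and `lexemeDecimal` are hypothetical:

```haskell
import Data.Void (Void)
import Text.Megaparsec (Parsec, parseMaybe)
import Text.Megaparsec.Char (space)
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

-- Bare parser: reads the digits and leaves any following white space alone.
bareDecimal :: Parser Integer
bareDecimal = L.decimal

-- The same parser wrapped in 'lexeme', which skips trailing white space.
lexemeDecimal :: Parser Integer
lexemeDecimal = L.lexeme space L.decimal
```

Because `parseMaybe` requires the parser to consume the whole input, `parseMaybe bareDecimal "42 "` is `Nothing` and `parseMaybe lexemeDecimal "42 "` is `Just 42`.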
Parsec could copy some of this, but I'm not sure the changes are actually welcome.
@mrkkrp Well, if it's enhancing the documentation we're talking about, that's almost always welcome! :-)