parsec
Token parser consumes newlines
As described in this SO question, many of the token parser functions consume newlines, which is often not the desired behaviour. It's also not easy, without copying most of the code, to change that. It'd be nice if this behaviour were configurable!
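A minimal sketch of the behaviour being reported, assuming a token parser built from `emptyDef` with no customisation (the name `lexer` is only illustrative). After `identifier` matches `foo`, its trailing whitespace skipping also consumes the newline, so the remaining input starts at `bar`:

```haskell
import Text.Parsec (getInput, parse)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- Illustrative token parser built from the default (empty) language definition.
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef

main :: IO ()
main =
  -- Parse one identifier, then return whatever input is left.
  -- The newline after "foo" is skipped along with other whitespace.
  print (parse (Tok.identifier lexer *> getInput) "" "foo\nbar")
  -- prints: Right "bar"
```

For a line-oriented grammar this means the line break cannot be used as a separator after any lexeme.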
It would be nice indeed
Looking at the source, we can see that isSpace from Data.Char is used. I think we could add one more field to the LanguageDef record, like treatNewLineAsSpace :: Bool, which controls how isSpace is called. I could create a PR if this proposal sounds sensible.
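As a rough sketch of what such a flag might control (this is not parsec's API: `spaceChar`, `lexeme'` and `ident` are hypothetical helpers, and passing the flag as a plain argument rather than a LanguageDef field is a simplification), the whitespace predicate could exclude line breaks when the flag is off. Excluding `'\r'` as well as `'\n'` is an assumption made here.

```haskell
import Data.Char (isSpace)
import Text.Parsec (letter, many, many1, parse, satisfy, skipMany)
import Text.Parsec.String (Parser)

-- Hypothetical predicate: the Bool plays the role of the proposed
-- treatNewLineAsSpace field. When False, line breaks are not whitespace.
spaceChar :: Bool -> Char -> Bool
spaceChar treatNewLineAsSpace c
  | treatNewLineAsSpace = isSpace c
  | otherwise           = isSpace c && c /= '\n' && c /= '\r'

-- Hypothetical lexeme combinator: run p, then skip trailing whitespace
-- as defined by the predicate above.
lexeme' :: Bool -> Parser a -> Parser a
lexeme' nl p = p <* skipMany (satisfy (spaceChar nl))

-- A toy identifier made of letters only.
ident :: Bool -> Parser String
ident nl = lexeme' nl (many1 letter)

main :: IO ()
main = do
  -- Newlines count as whitespace: both identifiers are read.
  print (parse (many (ident True)) "" "foo\nbar")   -- Right ["foo","bar"]
  -- Newlines are significant: parsing stops at the line break.
  print (parse (many (ident False)) "" "foo\nbar")  -- Right ["foo"]
```

Taking a `Char -> Bool` predicate instead of a Bool would also cover the case of other whitespace characters such as tab.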
This would be cool. I'm also interested in hearing @aslatter's opinion about the improvement.
@albertnetymk What if someone wants to treat other whitespace characters, such as tab, separately? I have never needed this, but I'm wondering whether a more general solution can be achieved.
@doppioandante One simple solution could be to add one more field, like treatTabAsSpace :: Bool. At that point, it would probably be worth coming up with a more general solution. For now, I think treating newlines specially is enough, to avoid over-engineering.
@jkarni, @doppioandante Work has started on this issue in Megaparsec, see the new-lexer branch. While it may not be entirely usable for now, you could try it and give your feedback. I've posted a comment in the dedicated issue thread: https://github.com/mrkkrp/megaparsec/issues/5.
@mrkkrp Cool, I had solved it by forking parsec and introducing a dirty hack. My project has a really dumb parser (basically some fake asm), but I'll certainly try switching to megaparsec and see if it clicks.
@doppioandante, Thanks! If it doesn't click, please describe your experience so we can correct our shortcomings. Megaparsec is almost ready for its first release; the lexer ("token parser" in Parsec's terminology) is the only thing that's left.