parsec Token parser consumes newlines

As described in this SO question, many of the token parser functions consume newlines, which is often not the desired behaviour. It's also not easy, without copying most of the code, to change that. It'd be nice if this behaviour were configurable!

Jan 29 '15 14:01 jkarni

It would be nice indeed

May 07 '15 17:05 doppioandante

Looking at the source, we can see that isSpace is used. I think we could add one more field to the LanguageDef record, such as treatNewLineAsSpace :: Bool, which controls whether the isSpace check from Data.Char also accepts newlines. I could create a PR if this proposal sounds sensible.
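A minimal sketch of how such a flag might behave, written as a standalone lexer rather than a patch to Text.Parsec.Token. The LexerConfig record and the whiteSpace', lexeme', number, line' and parseLines helpers are hypothetical names used only for illustration; treatNewLineAsSpace here is a field of that hypothetical record, not an existing field of LanguageDef.

```haskell
import Data.Char (isSpace)
import Text.Parsec (ParseError, digit, eof, many1, newline, parse,
                    satisfy, sepEndBy, skipMany)
import Text.Parsec.String (Parser)

-- Hypothetical configuration record; Parsec's LanguageDef has no such field.
newtype LexerConfig = LexerConfig { treatNewLineAsSpace :: Bool }

-- Decide whether the lexer may silently skip a character.
isSkippable :: LexerConfig -> Char -> Bool
isSkippable cfg c
  | c == '\n' = treatNewLineAsSpace cfg  -- newlines are skipped only on request
  | otherwise = isSpace c

-- Whitespace skipper that respects the flag.
whiteSpace' :: LexerConfig -> Parser ()
whiteSpace' cfg = skipMany (satisfy (isSkippable cfg))

-- Run a parser, then skip trailing whitespace according to the flag.
lexeme' :: LexerConfig -> Parser a -> Parser a
lexeme' cfg p = p <* whiteSpace' cfg

number :: LexerConfig -> Parser Int
number cfg = lexeme' cfg (read <$> many1 digit)

-- One logical line: at least one number.
line' :: LexerConfig -> Parser [Int]
line' cfg = many1 (number cfg)

-- Lines separated (and optionally terminated) by a newline.
parseLines :: LexerConfig -> String -> Either ParseError [[Int]]
parseLines cfg =
  parse (whiteSpace' cfg *> (line' cfg `sepEndBy` newline) <* eof) ""
```

With the flag set to False, the input "1 2\n3 4\n" keeps its line structure and parses as two lists; with the flag set to True, the newline is swallowed by the lexer and all four numbers end up in a single list, which is the behaviour this issue complains about.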

Jul 03 '15 22:07 albertnetymk

> I could create a PR if this proposal sounds sensible.

This would be cool. I'm also interested in hearing @aslatter 's opinion about the improvement.

Jul 06 '15 05:07 mrkkrp

@albertnetymk What if someone wants to treat other space characters, such as tab, separately? I have never needed this, but I'm wondering whether a more general solution can be achieved.

Jul 06 '15 19:07 doppioandante

@doppioandante One simple solution could be to add one more field, like treatTabAsSpace :: Bool. At that point, it would probably be good to come up with a more general solution. For now, I think treating newlines specially is enough and avoids over-engineering.
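One possible shape for the more general solution is to replace the individual Bool flags with a predicate that says which characters the lexer may skip. The sketch below uses only base; SpacePolicy, skippable, skipLeading and the three example policies are hypothetical names, not part of Parsec.

```haskell
import Data.Char (isSpace)

-- Hypothetical: the skipping policy is a predicate instead of a set of flags.
newtype SpacePolicy = SpacePolicy { skippable :: Char -> Bool }

-- Skip every whitespace character, newlines included.
allSpace :: SpacePolicy
allSpace = SpacePolicy isSpace

-- Skip whitespace but leave newlines for the grammar to handle.
keepNewlines :: SpacePolicy
keepNewlines = SpacePolicy (\c -> isSpace c && c /= '\n')

-- Skip whitespace but leave both newlines and tabs in the input.
keepNewlinesAndTabs :: SpacePolicy
keepNewlinesAndTabs = SpacePolicy (\c -> isSpace c && c `notElem` "\n\t")

-- Drop leading characters that the policy allows the lexer to skip.
skipLeading :: SpacePolicy -> String -> String
skipLeading policy = dropWhile (skippable policy)
```

A predicate covers the newline case, the tab case, and any future combination without adding a new record field each time, at the cost of a slightly less discoverable option than a named Bool.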

Jul 06 '15 19:07 albertnetymk

@jkarni, @doppioandante Work on this issue has started in Megaparsec; see the new-lexer branch. While it may not be entirely usable yet, you could try it and give your feedback. I've posted a comment in the dedicated issue thread: https://github.com/mrkkrp/megaparsec/issues/5.

Sep 02 '15 14:09 mrkkrp

@mrkkrp Cool, I had solved it by forking parsec and introducing a dirty hack. My project has a really dumb parser (basically some fake asm), but I'll certainly try switching to megaparsec and see if it clicks.

Sep 02 '15 15:09 doppioandante

@doppioandante, Thanks! If it doesn't click, please describe your experience so we can correct our shortcomings. Megaparsec is almost ready for its first release; the lexer ("token parser" in Parsec's terminology) is the only thing that's left.

Sep 02 '15 15:09 mrkkrp
