
Token parser consumes newlines

Open · jkarni opened this issue 10 years ago · 8 comments

As described in this SO question, many of the token parser functions consume newlines, which is often not the desired behaviour. It's also hard to change that without copying most of the code. It'd be nice if this behaviour were configurable!
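A minimal sketch of the behaviour, assuming the stock `Text.Parsec.Token.makeTokenParser` applied to `emptyDef` from `Text.Parsec.Language`. Every lexeme parser such as `Tok.integer` ends by running the token parser's whitespace skipper, which accepts anything `isSpace` does, so a line-oriented grammar never gets to see its `'\n'`. The names `line` and `run` are hypothetical helpers for illustration only.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- Stock token parser: its whitespace skipper uses isSpace, so '\n' counts as blank.
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef

-- Hypothetical line-oriented grammar: integers on a line, then a '\n'.
line :: Parser [Integer]
line = many1 (Tok.integer lexer) <* newline

-- Hypothetical helper: run a parser over the whole input, dropping error details.
run :: Parser a -> String -> Maybe a
run p = either (const Nothing) Just . parse (p <* eof) ""

main :: IO ()
main = do
  -- The lexeme parsers skip straight over the line break: prints Just [1,2,3]
  print (run (many1 (Tok.integer lexer)) "1 2\n3\n")
  -- ...so the explicit 'newline' never sees a '\n': prints Nothing
  print (run (many1 line) "1 2\n3\n")
```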

jkarni, Jan 29 '15 14:01

It would be nice indeed.

doppioandante, May 07 '15 17:05

Looking at the source, we can see that isSpace is used. I think we could add one more field to the LanguageDef record, like treatNewLineAsSpace :: Bool, which controls whether newlines count as whitespace when isSpace from Data.Char is applied. I could create a PR if this proposal sounds sensible.
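A minimal sketch of the proposal done outside the library, under stated assumptions: Parsec's real `LanguageDef` has no `treatNewLineAsSpace` field, so the hypothetical `SpaceConfig` record stands in for it, and `whiteSpace'`, `lexeme'`, `integer'` and `lines'` are hypothetical replacements for the stock lexeme parsers (comment skipping is left out). Only `Tok.decimal`, which does not skip trailing whitespace, is reused from the token parser.

```haskell
import Data.Char (isSpace)
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- Hypothetical stand-in for the proposed LanguageDef field;
-- the real Text.Parsec.Token.LanguageDef has no such field.
newtype SpaceConfig = SpaceConfig { treatNewLineAsSpace :: Bool }

-- isSpace from Data.Char, except that '\n' is governed by the flag.
isBlank :: SpaceConfig -> Char -> Bool
isBlank cfg '\n' = treatNewLineAsSpace cfg
isBlank _   c    = isSpace c

-- Hypothetical replacement for the token parser's whitespace skipper.
whiteSpace' :: SpaceConfig -> Parser ()
whiteSpace' cfg = skipMany (satisfy (isBlank cfg))

lexeme' :: SpaceConfig -> Parser a -> Parser a
lexeme' cfg p = p <* whiteSpace' cfg

-- Reuse the non-lexeme 'decimal' from the stock token parser.
integer' :: SpaceConfig -> Parser Integer
integer' cfg = lexeme' cfg (Tok.decimal (Tok.makeTokenParser emptyDef))

-- One or more integers per line, each line terminated by '\n'.
lines' :: SpaceConfig -> Parser [[Integer]]
lines' cfg = many1 (many1 (integer' cfg) <* newline) <* eof

parseLines :: Bool -> String -> Maybe [[Integer]]
parseLines flag = either (const Nothing) Just . parse (lines' (SpaceConfig flag)) ""

main :: IO ()
main = do
  print (parseLines False "1 2\n3\n")  -- Just [[1,2],[3]]
  print (parseLines True  "1 2\n3\n")  -- Nothing
```

With the flag off, the integers are grouped per line; with it on, the newline is swallowed as whitespace and the parse fails exactly as it does with the default token parser.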

albertnetymk, Jul 03 '15 22:07

> I could create a PR if this proposal sounds sensible.

This would be cool. I'm also interested in hearing @aslatter's opinion about the improvement.

mrkkrp, Jul 06 '15 05:07

@albertnetymk What if someone wants to treat other whitespace characters, like tabs, separately? I have never needed this, but I'm wondering whether a more general solution can be achieved.

doppioandante, Jul 06 '15 19:07

@doppioandante One simple solution could be adding one more field, like treatTabAsSpace :: Bool. At that point, though, it would probably be better to come up with a more general solution. For now, I think treating newlines specially is enough; let's avoid over-engineering.
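A sketch of the more general direction, assuming a hypothetical `SpaceDef` whose field is the whitespace predicate itself rather than one `Bool` per character class; none of these names exist in Parsec. A tab-significant and a newline-significant language then differ only in the predicate they supply.

```haskell
import Data.Char (isSpace)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical: the user supplies the whitespace predicate directly.
newtype SpaceDef = SpaceDef { isBlank :: Char -> Bool }

-- Spaces and tabs are skipped; newlines stay significant.
horizontal :: SpaceDef
horizontal = SpaceDef (`elem` " \t")

-- Only plain spaces are skipped; tabs and newlines stay significant.
spacesOnly :: SpaceDef
spacesOnly = SpaceDef (== ' ')

-- Everything Data.Char.isSpace accepts, matching the current token parser.
anySpace :: SpaceDef
anySpace = SpaceDef isSpace

-- A lexeme skips trailing characters that match the predicate.
lexemeWith :: SpaceDef -> Parser a -> Parser a
lexemeWith d p = p <* skipMany (satisfy (isBlank d))

word :: SpaceDef -> Parser String
word d = lexemeWith d (many1 letter)

-- One or more words per line, each line terminated by '\n'.
wordsPerLine :: SpaceDef -> String -> Maybe [[String]]
wordsPerLine d =
  either (const Nothing) Just . parse (many1 (many1 (word d) <* newline) <* eof) ""

main :: IO ()
main = do
  print (wordsPerLine horizontal "ab\tcd\nef\n")  -- Just [["ab","cd"],["ef"]]
  print (wordsPerLine spacesOnly "ab\tcd\nef\n")  -- Nothing: the tab is not skipped
  print (wordsPerLine anySpace   "ab\tcd\nef\n")  -- Nothing: the newline is swallowed
```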

albertnetymk, Jul 06 '15 19:07

@jkarni, @doppioandante Work has started on this issue in Megaparsec; see the branch new-lexer. While it may not be entirely usable yet, you could try it and give your feedback. I've posted a comment in the dedicated issue thread: https://github.com/mrkkrp/megaparsec/issues/5.

mrkkrp, Sep 02 '15 14:09

@mrkkrp Cool! I had solved it by forking Parsec and introducing a dirty hack. My project has a really dumb parser (basically some fake asm), but I'll certainly try switching to Megaparsec and see if it clicks.

doppioandante, Sep 02 '15 15:09

@doppioandante Thanks! If it doesn't click, please describe your experience so we can correct our shortcomings. Megaparsec is almost ready for its first release; the lexer ("token parser" in Parsec's terminology) is the only thing left.

mrkkrp, Sep 02 '15 15:09