automatic whitespace handling

Open ghazel opened this issue 14 years ago • 8 comments

Parslet grammars are littered with whitespace checks, making them harder to read. Leaving the checks out means valid input fails to parse. Take the JavaScript parser as an example: https://github.com/matthewd/capuchin/blob/d47f4b19eb888b6a4fc5428d3d1fdfcdb551b183/lib/capuchin/parser.rb

There is sp? everywhere. There are very few cases where whitespace is not allowed, and decorating those cases with a different operator to join the atoms seems sufficient.

So, this is a feature request for some sort of functionality like this. pyPEG has a skipws option which seems to work ok.

ghazel avatar Jul 24 '11 00:07 ghazel

I can see why you would want this, but I am not convinced that we really need it. After all, we can process parslet atoms as if they were data, so appending whitespace to each and every atom will not be hard. This really belongs on the mailing list, and if you provide a patch or an implementation idea, we'll consider it more thoroughly.

kschiess avatar Jul 26 '11 06:07 kschiess

I have some code that implements this: https://github.com/kschiess/parslet/compare/master...mikeyhew:ignore-whitespace. It changes the >> operator so that it consumes 0 or more spaces in between parslets, and adds << for when you don't want to allow spaces. I've been using it in this project and it has worked well so far, making it more pleasant to write the grammar.
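
To make the two-operator idea concrete, here is a minimal sketch in plain Ruby. It is not the code from the linked branch and does not use Parslet's classes: `Atom` and `re` are hypothetical names invented for this illustration, built on the standard library's StringScanner. It only shows the intended behaviour, a `>>` that tolerates whitespace between its operands and a `<<` that does not.

```ruby
require 'strscan'

# Toy parser atom (hypothetical, NOT a Parslet class). It wraps a block that
# takes a StringScanner and returns the matched string, or nil on failure.
class Atom
  def initialize(&block)
    @block = block
  end

  # Try to match at the current position; rewind the scanner on failure.
  def try(scanner)
    start = scanner.pos
    result = @block.call(scanner)
    scanner.pos = start if result.nil?
    result
  end

  # Loose sequence: optional whitespace is allowed between self and other.
  def >>(other)
    seq(other, skip_ws: true)
  end

  # Strict sequence: other must follow immediately, with no whitespace.
  def <<(other)
    seq(other, skip_ws: false)
  end

  # Returns the matched text only if the whole input was consumed.
  def parse(input)
    scanner = StringScanner.new(input)
    result = try(scanner)
    result if result && scanner.eos?
  end

  private

  def seq(other, skip_ws:)
    left = self
    Atom.new do |scanner|
      a = left.try(scanner)
      next nil if a.nil?
      scanner.skip(/\s*/) if skip_ws
      b = other.try(scanner)
      next nil if b.nil?
      a + b
    end
  end
end

# Hypothetical helper: an atom that matches a regular expression.
def re(pattern)
  Atom.new { |scanner| scanner.scan(pattern) }
end

sum   = re(/\d+/) >> re(/\+/) >> re(/\d+/)
ident = re(/[a-zA-Z]/) << re(/[a-zA-Z0-9]*/)

sum.parse('1 + 2')    # => "1+2"
sum.parse('1+2')      # => "1+2"
ident.parse('ab1')    # => "ab1"
ident.parse('a b1')   # => nil
```

Because Ruby gives `>>` and `<<` the same precedence and left associativity, the two operators can be mixed in one rule without extra parentheses, which is part of what makes this pairing convenient.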

@kschiess It would be interesting to hear what you think about the general idea, as well as whether this would break anything. (I think it caused an error with the infix_expression helper already, but didn't spend much time debugging.)

mikeyhew avatar Nov 15 '16 04:11 mikeyhew

I'll take a look soon.

kschiess avatar Nov 24 '16 13:11 kschiess

I like the idea that this is an option you give to the whole parse process. Perhaps we could (as an implementation) create a source that skips whitespace? I do realize this is a problem for a lot of people.

kschiess avatar Jan 16 '17 08:01 kschiess

Hi, any progress on this? This would be a valuable addition. Thanks.

aaronlippold avatar Sep 09 '17 14:09 aaronlippold

We would welcome a PR that solves this; however, we won't be able to dedicate our own time to it.

kschiess avatar Nov 19 '17 15:11 kschiess

@kschiess the problem with a global option is that it restricts what you can parse. Even if your grammar is mostly whitespace-insensitive, there are still times when you need >> without whitespace in between. For example, parsing identifiers:

rule(:ident) { match['a-zA-Z'] >> match['a-zA-Z0-9'] }
# how would you do this if the `Source` ignores whitespace?
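
A small plain-Ruby sketch of the problem, with loud assumptions: `skip_whitespace` is a crude, hypothetical stand-in for a whitespace-skipping source (it just deletes whitespace before matching), and `IDENT` is a regular expression in the spirit of the rule above, with repetition added so it accepts identifiers of any length. Neither is Parslet API.

```ruby
# Regex in the spirit of the ident rule, with repetition added (an assumption).
IDENT = /\A[a-zA-Z][a-zA-Z0-9]*\z/

# Hypothetical stand-in for a whitespace-skipping source: all whitespace is
# dropped before the grammar ever sees the input.
def skip_whitespace(input)
  input.gsub(/\s+/, '')
end

def ident?(input)
  IDENT.match?(input)
end

two_words = 'foo bar'
ident?(two_words)                   # => false, the space separates two identifiers
ident?(skip_whitespace(two_words))  # => true, the input has been merged into "foobar"
```

Once the whitespace is gone at the source level, the grammar has no way to tell `foo bar` from `foobar`, which is why a global option alone restricts what can be parsed.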

mikeyhew avatar Nov 19 '17 17:11 mikeyhew

I'll merge any kind of solution that doesn't lock people into whitespace-agnostic parsers. The default should be not to ignore whitespace. But I think we can make it easy to have a choice.

kschiess avatar Feb 13 '18 08:02 kschiess