
Fix charclass parser to ignore whitespace and quote backslash

Open mkurkov opened this issue 9 years ago • 6 comments

Hi, while working on a JavaScript PEG grammar for neotoma, I found that charclasses are not very convenient to use: whitespace inside a class is treated as part of the class, and backslashes are not quoted. So I wrote this patch to fix it. PEG ignores whitespace everywhere, so I thought it was the right thing to do in charclasses too, and it makes a JS grammar with pretty big charclasses more readable. For example, before:

UnicodeLetter   <- [\\p{Lu}\\p{Ll}\\p{Lt}\\p{Lm}\\p{Lo}\\p{Nl}];

After:

UnicodeLetter   <- [ \p{Lu} 
                     \p{Ll}
                     \p{Lt}
                     \p{Lm} 
                     \p{Lo} 
                     \p{Nl} ];
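A minimal sketch of the transformation described above, written in Python purely for illustration. It assumes (hypothetically) that the charclass body ends up inside an Erlang string literal in the generated parser, so every backslash has to be doubled, and that any whitespace not preceded by a backslash is insignificant. The name `normalize_charclass` is invented here; this is not the patch's actual code.

```python
def normalize_charclass(body: str) -> str:
    """Hypothetical sketch: drop unescaped whitespace from a PEG charclass
    body and double each backslash, assuming the result is later embedded
    in an Erlang string literal."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # Quote the backslash itself and keep the escaped character
            # verbatim, even if it is whitespace (e.g. "\ " for a space).
            out.append("\\\\" + body[i + 1:i + 2])
            i += 2
        elif ch.isspace():
            # Unescaped whitespace between class members is ignored.
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


after = r""" \p{Lu}
             \p{Ll}
             \p{Nl} """
print(normalize_charclass(after))  # prints \\p{Lu}\\p{Ll}\\p{Nl}
```

Under these assumptions the readable "After" form reduces to the hand-escaped "Before" form. It also suggests why the change is not backward compatible: an existing grammar that relies on a literal space inside a class, or that already doubles its backslashes, would change meaning.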

mkurkov, Jun 19 '15 18:06

Travis doesn't seem to like R14

bookshelfdave, Jun 22 '15 11:06

Looks like this is due to an outdated Travis config. #35 should fix it.

mkurkov, Jun 23 '15 12:06

It would be nice to have the wiki reflect this change too.

bookshelfdave, Jun 23 '15 13:06

@metadave @mkurkov I'm not sure I will accept this change yet. It seems innocuous but might break other users' grammars.

seancribbs, Jun 23 '15 14:06

@seancribbs I see, it is not backward compatible. Of course I can rewrite my grammar so that charclasses contain no whitespace or newlines, but I think this change is in line with PEG syntax. Maybe we could have it in version 2.0? I see you are working on that in a separate branch.

mkurkov, Jun 23 '15 14:06

@mkurkov Right, and sadly that's really far off, given the huge task I have taken on there. In an ideal world, the character class would be compiled out to a case expression rather than using re internally, unless I find that re has better and more predictable performance.
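The idea of compiling a character class down to plain comparisons instead of handing it to a regex engine can be illustrated with a small Python analog: each range becomes one comparison, roughly what one guarded clause of an Erlang `case` expression would check. This is a hypothetical sketch (`compile_charclass` is an invented name), not neotoma's code, and it only handles literal characters and `a-z` style ranges, not escapes or `\p{...}` categories.

```python
from typing import Callable, List, Tuple


def compile_charclass(body: str) -> Callable[[str], bool]:
    """Hypothetical sketch: turn a simple charclass body such as "a-zA-Z_"
    into a predicate over literal ranges instead of using a regex.
    Escapes and \\p{...} categories are deliberately not handled."""
    ranges: List[Tuple[str, str]] = []
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            # "x-y" range; a trailing "-" falls through as a literal.
            ranges.append((body[i], body[i + 2]))
            i += 3
        else:
            # Single literal character, stored as a one-element range.
            ranges.append((body[i], body[i]))
            i += 1

    def matches(ch: str) -> bool:
        # One comparison per range, like one case clause with a guard.
        return any(lo <= ch <= hi for lo, hi in ranges)

    return matches


is_ident_start = compile_charclass("a-zA-Z_")
print(is_ident_start("q"), is_ident_start("7"))  # prints True False
```

Whether such compiled comparisons actually outperform `re` is exactly the open question raised in the comment; the sketch only shows the shape of the alternative.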

seancribbs, Jun 23 '15 16:06