peggy
Implementation of the ranges (with delimiters)
Reincarnation of https://github.com/pegjs/pegjs/pull/209. The previous attempt, closed for technical reasons: #208
This is an implementation of the ranges proposal with batteries, i.e.:
- ability to use numeric constants to specify a minimum, maximum, or exact repetition count:
  more2 = "a"|2..|;
  upTo3 = "a"|..3|;
- ability to use a preceding label as a range boundary:
  list = count:n5 @n5|count|;
  n5 = n:[0-9]|5| { return parseInt(n); };
- ability to use a function as a boundary:
  list = "a"|{ return options.listSize; }|;
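To make the three kinds of boundary concrete, here is a minimal JavaScript sketch of a greedy range matcher. It is not the code Peggy generates: `literal`, `resolveBoundary`, `matchRange`, and `countedList` are hypothetical names, and the sketch assumes a boundary is a number, a function evaluated at parse time, or `null` for an open end. The label-based case is simplified to a single-digit count rather than the five digits of the `n5` example.

```javascript
// Hypothetical helper: builds a matcher for a literal string.
// A matcher takes (input, pos) and returns { value, pos } or null on failure.
function literal(text) {
  return (input, pos) =>
    input.startsWith(text, pos) ? { value: text, pos: pos + text.length } : null;
}

// Hypothetical helper: a boundary is a number, a function, or null (open end).
function resolveBoundary(boundary, fallback) {
  if (boundary === null || boundary === undefined) return fallback;
  return typeof boundary === "function" ? boundary() : boundary;
}

// Greedy repetition of `matcher` between `min` and `max` times (inclusive).
// Assumes `matcher` consumes at least one character whenever it succeeds.
function matchRange(matcher, min, max) {
  return (input, pos) => {
    const lo = resolveBoundary(min, 0);
    const hi = resolveBoundary(max, Infinity);
    const values = [];
    let current = pos;
    while (values.length < hi) {
      const result = matcher(input, current);
      if (result === null) break;
      values.push(result.value);
      current = result.pos;
    }
    // Fail as a whole if fewer than `lo` repetitions were found.
    return values.length >= lo ? { value: values, pos: current } : null;
  };
}

// Constant boundaries, in the spirit of "a"|2..| and "a"|..3|.
const more2 = matchRange(literal("a"), 2, null);
const upTo3 = matchRange(literal("a"), null, 3);

// Function boundary, in the spirit of "a"|{ return options.listSize; }|.
const options = { listSize: 2 }; // assumed option value, for illustration only
const sized = matchRange(literal("a"), () => options.listSize, () => options.listSize);

// Label-style boundary: read a one-digit count, then repeat exactly that many times.
function countedList(input, pos) {
  const ch = input[pos];
  if (ch === undefined || ch < "0" || ch > "9") return null;
  const count = Number(ch);
  return matchRange(literal("a"), count, count)(input, pos + 1);
}
```

Every matcher here either returns the collected values together with the new position or fails as a whole, so a range whose minimum is not reached consumes nothing.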
The chosen syntax reserves the < and > characters for template definitions, where they are more natural. [ and ] are already used for character class definitions, and ( and ) are already used for grouping.
What if there was a way to define parser-functions, and it would be just repeat(x, 2..) and repeat(x, ..3)?
I think that by creating new syntax for every feature, the grammar language will quite soon resemble Perl and its regular expressions (arguably, it already resembles them way too much).
The thing not mentioned in the PR description, but probably the one I have awaited most, is delimited repetition (a|.., b|).
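As an illustration of what delimited repetition means, here is a minimal JavaScript sketch. It is not Peggy's generated code: `literal` and `matchDelimited` are hypothetical names, and the sketch assumes that the delimiter is matched only between elements, that its results are dropped from the returned array, and that a trailing delimiter is left unconsumed.

```javascript
// Hypothetical helper: builds a matcher for a literal string.
// A matcher takes (input, pos) and returns { value, pos } or null on failure.
function literal(text) {
  return (input, pos) =>
    input.startsWith(text, pos) ? { value: text, pos: pos + text.length } : null;
}

// Repeats `element` between `min` and `max` times, requiring `delimiter`
// between consecutive elements. Assumes an element/delimiter pair always
// consumes at least one character, so the loop terminates.
function matchDelimited(element, delimiter, min = 0, max = Infinity) {
  return (input, pos) => {
    const values = [];
    let current = pos;
    while (values.length < max) {
      let next = current;
      if (values.length > 0) {
        // A delimiter is required before every element except the first.
        const sep = delimiter(input, next);
        if (sep === null) break;
        next = sep.pos;
      }
      const item = element(input, next);
      // On failure, `current` is unchanged, so a delimiter read just before
      // the failed element is not consumed.
      if (item === null) break;
      values.push(item.value);
      current = item.pos;
    }
    return values.length >= min ? { value: values, pos: current } : null;
  };
}

// In the spirit of "a"|.., ","| : any number of "a" separated by commas.
const list = matchDelimited(literal("a"), literal(","));
```

With this shape, the caller gets only the element values back, and a dangling separator is left for whatever rule comes next.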
- Is there a CI version of this PR to play with it?
- What AST does it generate?
- I remember there was a discussion in the PEG.js issues about adding it as an a % b operator. Does this PR subsume that feature request?
- I didn't find tests attempting to break parsing of the || brackets. Can we be sure that some .. | .. | | .. , .. | .. , .. | | will get parsed properly?
- You can check out this branch, build a minified version of the parser
  npm run build
  # or only
  npm run rollup
  npm run terser
  npm run deploy
  and go to docs/online.html
- You can use the recently added peggy --ast option to look at it
- I am personally against any syntax that does not clearly indicate where the repeat expression ends. Using a % b falls into this category. Last time I tried to summarize the possible options here
- Yes, but you should take into account that you cannot put two suffix operators one after another; you must wrap the first in parentheses:
// This is all forbidden
start1 = .. | .. | | .. , .. | .. , .. | |;
start2 = .**;
start3 = .++;
start4 = .??;
// This will work
start5 = .(. | .. |) | .. , .. | .. , .. | |;
start6 = (.*)*;
start7 = (.+)+;
start8 = (.?)?;
> What if there was a way to define parser-functions
I think we will soon need to determine what our long-term syntax extensibility approach is, before we run out of adequate syntax to act as the extensibility point. I don't think repeat(n) will work, since that looks like a rule named repeat followed by a group. <repeat min="a" max="b">...</repeat> would work, but that's pretty ugly, and even I don't like XML that much.
@Mingun do you have time to work on this? If not, I'll try to rebase it at least.
I'll look at it over the weekend.
OK, the bug seems to be fixed. I'll check it once again this week and add a changelog entry for plugin authors.
The diff between the 7-month-old version and the current one:
- added forgotten tests for function-based boundaries
- fixed some minor errors in tests for variable-based boundaries
- added a notice for plugin authors
- fixed some minor misprints in tests
All done, ready for merge.
This is fantastic work. Thank you very much.
I'm going to merge as-is, then when I do a full review of the documentation page, there may be a few nits I clean up while fixing other things.