Implementation of the ranges (with delimiters)

Open Mingun opened this issue 2 years ago • 3 comments

Reincarnation of https://github.com/pegjs/pegjs/pull/209. Previous attempt, closed for technical reasons: #208

This is an implementation of the ranges proposal with batteries, i.e.:

  • ability to use numeric constants to specify minimum, maximum or exact repetition count:
    more2 = "a"|2..|;
    upTo3 = "a"|..3|;
    
  • ability to use preceding label as a range boundary:
    list = count:n5 @n5|count|;
    n5 = n:[0-9]|5| { return parseInt(n); };
    
  • ability to use function as a boundary:
    list = "a"|{ return options.listSize; }|;
    
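To make the intended semantics concrete, here is a minimal JavaScript sketch of how a range with an optional delimiter could behave. It is an illustration only, not the code Peggy generates: the helper name `repeatRange`, its option names, and the restriction to literal, non-empty string elements are all assumptions made for this example. It also assumes that a delimiter is required between elements only and that a trailing delimiter is left unconsumed.

```javascript
// Hypothetical helper (not part of Peggy): emulates `element|min..max, delimiter|`
// on a plain string. `element` and `delimiter` are literal strings here, and
// `element` is assumed to be non-empty.
function repeatRange(input, pos, element, { min = 0, max = Infinity, delimiter = null } = {}) {
  const results = [];
  let cur = pos;
  while (results.length < max) {
    let next = cur;
    // A delimiter is required between elements, never before the first one.
    if (delimiter !== null && results.length > 0) {
      if (!input.startsWith(delimiter, next)) break;
      next += delimiter.length;
    }
    // If no element follows, stop; `cur` still points before any delimiter,
    // so a trailing delimiter is not consumed.
    if (!input.startsWith(element, next)) break;
    results.push(element);
    cur = next + element.length;
  }
  // Too few repetitions: the whole expression fails and consumes nothing.
  if (results.length < min) return null;
  return { results, pos: cur };
}

// more2 = "a"|2..|
const more2 = repeatRange("aaa", 0, "a", { min: 2 });
// upTo3 = "a"|..3|
const upTo3 = repeatRange("aaaa", 0, "a", { max: 3 });
// "a"|2.., ","| applied to input that ends with a delimiter
const delimited = repeatRange("a,a,", 0, "a", { min: 2, delimiter: "," });
// "a"|2..| applied to input that is too short
const tooFew = repeatRange("a", 0, "a", { min: 2 });

console.log(more2, upTo3, delimited, tooFew);
```

In this sketch a failed match returns `null` and leaves the position untouched, which mirrors the usual PEG convention that a failing expression consumes no input.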

The chosen syntax saves the < and > characters for template definitions, where they are more natural. [ and ] are already used for character class definitions, and ( and ) are already used for grouping.

Mingun avatar Jun 11 '22 19:06 Mingun

What if there was a way to define parser-functions, and it would be just repeat(x, 2..) and repeat(x, ..3)?

I think by creating a new syntax for every feature it will resemble Perl and its regular expressions quite soon (arguably, it already resembles it way too much).

The thing not mentioned in the PR description, but probably the one I have been waiting for most, is delimited repetition (a|.., b|).

  1. Is there a CI version of this PR to play with it?
  2. What AST does it generate?
  3. I remember there was a discussion in PEG.js issues to add it as a % b operator. Does this PR subsume that feature request?
  4. I didn't find tests attempting to break parsing of || brackets. Can we be sure that some .. | .. | | .. , .. | .. , .. | | will get parsed properly?

reverofevil avatar Jun 11 '22 20:06 reverofevil

  1. You can check out this branch, build a minified version of the parser
    npm run build
    # or only
    npm run rollup
    npm run terser
    npm run deploy
    
    and go to docs/online.html
  2. You can use the recently added peggy --ast option to look at it
  3. I am personally against any syntax that does not clearly indicate where the repeat expression ends. Using a % b falls into this category. Last time I tried to summarize the possible options here
  4. Yes, but you should take into account that you cannot put two suffix operators one after another; you must wrap the first in parentheses:
    // All of these are forbidden
    start1 = .. | .. | | .. , .. | .. , .. | |;
    start2 = .**;
    start3 = .++;
    start4 = .??;
    
    // This will work
    start5 = .(. | .. |) | .. , .. | .. , .. | |;
    start6 = (.*)*;
    start7 = (.+)+;
    start8 = (.?)?;
    

Mingun avatar Jun 11 '22 21:06 Mingun

What if there was a way to define parser-functions

I think we will soon need to determine what our long-term syntax extensibility approach is, before we run out of adequate syntax to act as the extensibility point. I don't think repeat(n) will work, since that looks like a rule named repeat followed by a group. <repeat min="a" max="b">...</repeat> would work, but that's pretty ugly, and even I don't like XML that much.

hildjj avatar Jun 11 '22 21:06 hildjj

@Mingun do you have time to work on this? If not, I'll try to rebase it at least.

hildjj avatar Feb 15 '23 17:02 hildjj

I'll look at it this weekend

Mingun avatar Feb 15 '23 19:02 Mingun

OK, the bug seems to be fixed. I'll check it once more this week and add a changelog entry for plugin authors.

Mingun avatar Feb 19 '23 20:02 Mingun

The diff between the 7-month-old version and the current one:

  • added forgotten tests for function-based boundaries
  • fixed some minor errors in tests for variable-based boundaries
  • added a notice for plugin authors
  • fixed some minor misprints in tests

All done, ready for merge.

Mingun avatar Feb 21 '23 17:02 Mingun

This is fantastic work. Thank you very much.

I'm going to merge as-is, then when I do a full review of the documentation page, there may be a few nits I clean up while fixing other things.

hildjj avatar Feb 21 '23 19:02 hildjj