chevrotain
User Defined Macros
Description
Now that Chevrotain no longer depends on Function.prototype.toString, it is possible to allow end users to define their own MACROS, i.e. to create their own parsing DSL methods, e.g.:
- Repetition -> Identical to MANY but simply named differently.
- Twice -> Will parse the provided grammar action twice.
- RepetitionSep -> Like MANY_SEP but supports complex separators (not only single token separators).
This is effectively already supported; this feature is mainly about creating relevant docs and examples.
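As a rough illustration of the three macros listed above, the sketch below builds them on top of generic `many(idx, action)` / `option(idx, action)` style building blocks. `ToyParser` and its methods are a hypothetical stand-in written only for this example, not Chevrotain's parser class: it decides whether to iterate by trial and rollback rather than by lookahead, and it ignores the index argument.

```typescript
// Toy stand-in for a parser, written only for this sketch (NOT Chevrotain's API).
class ToyParser {
  pos = 0;
  constructor(readonly tokens: string[]) {}

  // Consume one token equal to `expected`, or throw.
  consume(expected: string): string {
    const tok: string | undefined = this.tokens[this.pos];
    if (tok !== expected) {
      throw new Error(`expected ${expected}, found ${tok ?? "EOF"}`);
    }
    this.pos++;
    return tok;
  }

  // Zero or one occurrence; rolls back the position on failure.
  option(_idx: number, action: () => void): void {
    const start = this.pos;
    try { action(); } catch (_e) { this.pos = start; }
  }

  // Zero or more occurrences; stops (and rolls back) at the first failed iteration.
  many(_idx: number, action: () => void): void {
    for (;;) {
      const start = this.pos;
      try { action(); } catch (_e) { this.pos = start; return; }
      if (this.pos === start) return; // guard against empty iterations
    }
  }
}

// Repetition: identical to many, only named differently.
function repetition(p: ToyParser, idx: number, action: () => void): void {
  p.many(idx, action);
}

// Twice: run the grammar action exactly two times.
function twice(action: () => void): void {
  action();
  action();
}

// RepetitionSep: like MANY_SEP, but the separator is an arbitrary grammar action.
function repetitionSep(p: ToyParser, idx: number, sep: () => void, action: () => void): void {
  p.option(idx, () => {
    action();
    p.many(idx, () => {
      sep();
      action();
    });
  });
}

// Usage: items "a" separated by the two-token separator ", ;".
const demoParser = new ToyParser(["a", ",", ";", "a", ",", ";", "a"]);
let itemCount = 0;
repetitionSep(
  demoParser,
  1,
  () => { demoParser.consume(","); demoParser.consume(";"); },
  () => { demoParser.consume("a"); itemCount++; }
);
```

In a real Chevrotain parser the macros would presumably be methods that call the parser's own un-suffixed DSL methods, and the caller would have to pass indices that do not collide with others used in the same rule.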
Tasks
- [x] Increase the number of bits allocated to DSL method indices, to avoid collisions when Macros are defined by end users.
- [x] Expose generic DSL methods without suffixes, e.g. many(1, ()=>{}) instead of MANY1(()=>{}), to provide easy-to-use building blocks for users constructing macros.
- [ ] Create a guide for macros
- [ ] Create runnable macros examples
- [ ] Move the detection of duplicate indices to the recording phase.
- Collisions are much more likely to occur with macros and may even be at a different level of the stack now, so a better stack trace in the thrown error will ease development flows.
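One way to picture the last task: while a rule is being recorded, remember every (DSL method, index) pair that has been seen and throw on the second use, so that the stack trace of the error points at the offending call, possibly inside a macro. The `Recorder` class and the helper names below are hypothetical and only model the idea; they are not Chevrotain internals.

```typescript
type DslMethod = "consume" | "subrule" | "option" | "or" | "many" | "atLeastOne";

// Hypothetical recorder, used only to illustrate the idea.
class Recorder {
  private seen = new Set<string>();
  constructor(readonly ruleName: string) {}

  // Called once for every DSL call encountered while recording the rule.
  record(method: DslMethod, idx: number): void {
    const key = `${method}#${idx}`;
    if (this.seen.has(key)) {
      // Thrown at the call site, so the stack trace includes any macro frames.
      throw new Error(`Duplicate index ${idx} for ${method} in rule <${this.ruleName}>`);
    }
    this.seen.add(key);
  }
}

// A macro that (incorrectly) uses the same index for both of its inner calls.
function badTwiceMany(rec: Recorder, idx: number): void {
  rec.record("many", idx);
  rec.record("many", idx); // collision: same method, same index
}

// Small helper that turns a thrown error into its message.
function tryRecord(run: () => void): string {
  try {
    run();
    return "ok";
  } catch (e) {
    return (e as Error).message;
  }
}

const collisionMessage = tryRecord(() => badTwiceMany(new Recorder("list"), 1));
```

Here `collisionMessage` holds the error text produced at the second `record` call; with distinct indices, or the same index on different DSL methods, recording completes without an error.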
Bonus Task
Not sure if this should be done with or after this topic. In any case, the MANY_SEP and AT_LEAST_ONE_SEP methods could be replaced with macros, thus simplifying the internals of the Parser. This is even more interesting when one considers that the _SEP methods have limitations and are not quite consistent with the other APIs provided by Chevrotain:
- No GATE support.
- SEP is limited to a single token.
A possible complex macro example would be OR_BACKTRACK, which would try N subrules in sequence, backtracking each time a failure is encountered and perhaps even having a default fallback (e.g. fuzzy MATCH_ALL as the default).
This may be too complex for introducing the concept of macros though...
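A minimal sketch of what such an OR_BACKTRACK macro could look like. To stay self-contained it runs over a hypothetical `Cursor` object instead of a Chevrotain parser; the names `orBacktrack`, `Cursor` and `expect` are invented for this example.

```typescript
// Hypothetical cursor over a token array (not a Chevrotain type).
interface Cursor {
  tokens: string[];
  pos: number;
}

// Consume one token equal to `tok`, or throw.
function expect(c: Cursor, tok: string): void {
  if (c.tokens[c.pos] !== tok) throw new Error(`expected ${tok} at ${c.pos}`);
  c.pos++;
}

// Try each alternative in order; roll back the position after a failure.
// If none succeeds, run the optional fallback, otherwise rethrow the last error.
function orBacktrack<T>(c: Cursor, alts: Array<() => T>, fallback?: () => T): T {
  const start = c.pos;
  let lastError: unknown = new Error("no alternatives given");
  for (const alt of alts) {
    try {
      return alt();
    } catch (e) {
      c.pos = start; // backtrack before trying the next alternative
      lastError = e;
    }
  }
  if (fallback) return fallback();
  throw lastError;
}

// Usage: the first alternative fails after consuming "a", the second succeeds.
const cur: Cursor = { tokens: ["a", "c"], pos: 0 };
const picked = orBacktrack<string>(cur, [
  () => { expect(cur, "a"); expect(cur, "b"); return "ab"; },
  () => { expect(cur, "a"); expect(cur, "c"); return "ac"; },
]);
```

The fallback is kept as a plain callback so that a "match anything" default could be plugged in without changing the macro itself.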
I would be interested in writing my own macro. Any pointers on where to get started?
My use case is extending my parser to have an OR_LONGEST, which applies all rules and takes the longest successful match. And yes, I'm aware of that being a tad inefficient (exponential in the worst case).
> My use case is extending my parser to have a OR_LONGEST, which applies all rules, and takes the longest successful match.
I don't think this would be possible. Macros are just "sugar" for patterns which are already possible with Chevrotain. The concept of only matching the longest successful match is in conflict with a fixed lookahead parser.
You could implement it with backtracking, but it would be inefficient. The https://github.com/TypeFox/chevrotain-allstar plugin for Chevrotain may also be able to assist with longest match.
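For illustration only, the backtracking approach mentioned above could look roughly like this: run every alternative from the same start position, remember the one that consumed the most tokens, then commit to it. The sketch uses a hypothetical `Input` object rather than a Chevrotain parser, and the names `orLongest` and `take` are invented here. Every alternative is executed in full, which is where the inefficiency comes from.

```typescript
// Hypothetical input cursor (not a Chevrotain type).
interface Input {
  tokens: string[];
  pos: number;
}

// Consume one token equal to `tok`, or throw.
function take(inp: Input, tok: string): void {
  if (inp.tokens[inp.pos] !== tok) throw new Error(`expected ${tok} at ${inp.pos}`);
  inp.pos++;
}

// Runs all alternatives from the same position and keeps the longest success.
// Ties go to the earliest alternative. Alternatives should have no side effects
// other than advancing inp.pos, because all of them are executed.
function orLongest<T>(inp: Input, alts: Array<() => T>): T {
  const start = inp.pos;
  let best: { end: number; value: T } | undefined;
  for (const alt of alts) {
    inp.pos = start; // every alternative starts from the same position
    try {
      const value = alt();
      if (best === undefined || inp.pos > best.end) {
        best = { end: inp.pos, value };
      }
    } catch (_e) {
      // this alternative failed; move on to the next one
    }
  }
  if (best === undefined) {
    inp.pos = start;
    throw new Error(`no alternative matched at ${start}`);
  }
  inp.pos = best.end; // commit to the longest match
  return best.value;
}

// Usage: "a" and "a b" both match; "a x" fails; the two-token match wins.
const input: Input = { tokens: ["a", "b", "c"], pos: 0 };
const longest = orLongest<string>(input, [
  () => { take(input, "a"); return "a"; },
  () => { take(input, "a"); take(input, "b"); return "ab"; },
  () => { take(input, "a"); take(input, "x"); return "ax"; },
]);
```

When alternatives nest, each level re-runs all of its branches, which is how the worst case becomes exponential.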
FYI, chevrotain-allstar indeed always finds the longest matching OR sequence.