Maximum munch?
Hi,
This is more of a question than an issue with Moo, so here goes:
I have the following lexer:
const moo = require("moo");
const lexer = moo.compile({
  TERM: /[a-z]+/,
  PREFIXTERM: /\*|(?:[a-z]+\*)/,
});
On input moo, this will return:
{"type":"TERM","value":"moo","text":"moo","offset":0,"lineBreaks":0,"line":1,"col":1}
On input moo*, I want it to return a single PREFIXTERM token, but I get two tokens instead:
{"type":"TERM","value":"moo","text":"moo","offset":0,"lineBreaks":0,"line":1,"col":1}
{"type":"PREFIXTERM","value":"*","text":"*","offset":3,"lineBreaks":0,"line":1,"col":4}
How can I get it to produce a single PREFIXTERM?
Have you tried swapping the order of the rules? Earlier rules take precedence.
I can't really do that, because I also have:
WILDTERM: /(?:[a-z*?]+)/,
which matches a superset of what TERM matches. With this setup, on the input mo*o, TERM consumes mo, then PREFIXTERM consumes the asterisk, and so on, instead of one token covering the whole input.
This would work if the longest match were picked. Instead, the earliest matching rule wins, and I was wondering how to get around this.
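For comparison, here is a minimal sketch of the longest-match ("maximal munch") behaviour being asked for. It is not moo's API or algorithm; the function longestMatchLexer and its token objects are hypothetical. Each rule is tried on its own at the current offset with a sticky regexp, the longest match wins, and ties go to the rule listed first.

```javascript
// Longest-match ("maximal munch") tokenizing, as a hypothetical sketch.
// This is NOT how moo works: every rule is tried separately at the current
// offset and the longest match is kept. Ties go to the rule listed first.
function longestMatchLexer(rules) {
  // One sticky ("y") regexp per rule, so each can be anchored at an offset.
  // Assumes the rules carry no flags of their own.
  const compiled = Object.entries(rules).map(([type, re]) => ({
    type,
    re: new RegExp(re.source, "y"),
  }));

  return function tokenize(input) {
    const tokens = [];
    let offset = 0;
    while (offset < input.length) {
      let best = null;
      for (const { type, re } of compiled) {
        re.lastIndex = offset;
        const m = re.exec(input);
        // Only a strictly longer match replaces the current best,
        // so an earlier rule wins a tie.
        if (m && m[0].length > 0 && (best === null || m[0].length > best.text.length)) {
          best = { type, text: m[0], offset };
        }
      }
      if (best === null) {
        throw new Error(`No rule matches at offset ${offset}`);
      }
      tokens.push(best);
      offset += best.text.length;
    }
    return tokens;
  };
}

// The three rules from the question, in their original order.
const tokenize = longestMatchLexer({
  TERM: /[a-z]+/,
  PREFIXTERM: /\*|(?:[a-z]+\*)/,
  WILDTERM: /(?:[a-z*?]+)/,
});

for (const input of ["moo", "moo*", "mo*o"]) {
  console.log(input, tokenize(input).map((t) => `${t.type}:${t.text}`).join(" "));
}
// moo TERM:moo
// moo* PREFIXTERM:moo*
// mo*o WILDTERM:mo*o
```

The price of this approach is one regexp execution per rule per token, which is exactly the work a single combined regexp avoids.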
I'm afraid I don't exactly understand what you're trying to do.
Moo doesn't choose the regexp with the longest match; indeed, because it combines all the regexps into a single JS regexp for speed, it can't do this. Instead, the first regexp that matches wins: earlier rules take precedence.
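A simplified model of that behaviour, assuming the rules are joined roughly as /(rule1)|(rule2)|.../ with the sticky flag (moo's real compiler handles more, such as line breaks and error tokens). The helper firstMatchLexer is hypothetical. JavaScript tries the alternatives left to right and keeps the first one that succeeds, so on moo* the TERM alternative matches moo and the longer PREFIXTERM match is never considered.

```javascript
// Hypothetical first-match lexer: all rules are joined into ONE sticky
// regexp, /(r1)|(r2)|(r3)/y. Alternation stops at the first alternative
// that matches, so rule order decides the token, not match length.
function firstMatchLexer(rules) {
  const types = Object.keys(rules);
  // Each rule gets exactly one capture group; the rules themselves must use
  // only (?:...) groups, or the group indices would no longer line up.
  const combined = new RegExp(
    Object.values(rules).map((re) => `(${re.source})`).join("|"),
    "y"
  );

  return function tokenize(input) {
    const tokens = [];
    combined.lastIndex = 0;
    while (combined.lastIndex < input.length) {
      const offset = combined.lastIndex;
      const m = combined.exec(input);
      if (m === null || m[0] === "") {
        throw new Error(`No rule matches at offset ${offset}`);
      }
      // The capture group that participated identifies the winning rule.
      const group = m.findIndex((g, i) => i > 0 && g !== undefined);
      tokens.push({ type: types[group - 1], text: m[0], offset });
    }
    return tokens;
  };
}

const tokenize = firstMatchLexer({
  TERM: /[a-z]+/,
  PREFIXTERM: /\*|(?:[a-z]+\*)/,
  WILDTERM: /(?:[a-z*?]+)/,
});

console.log(tokenize("moo*").map((t) => `${t.type}:${t.text}`).join(" "));
// TERM:moo PREFIXTERM:*
```

This reproduces the two tokens reported in the question.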
It's hard to provide a solid recommendation without knowing more about the language you're trying to parse. But usually people seem to solve problems that sound like this by:
- varying the order of the rules
- using keywords
- using (negative) lookahead.
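One way the negative-lookahead suggestion could apply to the rules in the question, sketched with a hypothetical first-match helper that models a single combined regexp. Each rule refuses to match when the next character could continue a longer token, so the order TERM, PREFIXTERM, WILDTERM no longer cuts moo* or mo*o short. PREFIXTERM is rewritten as /[a-z]*\*/, which matches the same strings as the original, and the WS rule is an illustrative addition. Assuming moo accepts lookahead inside rule regexps, as the suggestion implies, the same regexps could go into moo.compile in this order; that part is an assumption, not verified against moo.

```javascript
// Hypothetical first-match lexer modelling one combined sticky regexp.
function firstMatchLexer(rules) {
  const types = Object.keys(rules);
  const combined = new RegExp(
    Object.values(rules).map((re) => `(${re.source})`).join("|"),
    "y"
  );
  return function tokenize(input) {
    const tokens = [];
    combined.lastIndex = 0;
    while (combined.lastIndex < input.length) {
      const offset = combined.lastIndex;
      const m = combined.exec(input);
      if (m === null || m[0] === "") {
        throw new Error(`No rule matches at offset ${offset}`);
      }
      const group = m.findIndex((g, i) => i > 0 && g !== undefined);
      tokens.push({ type: types[group - 1], text: m[0], offset });
    }
    return tokens;
  };
}

const tokenize = firstMatchLexer({
  // TERM only if the next character could not extend it into a longer token.
  TERM: /[a-z]+(?![a-z*?])/,
  // Same strings as /\*|(?:[a-z]+\*)/, with the same lookahead guard.
  PREFIXTERM: /[a-z]*\*(?![a-z*?])/,
  // Anything else built from letters, * and ? falls through to here.
  WILDTERM: /[a-z*?]+/,
  // Hypothetical whitespace rule, only so several words can be lexed.
  WS: /[ \t]+/,
});

console.log(
  tokenize("moo* mo*o moo")
    .filter((t) => t.type !== "WS")
    .map((t) => `${t.type}:${t.text}`)
    .join(" ")
);
// PREFIXTERM:moo* WILDTERM:mo*o TERM:moo
```

The lookahead character class has to list every character that can continue a token; if the real language allows more, the class needs to grow with it.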