moo Maximum munch?

Hi,

this is more of a question than an issue about Moo, so here goes:

I have the following lexer:

const moo = require("moo");

const lexer = moo.compile({
  TERM: /[a-z]+/,
  PREFIXTERM: /\*|(?:[a-z]+\*)/,
});

On input moo, this will return:

{"type":"TERM","value":"moo","text":"moo","offset":0,"lineBreaks":0,"line":1,"col":1}

On input moo* I would want it to return a single PREFIXTERM, but I'm getting this instead:

{"type":"TERM","value":"moo","text":"moo","offset":0,"lineBreaks":0,"line":1,"col":1}
{"type":"PREFIXTERM","value":"*","text":"*","offset":3,"lineBreaks":0,"line":1,"col":4}

How can I get it to go for a single PREFIXTERM?

May 29 '21 15:05 molnarp

Have you tried swapping the order of the rules? Earlier rules take precedence.

May 29 '21 19:05 tjvr

I can't really do that, because I also have:

WILDTERM: /(?:[a-z*?]+)/,

which matches a superset of what TERM matches. In this setup, if the input is mo*o, TERM consumes the prefix mo, then PREFIXTERM consumes the asterisk, and so on.

This would work if the longest match were picked. Instead, the earliest matching rule is picked. I was wondering how to get around this issue.

May 30 '21 00:05 molnarp

I'm afraid I don't exactly understand what you're trying to do.

Moo doesn't choose the regexp with the longest match. Because it combines all the regexps into a single JS regexp for speed, it can't. Instead, the first rule whose regexp matches at the current position wins: earlier rules take precedence.
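
To illustrate why a combined regexp behaves this way, here is a minimal sketch in plain JavaScript. It is a simplified stand-in, not Moo's actual internals: the combined pattern, the names array and the firstToken helper are made up for this example. JS alternation tries alternatives left to right and stops at the first one that succeeds, so TERM wins on moo* even though PREFIXTERM would match more.

```javascript
// Simplified stand-in for a combined lexer regexp (an assumption, not Moo's
// real implementation): one capture group per rule, joined with `|`, and the
// sticky flag so matching is anchored at lastIndex.
const combined = /([a-z]+)|(\*|(?:[a-z]+\*))/y;
const names = ["TERM", "PREFIXTERM"];

// Hypothetical helper: returns the first token found at the start of `input`.
function firstToken(input) {
  combined.lastIndex = 0;
  const m = combined.exec(input);
  if (!m) return null;
  // The capture group that participated tells us which rule matched.
  const i = m.slice(1).findIndex((g) => g !== undefined);
  return { type: names[i], value: m[0] };
}

const tokenForMooStar = firstToken("moo*");
console.log(tokenForMooStar); // { type: 'TERM', value: 'moo' }
```

The regexp engine never compares the lengths of the alternatives, so getting a different result means changing the rule order or the patterns themselves.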

It's hard to provide a solid recommendation without knowing more about the language you're trying to parse. But usually people seem to solve problems that sound like this by:

varying the order of the rules
using keywords
using (negative) lookahead.
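
For the three rules in this thread, a negative lookahead might look like the sketch below. It uses plain sticky regexps and a made-up classify helper rather than Moo itself, and it assumes a term ends wherever the next character is not a letter, * or ?. Each narrower rule refuses to match if more term characters follow, so the broader rule further down gets its chance.

```javascript
// Rule order still matters: the first rule that matches at position 0 wins.
// The lookahead (?![a-z*?]) makes a rule fail unless it consumes the whole term.
const rules = [
  ["TERM", /[a-z]+(?![a-z*?])/y],
  ["PREFIXTERM", /(?:[a-z]+\*|\*)(?![a-z*?])/y],
  ["WILDTERM", /[a-z*?]+/y],
];

// Hypothetical helper: tries each rule in order at the start of `input`.
function classify(input) {
  for (const [type, re] of rules) {
    re.lastIndex = 0;
    const m = re.exec(input);
    if (m) return { type, value: m[0] };
  }
  return null;
}

const results = ["moo", "moo*", "mo*o"].map(classify);
console.log(results.map((t) => `${t.type}:${t.value}`));
// [ 'TERM:moo', 'PREFIXTERM:moo*', 'WILDTERM:mo*o' ]
```

The same patterns, without the y flag, should be usable as rules in moo.compile in this order, though that is not tested here.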

Jun 04 '21 13:06 tjvr