bennu icon indicating copy to clipboard operation
bennu copied to clipboard

Non-Greedy Parsers

Open mattbierner opened this issue 12 years ago • 1 comments

Probably not something for core library but defining parsers that are locally non-greedy but consume as much input as required for a successful parsing would be interesting.

One example are the quantifiers trailed by ? in regular expressions:

 /a+?b/.exec('aab');

A naive non greedy strategy is to just consume the minimum number of character until the individual parser succeeds like:

sequence(
     atLeast(1, character('a')),
     character('b'));

But for the input aab, that stops at ab after consuming only one a and then fails.

If we know the parser in advance we can also write:

sequence(
     character('a'),
     manyTil(character('b'), character('a')));

But it is not possible to write a parser that exhibits this behavior without knowing what the next parser, in this case character('b').

parse-re

parse-re does this by attempting completions with the continuation and feeding more data on failure until either the completions succeeds or the parser fails:

atMost = \max, p ->
    (max === 0 ? always(stream.end) :
        \state, m, cok, cerr, eok, eerr -> 
            let r = parse.trampoline <| eok(stream.end, state, m) in
                (r ? eok(stream.end, state, m) :
                    parse.cons(p, atMost(max, p))(state, m, cok, cerr, eok, eerr)));

But this only works because failure always returns null. If failure does something like throw an error, this would crash.

mattbierner avatar Nov 14 '13 16:11 mattbierner

You mention manyTil - what could a implementation for manyTil look like, given that we do know the other parser?

felixSchl avatar Dec 21 '14 07:12 felixSchl