npeg icon indicating copy to clipboard operation
npeg copied to clipboard

Implement handling of left recursion

Open zevv opened this issue 3 years ago • 5 comments

Handling left recursion is problematic in PEGs, see https://en.wikipedia.org/wiki/Parsing_expression_grammar#Indirect_left_recursion for a nice explanation of the problem.

zevv avatar Nov 04 '22 18:11 zevv

People smarter than me (@sqmedeiros) have thought hard about this problem, and there seems to be a viable solution for this, described in this paper: https://github.com/zevv/npeg/blob/master/doc/papers/Left_recursion_in_parsing_expression_grammars.pdf

I've been trying to get a good grasp of this paper, but the formal description seems to be quite over my head and I'm not sure how this model would fit into NPeg.

zevv avatar Nov 07 '22 15:11 zevv

Hi, @zevv . Thanks for mentioning my paper.

I will try to explain the idea without the formalism. Essentially, you should have a map where the keys have the form (A, i), where A is left-recursive variable and i is a position of the input. The initial value associated with this key should be a value as fail, indicating that it's not possible to match A at position i. The value stored in this map will be the result of matching A at input position i.

When we try to match the right-hand side of A, the matching of the left-recursive invocation of A will fail, but the matching of the right-hand side o A can still succeed, usually using a non-left-recursive alternative. See the example below: A -> A a / b

Input: "bac" Position: 123

When matching the first letter of the input, the alternative A a will fail, as the result of matching A at position 1 is fail. Fortunately, the second alternative succeeds. In this case, we need to try to increment the input prefix matched by A. Thus, we should update our map to something as mymap[(A, 1)] = 2 and try to match the right-hand side of A again at input position 1. In case it is possible to match a larger input prefix, we update our map again and try another matching of the right-hand side of A at input position 1, otherwise we stop the matching and give 2 as the result of matching A at position 1. In this example,

I hope this helps. Feel free to ask anything else.

By the way, this extended version https://arxiv.org/pdf/1207.0443.pdf also discusses (see Section 4) the problem of operator precedence.

sqmedeiros avatar Nov 09 '22 12:11 sqmedeiros

Hi @sqmedeiros, thanks for taking the time to respond, much appreciated.

After chewing some more time on the paper, rereading a few times and and spending the evening at the kitchen table with pencil and notepad I was able to get a better grasp of the process; my current understanding more or less matches your above description, so it seems I'm on the right track! I will let this sink in for a bit and see if I can prototype an implementation one of these days.

Thanks for the updated paper, I'll include this one as well. Operator precedence is - I think - mostly a solved issue in NPeg; the NPeg language has been extended with some constructs to indicate precedence for specific rules, which NPeg then uses for precedence-climbing in the parser. (https://github.com/zevv/npeg#precedence-operators)

zevv avatar Nov 09 '22 12:11 zevv

Ah, what a shame I did not find the updated paper before: I think chapter 5 of the updated version (A Parsing Machine for Left-recursive PEGs) is going to help me out a lot.

Thanks again!

zevv avatar Nov 09 '22 12:11 zevv

Yes, chapter 5 is another way to describe how left-recursion could work in PEGs. You could use a similar idea without having to follow an approach based on a parsing machine. By the way, when implementing left-recursion, I think it is important to distinguish left-recursive rules from non-left recursive ones, so we only try to increase the matching of left-recursive rules.

sqmedeiros avatar Nov 09 '22 13:11 sqmedeiros