nom icon indicating copy to clipboard operation
nom copied to clipboard

left recursion support?

Open Geal opened this issue 8 years ago • 5 comments
trafficstars

nom can recognize left recursive grammars, but at the price of losing some information. Something like res <- expr OP expr with expr possibly using res can be recognized, but left associativity must be infered from the AST.

Here are two papers about supporting left recursive grammars: http://richard.myweb.cs.uwindsor.ca/PUBLICATIONS/PADL_08.pdf https://arxiv.org/pdf/1207.0443.pdf?ref=driverlayer.com/web

How could those be implemented, at what cost?

Geal avatar Feb 14 '17 16:02 Geal

Not sure if this answer is entirely relevant (as I use it, expr does not use res, and I'm not certain it would work if that were the case), but here is how I do left recursion:

pub fn binop_precedence_n<'a>(i: &'a str) -> IResult<&'a str, Expression<'a>> {
        do_parse!(i,
            first: binop_precedence_n_plus_1 >>
            fold: fold_many0!(
                do_parse!(
                    tag_s!("OP") >> // or whatever parser
                    expr: binop_precedence_n_plus_1 >>
                    (expr)
                ),
                first,
                |acc: Expression<'a>, item: Expression<'a>| {
                    Expression::BinOpPrecedenceN(BinOpPrecedenceN{
                        left: Box::new(acc),
                        right: Box::new(item),
                    })
                }
            ) >>
            (fold)
        )
}

Knowing that I have an Expression enum with all possible expression types in it.

tbourvon avatar Apr 26 '17 14:04 tbourvon

I talked with @Geal briefly about reviving this discussion on irc.

Currently, I have implemented the algorithm described in [1] with pvm.

There is also another reference implementation of [1] with IronMeta. I have only briefly looked at there source code so I can not speak at length about there implementation.

The basic idea behind the algorithm is that a parser must halt or fail because it is handling a finite amount of input. Thus, if bounded recursive calls are strictly increasing (in terms of input consumed), you allow for the number of "recurses" to increase until the amount of consumed input is less than the prior trial. For PEGs it is in fact the case that recursive calls are strictly increasing. It is my belief that nom will also share this property.

Ultimately, this means that every call that is part of a left recursive cycle must do the bureaucracy of bounded recursion. This means that we'll have to backtrack to appropriate positions in the subject (the data we're trying to parse). If nom doesn't have random access iterators under the hood that may be problematic. This does not mean we'll have to re-parse the text when doing the bounded recursion, you memoize the prior result and "start from there" instead.

Geal mentioned possibly wrapping nom in a new library to add this functionality. It is not clear to me that this is possible at the moment but I want to mention it so that it's not immediately ruled out. I don't have a strong understanding of the inner workings of nom (just cursory glances at the source here and there) so I can't say for sure.

I will be busy until August 1st, so I can't yet fully investigate this, right now i'm just going on past knowledge working with pvm. After that date I intend to make this my main project (along with sprucing up pvm a bit). I would like to see nom with this functionality if possible as, many moons ago, I wanted to use it to write quick and simple parsers found in programming language literature (for experimentation sake). My frustration with no crate in the Rust eco system for parsing supporting left recursion is what spawned pvm. However, I want to spread the joy (and pvm has it's own interesting considerations as a potentially runtime-changing parser).

[1] Left Recursion in Parsing Expression Grammars

amarmaduke avatar Jul 27 '17 01:07 amarmaduke

What became of this?

boggle avatar Aug 14 '20 19:08 boggle

FWIW, there is a crate called nom_recursive that adds a proc macro that aims to help support left recursion, but I think it has a pretty serious limitation.

pkgw avatar Apr 22 '22 12:04 pkgw

After having a look at nom_recursive and deciding against it because of that limitation, I ended up implementing my own solution on top of nom_locate to get custom parser state. It is an implementation of a variation of the packrat algorithm as found here: https://medium.com/@gvanrossum_83706/left-recursive-peg-grammars-65dab3c580e1

The state tracking I have is fairly simple:

  • There is a State struct containing a member for each left recursive rule
  • Each of the State members contains the Action the rule will do when next tried. It can be either Seed (initial parse), Fail (to prevent infinite recursion) and Succeed(T) (succeed and return a clone of the pre-parsed T)
  • There is a Vec<State> to track state for each input position.

So far this seems to have functionally worked well, but:

  • I haven't tried to formally verify it, and I'm not an expert in parsing
  • The Vec<State> can be huge if the input is big, which is very wasteful and therefore can really tank performance. I found that it is capital to only have state tracking for recursive rules, and to use Succeed(Box<T>) to reduce the size of the state. Just the Box trick alone allowed for a 5x speedup in debug builds.
  • This forces to wrap the grammar's rules with a macro to generate code for the State struct. The internals are not pretty and it probably limits the amount of clever parser tricks you can implement.

Fortunately, the input I have to deal with is quite small. Also being able to let the grammar be left recursive was important to me (rather than e.g. parse a flat expression syntax and turn it back into an AST using a precedence table), so I don't think I would have been better served by any other lib like pest

douglas-raillard-arm avatar May 02 '23 16:05 douglas-raillard-arm