chumsky icon indicating copy to clipboard operation
chumsky copied to clipboard

Fix #486: Add another recursive parser

Open Zij-IT opened this issue 2 months ago • 9 comments

This PR contains the suggested solution described in #494 for the problem described in #486, based on the solution provided by #664.

Breaking Changes:

  • Recursive::define and Recursive::declare removed from the public API
  • RecursiveN and recursive_n added to the prelude

Description

Introduced to the public API through this PR are the following items:

  • recursive_n - a function which outputs a RecursiveN parser
  • RecursiveN - a parser which contains a recursive parser, as well as all recursive dependencies as strong references
  • RecursiveArgs - a trait which all arguments passed into recursive_n must fulfill. It is implemented up to a tuple of 8 parsers, as I hope for everyone that they never have to deal with 8 mutually recursive parsers.

These solve the memory leak by only allowing the user to make weak clones of the recursive parsers. When the parsers returned by recursive_n are dropped, all strong references are dropped. The user has no access to any of the strong references and therefor can't accidentally shoot themselves in the foot.

Considerations

  1. The documentation and examples of these elements could likely be improved. It's hard to come up with a compact example for mutually recursive parsers :sweat_smile:
  2. Could recursive be replaced with recursive_n? In some cases it would unfortunately require the user to add a : Recursive<_> to their currently existing parsers. I understand why Rust can't determine if the argument is a tuple given a name, or not, but it is surely unfortunate.

(Whoops. Looks like a got the wrong number for the commit message. dang it)

Zij-IT avatar Oct 27 '25 20:10 Zij-IT

Thanks! A few thoughts:

  • Might I suggest recursive_group as a name? There's already precedence with the group combinator.
  • It feels like it should be possible to implement this using the same macro trick as we do for Choice. Was there a reason that this didn't work?

zesterer avatar Oct 27 '25 20:10 zesterer

Might I suggest recursive_group as a name? There's already precedence with the group combinator.

My only concern with that is that the group combinator parses each one of the items in order, where as the only order that matters for this combinator is the leftmost parser.

It feels like it should be possible to implement this using the same macro trick as we do for Choice. Was there a reason that this didn't work?

No, honestly I didn't try. It took me 3 minutes to write the other definitions. If going above 8 though, definitely worth converting it to a macro.

Zij-IT avatar Oct 27 '25 20:10 Zij-IT

My only concern with that is that the group combinator parses each one of the items in order, where as the only order that matters for this combinator is the leftmost parser.

Oh, I've just realised that I entirely misunderstood the point of this combinator. My assumption is that it did fn recursive_n(f: fn((A, B, C, ...)) -> (A, B, C, ...)) -> (A, B, C, ...) (i.e: gave you back all of the parsers you created.

I'm less sure how I feel about a combinator that only gives you back one, although I realise now that it's necessary for there to be a single root of the Rc tree. A common thing to want to do is to define, say, a module parser and an expression parser and then use the former for standalone programs and the latter for the REPL, but this wouldn't permit that design (at least, not so easily).

No, honestly I didn't try. It took me 3 minutes to write the other definitions. If going above 8 though, definitely worth converting it to a macro.

Fair enough. I may attempt this at some point, if only to make future maintenance easier.

zesterer avatar Oct 27 '25 20:10 zesterer

I'm less sure how I feel about a combinator that only gives you back one, although I realise now that it's necessary for there to be a single root of the Rc tree. A common thing to want to do is to define, say, a module parser and an expression parser and then use the former for standalone programs and the latter for the REPL, but this wouldn't permit that design (at least, not so easily).

Yeah, the drawback of the solution is that it does require a little bit of duplication when you are constructing the final parsers. It happens to work fine for the problem that I have, because each node implements has it's own parser function, so the duplication is limited to some clones. Technically it could be reduced further by having a function which takes in (expr, stmt, another, parser) and returns their definitions.

// with no dedup
fn expr_parser<'a>() -> Parser<'a, Expression> {
    recursive_n(|expr, stmt, another, parser| {
      let stmt = Statement::parser(/* recursion */);
      let expr = Expr::parser(/* recursion */);
      let another = Another::parser(/* recursion */);
      let parser = Parser::parser(/* recursion */);
      (expr, stmt, another, parser)
  })
}

fn module_parser<'a>() -> Parser<'a, Module> {
    // Notice the recursive `_module` is unused
    recursive_n(|_module, stmt, expr, another, parser| {
      let stmt = Statement::parser(/* recursion */);
      let expr = Expr::parser(/* recursion */);
      let another = Another::parser(/* recursion */);
      let parser = Parser::parser(/* recursion */);
      let module = Module::parser(/* no recursion here */)
      (module, stmt, expr, another, parser)
  })
}

And here would be it again, but with some... slight dedup.

fn define_shared_recursion(/* long param list */) -> (/* long tuple */) {
      let stmt = Statement::parser(/* recursion */ stmt, expr);
      let expr = Expr::parser(/* recursion */ expr, another);
      let another = Another::parser(/* recursion */ another, expr, parser);
      let parser = Parser::parser(/* recursion */ expr, parser);
      (stmt, expr, another, parser)
}


fn expr_parser<'a>() -> Parser<'a, Expression> {
    recursive_n(|expr, stmt, another, parser| {
      let (stmt, expr, another, parser) = define_shared_recursion(stmt, expr, another, parser);
      (expr, stmt, another, parser)
  })
}

fn module_parser<'a>() -> Parser<'a, Module> {
    // Notice the recursive `_module` is unused
    recursive_n(|_module, stmt, expr, another, parser| {
      let (stmt, expr, another, parser) = define_shared_recursion(stmt, expr, another, parser);
      let module = Module::parser(/* no recursion here */, stmt, expr, another, parser)
      (module, stmt, expr, another, parser)
  })
}

Sorry for the more pseudo-ish Rust. I hope this helps to illustrate that you can still do the design. It's just a matter of shifting where you deduplicate (and the use of a non-recursive module parser)

Zij-IT avatar Oct 27 '25 20:10 Zij-IT

My only concern with that is that the group combinator parses each one of the items in order, where as the only order that matters for this combinator is the leftmost parser.

Oh, I've just realised that I entirely misunderstood the point of this combinator. My assumption is that it did fn recursive_n(f: fn((A, B, C, ...)) -> (A, B, C, ...)) -> (A, B, C, ...) (i.e: gave you back all of the parsers you created.

I'm less sure how I feel about a combinator that only gives you back one, although I realise now that it's necessary for there to be a single root of the Rc tree. A common thing to want to do is to define, say, a module parser and an expression parser and then use the former for standalone programs and the latter for the REPL, but this wouldn't permit that design (at least, not so easily).

This might not be that bad. Using the language of #664, with mutually recursive module and expression parsers, the returned parser would be parser: EitherParser<Module, Expression>. To use it as a module parser, one would use parser as is. To us it as an expression parser, one would do let expression_parser = parser.flip();.

To obtain a signature like fn recursive_n(f: fn((A, B, C, ...)) -> (A, B, C, ...)) -> (A, B, C, ...) we could instead do fn recursive_n(f: fn((A, B, C, ...)) -> (A, B, C, ...)) -> (A', B', C', ...), where the primed versions are clones of the same struct with reference counters and appropriate flips (in the language of #664 recursive_2 would be modified to return (parser.clone(), parser.flip()) instead of parser). The type signature of A', etc. would be quite unwieldy though, as it would depend on B, C, ..., but the user would not need to navigate the flips.

jkylling avatar Nov 01 '25 11:11 jkylling

So. As far as I and Miri can tell, this doesn't leak (tested using the sail crates mentioned earlier). Now we just provide the user access to all parsers defined, but stuck behind the RecursiveN and each with a strong reference to the dependencies.

This I believe matches your original expectations @zesterer (:

~~Oh, and RecursiveArgs::Aux is something that I would like to get rid of, but lack the skills to do so. If someone has a cleaner implementation, I would be 100% interested. It exists for the same reason that there is a let deps line exists in the definition of build~~

Zij-IT avatar Nov 01 '25 15:11 Zij-IT

Hmm. I do think this is a neat solution for the specific, limited-scope problem the original issue - but I can't help but feel like it doesn't go far enough to be flexible for most use-cases.

I think I am going to have a go at prototyping something before merging this: not because I don't think is worth merging, but because my spidey sense is telling me that there's a better solution very close by in the design space.

zesterer avatar Nov 02 '25 16:11 zesterer

Sounds like a plan!

I totally get wanting to explore the space and seeing if you can find a solution that has a bit more flexibility to it. My solutions are admittedly biased towards my use-case :) I'm intrigued to see what you end up finding.

If this does end up being the solution that we want to go with, it may be worth trying to modify the macro to prevent the extra clone of each of the parsers (though it is just an Rc::clone each time [soon to be Arc iirc]).

Zij-IT avatar Nov 02 '25 17:11 Zij-IT

I've implemented a version of my idea in #902, along with an example/test. It's extremely rough and not at all ready for merge, but hopefully there's enough meat on the bones to demonstrate the idea.

It has a few key advantages:

  • All of the parsers you define are in-scope after the fact, and the user decides which of them they want to continue to use
  • Less boilerplate: not much in the way of awkward tuple manipulation, just identifiers for each of the parsers

I think it should do this while guaranteeing no leaks either, since the final parsers produced at the end are the only roots of the tree. I'm interested to know what you think: whether you can foresee limitations or annoyances, etc.

Usually I am very anti-macro, but in this case (as with select!, for example) I think the reduction in boilerplate justifies it.

zesterer avatar Nov 02 '25 18:11 zesterer