
Explain why Copy is required for a type specified in %parse-param

Open · FranklinChen opened this issue 1 year ago · 13 comments

It might be useful for the book to briefly explain why Copy is required for the type specified in %parse-param. This restriction has forced me to pass in mutable state through a layer of indirection, &RefCell<State>, and to call borrow_mut() repeatedly in actions, so I wonder whether it is really necessary.
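To make the workaround concrete, here is a minimal sketch in plain Rust, without lrpar. `State`, `action_push`, `action_sum` and `run` are hypothetical stand-ins for the user's state type and grammar actions, not code generated by lrpar. It shows why `&RefCell<State>` satisfies a `Copy` bound (it is a shared reference) and why every action then needs its own runtime `borrow_mut()`.

```rust
use std::cell::RefCell;

// Hypothetical state type standing in for the issue's `State`.
#[derive(Debug, Default)]
struct State {
    items: Vec<u8>,
}

// Compile-time check that a value's type is `Copy`.
fn assert_copy<T: Copy>(_: T) {}

// Hypothetical action: mutation goes through a runtime-checked borrow.
fn action_push(state: &RefCell<State>, byte: u8) -> u8 {
    state.borrow_mut().items.push(byte);
    byte
}

// Hypothetical action that only reads the state.
fn action_sum(state: &RefCell<State>) -> u32 {
    state.borrow().items.iter().map(|&b| u32::from(b)).sum()
}

fn run(bytes: &[u8]) -> (Vec<u8>, u32) {
    let state = RefCell::new(State::default());
    // A shared reference is `Copy`, so it can be handed to every action.
    let param: &RefCell<State> = &state;
    assert_copy(param);
    for &b in bytes {
        // `param` is copied into each call; nothing is moved.
        action_push(param, b);
    }
    let total = action_sum(param);
    (state.into_inner().items, total)
}

fn main() {
    let (items, total) = run(&[1, 2, 3]);
    println!("{:?} {}", items, total);
}
```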

FranklinChen, May 20 '23 00:05

It is theoretically possible to relax this so that %parse-param could allow an &mut reference, but it is exceedingly difficult. The main reason is the unique ownership of an &mut: the type of the action functions needs to change to a kind of function that cannot capture ownership of the &mut, so that when the reference passed to one action falls out of scope it can be passed to the next action. Rust can type such functions using higher-ranked trait bounds (HRTBs, i.e. higher-ranked lifetimes) written with for<'a>.
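A minimal sketch of the idea, outside lrpar: `Action`, `push`, `push_doubled` and `run_actions` are hypothetical names. The function-pointer type is explicitly higher-ranked (`for<'a>`), so each action must accept a borrow of any lifetime and cannot return or store it beyond the call. That lets a driver lend the same `&mut` to one action after another via reborrowing.

```rust
// Hypothetical action type: the explicit `for<'a>` makes it higher-ranked,
// so an action cannot keep the `&mut` after it returns.
type Action = for<'a> fn(&'a mut Vec<u8>, u8);

fn push(state: &mut Vec<u8>, b: u8) {
    state.push(b);
}

fn push_doubled(state: &mut Vec<u8>, b: u8) {
    state.push(b.wrapping_mul(2));
}

// Stand-in for a parser driver: the same `&mut` is lent to each action in
// turn. `&mut *state` reborrows it for the duration of a single call.
fn run_actions(state: &mut Vec<u8>, actions: &[Action], input: &[u8]) {
    for &b in input {
        for action in actions {
            action(&mut *state, b);
        }
    }
}

fn main() {
    let mut state = Vec::new();
    let actions: [Action; 2] = [push, push_doubled];
    run_actions(&mut state, &actions, &[1, 2]);
    println!("{:?}", state);
}
```

Here the driver knows the parameter is a `&mut Vec<u8>`, so it can reborrow. The hard part described above is getting generated code and action types into this shape.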

One of the things that makes this difficult may be the pattern matching involved in the generated code, and the difference between FunctionParamPattern and Patterns.

Copy parameters are fairly easy to support without actually parsing the %parse-param: the generated code can use the argument verbatim. There were a lot of attempts to get mutable references working in https://github.com/softdevteam/grmtools/pull/214, but I kind of ran out of steam on that and settled for the RefCell workaround.
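Why `Copy` keeps things simple can be sketched with a hypothetical driver that is generic over the parameter type `P`. `drive`, `scaled` and `lookup` are illustrative names, not lrpar's real API. Because `P: Copy`, the driver can pass `param` verbatim to every action. Without the bound, the first loop iteration would move `param`, and generic code cannot reborrow an arbitrary `P` even when it happens to be a `&mut`.

```rust
// Hypothetical driver, generic over the parse-parameter type `P`.
fn drive<P: Copy, T>(param: P, tokens: &[T], action: fn(P, &T) -> i64) -> i64 {
    let mut total = 0;
    for tok in tokens {
        // `param` is used once per iteration; this only compiles because
        // `P: Copy` (otherwise the first call would move it).
        total += action(param, tok);
    }
    total
}

// Hypothetical action using a plain integer parameter.
fn scaled(factor: i64, tok: &i64) -> i64 {
    factor * tok
}

// Hypothetical action using a shared slice reference, which is also `Copy`.
fn lookup(table: &[i64], tok: &usize) -> i64 {
    table[*tok]
}

fn main() {
    println!("{}", drive(3, &[1, 2, 4], scaled));
    let table: [i64; 3] = [10, 20, 30];
    println!("{}", drive(&table[..], &[2usize, 0], lookup));
}
```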

ratmice, May 20 '23 01:05

It would be nice if it supported &mut, but as @ratmice said, I'm not sure how practical that is. One of the challenges of very clever type systems is that it can be hard to experiment on existing codebases, and we've definitely encountered that challenge!

Another possibility for your case might be to use interior mutation: so you could pass a type T: Copy around which internally has a RefCell or whatnot that hides some of the horrors of borrow_mut. It's still not completely ideal, of course.
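A minimal sketch of that suggestion, assuming the state is a byte buffer: `Ctx`, `emit`, `emitted` and `action_emit` are hypothetical names. `Ctx` is `Copy` because it only holds a shared reference, and its methods keep each `borrow()`/`borrow_mut()` short-lived and out of the actions' sight.

```rust
use std::cell::RefCell;

// Hypothetical `Copy` handle wrapping the interior-mutable state.
#[derive(Clone, Copy)]
struct Ctx<'a> {
    output: &'a RefCell<Vec<u8>>,
}

impl<'a> Ctx<'a> {
    fn new(output: &'a RefCell<Vec<u8>>) -> Self {
        Ctx { output }
    }

    // The mutable borrow lasts only for this call.
    fn emit(self, byte: u8) {
        self.output.borrow_mut().push(byte);
    }

    fn emitted(self) -> usize {
        self.output.borrow().len()
    }
}

// Hypothetical action: it sees only the `Copy` handle, never `borrow_mut`.
fn action_emit(ctx: Ctx<'_>, byte: u8) -> usize {
    ctx.emit(byte);
    ctx.emitted()
}

fn main() {
    let buf = RefCell::new(Vec::new());
    let ctx = Ctx::new(&buf);
    action_emit(ctx, b'a');
    action_emit(ctx, b'b');
    println!("{:?}", buf.into_inner());
}
```

With this approach, the %parse-param type would be something like `Ctx<'_>`. The RefCell is still there, but the actions stay free of borrow bookkeeping.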

ltratt, May 20 '23 07:05

I guess that lalrpop (which I was using before) doesn't have this problem for its parameter passing mechanics https://lalrpop.github.io/lalrpop/tutorial/009_state_parameter.html because it generates a parser struct for each exported rule, whereas lrpar generates static functions. In lalrpop, I passed mutable state using the directive

grammar<'extra>(state: &'extra mut Vec<u8>);

whereas for lrpar I am using

%parse-param state: &RefCell<Vec<u8>>
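A struct-based design, which the comment above attributes to lalrpop, avoids the problem because the `&mut` can live in a field for the whole parse. Each reduction then becomes a `&mut self` method, so nothing has to be passed (or copied) into individual actions. This is a hypothetical sketch (`Parser`, `reduce_byte`, `parse`), not lalrpop's actual generated code.

```rust
// Hypothetical struct-based parser holding the mutable state.
struct Parser<'extra> {
    state: &'extra mut Vec<u8>,
}

impl<'extra> Parser<'extra> {
    fn new(state: &'extra mut Vec<u8>) -> Self {
        Parser { state }
    }

    // Stand-in for one reduction action: it reaches the state via `self`.
    fn reduce_byte(&mut self, b: u8) -> u8 {
        self.state.push(b);
        b
    }

    fn parse(&mut self, input: &[u8]) -> usize {
        for &b in input {
            self.reduce_byte(b);
        }
        self.state.len()
    }
}

fn main() {
    let mut state = Vec::new();
    // The parser (and its borrow of `state`) is dropped after this statement.
    let n = Parser::new(&mut state).parse(b"abc");
    println!("{} {:?}", n, state);
}
```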

FranklinChen, May 20 '23 22:05

To some extent, the current design reflects the fact that I always write parsers that bubble state up rather than mutate state: honestly, it didn't really occur to me to deal with mutable state! Could lrpar be adapted to deal with mutable state? I guess it probably could. I must admit that I don't think I'll be the person who does that though :/ Sorry, that's not a very satisfactory answer on my part!

ltratt, May 20 '23 22:05

I personally prefer to program purely functionally, but for performance I've been collecting some things mutably during parsing, which is very cheap when using a Vec and simply pushing to it. I've considered passing everything upward instead, at the expense of manually threading state through everything, and using Rust's doubly linked LinkedList to collect things recursively without paying a quadratic concatenation penalty. I haven't yet tried that for comparative benchmarking.
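The two styles can be compared on a hypothetical `Tree` standing in for parse results. `collect_mut` pushes into one shared `Vec`. `collect_up` bubbles results upward and relies on `LinkedList::append`, which moves all elements of the other list in O(1); repeatedly copying child `Vec`s into their parents can instead cost quadratic time on deeply nested input. Both produce the same order.

```rust
use std::collections::LinkedList;

// Hypothetical tree standing in for parse results.
enum Tree {
    Leaf(u32),
    Node(Vec<Tree>),
}

// Style 1: mutate a shared accumulator (amortized O(1) per push).
fn collect_mut(t: &Tree, out: &mut Vec<u32>) {
    match t {
        Tree::Leaf(n) => out.push(*n),
        Tree::Node(kids) => {
            for k in kids {
                collect_mut(k, out);
            }
        }
    }
}

// Style 2: bubble results up; `append` splices lists in O(1).
fn collect_up(t: &Tree) -> LinkedList<u32> {
    match t {
        Tree::Leaf(n) => {
            let mut l = LinkedList::new();
            l.push_back(*n);
            l
        }
        Tree::Node(kids) => {
            let mut acc = LinkedList::new();
            for k in kids {
                acc.append(&mut collect_up(k));
            }
            acc
        }
    }
}

fn main() {
    let t = Tree::Node(vec![
        Tree::Leaf(1),
        Tree::Node(vec![Tree::Leaf(2), Tree::Leaf(3)]),
        Tree::Leaf(4),
    ]);
    let mut v = Vec::new();
    collect_mut(&t, &mut v);
    let up: Vec<u32> = collect_up(&t).into_iter().collect();
    println!("{:?} {:?}", v, up);
}
```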

The other thing I want to do, where mutable state seems particularly sensible, is to catch semantic errors during parsing and log error objects into a Vec before doing recovery and going on, since I don't want to fail fast in parsing but want to generate as many errors as possible for the end user.
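A minimal sketch of that error-accumulation pattern using today's `&RefCell<...>` workaround. `SemanticError`, `action_int` and `check_all` are hypothetical names, and the "semantic check" is simply whether a literal fits in a `u8`. On failure the action records an error and returns a placeholder, so parsing can carry on and report every problem.

```rust
use std::cell::RefCell;

// Hypothetical error record: which token failed and why.
#[derive(Debug, PartialEq)]
struct SemanticError {
    token: usize,
    msg: String,
}

// Hypothetical action for an integer literal. On a semantic error it logs
// the problem and returns a placeholder value instead of failing fast.
fn action_int(errors: &RefCell<Vec<SemanticError>>, token: usize, text: &str) -> u8 {
    match text.parse::<u8>() {
        Ok(n) => n,
        Err(_) => {
            errors.borrow_mut().push(SemanticError {
                token,
                msg: format!("`{}` is not a valid u8", text),
            });
            0
        }
    }
}

// Stand-in for a parse: runs the action on every token, then hands back
// both the values and all accumulated errors.
fn check_all(tokens: &[&str]) -> (Vec<u8>, Vec<SemanticError>) {
    let errors = RefCell::new(Vec::new());
    let values: Vec<u8> = tokens
        .iter()
        .enumerate()
        .map(|(i, t)| action_int(&errors, i, t))
        .collect();
    (values, errors.into_inner())
}

fn main() {
    let (values, errors) = check_all(&["1", "300", "7", "x"]);
    println!("{:?}", values);
    for e in &errors {
        println!("token {}: {}", e.token, e.msg);
    }
}
```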

FranklinChen, May 20 '23 22:05

I don't mind having another look at it, though I can't promise I'll be any more successful than last time. I think that the %parse-param work that made its way in-tree may have simplified things.

The primary difference is in these two changes, where we went from accepting multiple named arguments to a single named argument: https://github.com/softdevteam/grmtools/pull/257/commits/3298c50ce23b664e779434b4642904b1dc46d5de https://github.com/softdevteam/grmtools/pull/216

So if we can in fact go with the simpler single-named-argument approach, I think there is reason to hope that much of the difficulty I encountered in my previous attempt can be avoided.

ratmice, May 20 '23 23:05

If it is doable, that would be great!

ltratt, May 21 '23 10:05