pest icon indicating copy to clipboard operation
pest copied to clipboard

Precedence climbing at the grammar level ?

Open Nadrieril opened this issue 5 years ago • 12 comments

TL;DR: could the code generator implement precedence climbing automatically as an optimization ?

Here is the context for my question: I'm working with a fairly complex language grammar generated from an ABNF that has a bunch of rules like:

plus_expression = { times_expression ~ ("+" ~ times_expression)* }
times_expression = { equal_expression ~ ("*" ~ equal_expression)* }

This works great, except for performance. From what I understand, precedence climbing is meant exactly for this kind of situation. However, I would need to change both the grammar and the consuming code to a fairly large extent.

This strikes me as rather weird: precedence feels like it should be defined in the grammar, and precedence climbing is rather orthogonal to the rest of what consuming a pest AST is about. Everything else about consuming a pest AST is super uniform, but this sticks out to me.

The pattern mentioned above seems rather simple: several rules of the form separated(other_rule, separator) referencing each other. Would it be possible for the code generator to detect this pattern and generate the precedence climbing code automatically ?

Nadrieril avatar Apr 13 '19 13:04 Nadrieril

This is exactly what the precedence climbing effort is all about! 😃

Instead of having separate expressions on multiple levels in your grammar, you should ideally only have one flat expression with all operators being on the same precedence level in the grammar:

expression = { term ~ (op ~ term)* }
op = _{
    plus
  | minus
  | times
  | ...
}

Then, the actual precedence will be dealt with later, when you feed the Paris into the PrecClimber structure.

dragostis avatar Apr 13 '19 16:04 dragostis

Ah, that's exactly what I'm trying to avoid ! I'd like to be able to match on Pairs as if I had written the various plus_expression = ... rules, just without the performance penalty.

Nadrieril avatar Apr 13 '19 16:04 Nadrieril

Why, though? You do get that information with PrecClimber. The only downside is that one cannot understand predecence/associativity by reading only the grammar, but functionality wise, it's the same.

dragostis avatar Apr 13 '19 17:04 dragostis

Because then I'd have to change both the grammar (that I do not control completely) and my code (that uses various macros that rely on the uniformity of Pairs matching). I feel like having to handle the precedence in my code misses the point of pest, which is that pest handles the parsing details and my code just consumes the pretty resulting AST.

Nadrieril avatar Apr 13 '19 17:04 Nadrieril

I do agree with you that ideally you'd get a pretty AST that you can consume, but I also don't really see this working very well when defined inside of the grammar. We've had this before pest 1.0 and it makes the syntax quite a big heavier and the implementation unnecessarily complex.

For 3.0 I'm thinking about a kind of a glue layer. Grammars will not contain precedence climbing information, but AST creation can simply be annotated with precedence/associativity information.

#[rule]
#[precedence = [
    left(plus, minus),
    left(times, divided),
    right(power)
]]
fn binary_expression(lhs: Expr, op: OP, rhs: Expr) -> Expr { ... }

I'm happy to hear more thoughts on the design and different opinions. Constructive feedback and debate is highly encouraged!

dragostis avatar Apr 13 '19 17:04 dragostis

Ah, I imagined that this would not need any new syntax at all, since the compiler can detect the plus_expression = ... pattern. That may not be as straightforward as I imagine it however. Note that this only handles precedence and not associativity, because associativity is very easy to handle in user code.

Nadrieril avatar Apr 13 '19 17:04 Nadrieril

IMHO, writing all those separate expression rules makes the grammar quite verbose, while adding little context. It would be fairly easy to optimize these rules with a custom optimizer, since they all follow the same pattern.

dragostis avatar Apr 13 '19 17:04 dragostis

Well, that was kind of my point: I hoped that pest would be that optimizer. But maybe pest is not particularly focused on cases like mine of automatically-generated grammars; I completely understand that you don't want to invest time in such a feature if it's not your main target use-case.

Nadrieril avatar Apr 13 '19 17:04 Nadrieril

No, no. I think this is a fair use case, but it's something to consider post 3.0.

Also, if this was hypothetically already optimized, how would you imagine the AST looking? Right now, you'd have nested nodes of increasing/decreasing precedence: PlusExpression > TimesExpression > EqualExpression ...

dragostis avatar Apr 13 '19 17:04 dragostis

Yeah, that's what I'd expect, since it's the straightforward translation. This also makes it fairly easy to consume, though repetitive. That's what I have right now with my custom macro system: https://github.com/Nadrieril/dhall-rust/blob/a4e8f799fb4665b210086c28647e0fa335384913/dhall_core/src/parser.rs#L668-L674: at each level I just fold the children with the appropriate binary operation. Granted that's a lot of unnecessary rules and is rather verbose; I'm starting to see why you would be a bit reluctant to the idea

Nadrieril avatar Apr 13 '19 17:04 Nadrieril

I'm wondering, in your example, what is OP? Is it a Pair or something else? also, will the visitor style API be generated by Pest or be manual?

segeljakt avatar Apr 14 '19 20:04 segeljakt

@segeljakt, it's manual, but type-checked. This API will be called at parse time to actually generate the AST.

op is a Pair, yes.

dragostis avatar Apr 15 '19 17:04 dragostis