peginator icon indicating copy to clipboard operation
peginator copied to clipboard

Automatic rule inlining and type deduction

Open ethindp opened this issue 2 years ago • 5 comments

This proposal introduces the automatic rule inlining (where productions are inlined given certain conditions) and rules are type-deduced given another set of conditions. In particular, I think a singular condition (or if we want this to have multiple conditions later, a basic set of them) should be good to start with.

My proposal is that:

  1. Automatic rule inline occurs if a rule evaluates to a single item (e.g.: a rule that only has an expression on the right-hand side and nothing else). For example, if you have a rule DefaultExpression = expr:Expression;, the generator should automatically inline this rule, since creating a new node for this would be pointless and would just waste LoC and memory. Automatic inlining would still not occur if this rule was extended in the future.
  2. Rule type deduction should occure if the referenced rule is a rule that has no captures. In my grammar, I see this usually in optionals, and in that context the rule should be eliminated and replaced with the capture (which would just be a boolean). For example:
       RequeueStatement = Requeue name:Name [require_abort:RequeueAbortRequirement] ";";
    RequeueAbortRequirement = With Abort;
    
    In this example, RequeueAbortRequirement resolves to nothing (With and Abort are dropped), so again creating a separate struct for this would be pointless. Instead, require_abort should just be a boolean (was With and Abort found or not) in the generated struct.

Thoughts?

ethindp avatar Aug 23 '23 18:08 ethindp

Ok, so I definitely would want inline rules, but the type deduction (without considering inline rules) are weird.

In case 1, you should use @:. Or I might be misunderstanding something.

In case 2 require_abort:RequeueAbortRequirement will have a type equivalent to Option<()>, which will be pretty much eqivalent to a bool in both memory and compiled code. Not as pretty though, I agree with that.

Back to inline rules: my first thought was a syntax like this:

SomeComplexRule = f1:Field1 f2:(some more complicated expression) [f3:Field3];

Which would expand equivalently to the following grammar:

SomeComplexRule = f1:Field1 f2:SomeComplexRule_f2 [f3:Field3];
SomeComplexRule_f2 = (some more complicated expression);

Now this would work fine for something like this:

ParameterDecl = [cnst:"const"] name:Name;

Where in the end cnst would be an empty option, i.e. a worse bool.

Then again, this would be bad

ParameterDecl = cnst:["const"] name:Name;

As cnst would just become an empty type, and you won't even know if the "const" string was there or not.

Capturing things would also be pretty bad:

Expression = left:Name op:(@:Add | @:Sub | @:Mul | @:Div) right:Name

Special-casing the inline rules is hard, because you have to support these features IMO:

  • Capturing only part of the parsed pattern: decl:(["junk"] @:Name)
  • Capturing multiple separate things: attribs:([is_const:Const] [is_volatile:Volatile])
  • Not capturing anything, but recording that the parse happened at least: [is_const:"const"]
  • Capturing a string, like @string rules, think: digits:{"0".."9"}+

We could of course special case only some very simple cases, and just tell the user to use proper rules if they wanted more, but that would make the grammar syntax more complex, less intuitive and harder to understand why something didn't work as you've expected it to.

badicsalex avatar Aug 23 '23 22:08 badicsalex

Outside of inlining rules, auto-collapsing Option<()> to bool and Vec<()> to usize should be explored separately.

badicsalex avatar Aug 23 '23 22:08 badicsalex

By the way, doesn't > solve most of your redundancy problems?

badicsalex avatar Aug 30 '23 13:08 badicsalex

@badicsalex I would think it would, but I've found that for the particular grammar I'm working with it only makes the resulting AST look really weird. For example, if I do it on type declarations (which themselves are nested constructs) it creates an enumeration for each terminal and non-terminal and generally looks like it'd be really awkward to actually traverse.

ethindp avatar Aug 30 '23 15:08 ethindp

It works for simple things, e.g. SubtypeMark = > Name, but not for ordered choice productions, which is probably where it'd get used most since that's where the AST gets quite complicated.

ethindp avatar Aug 30 '23 15:08 ethindp