lrpeg Suppress output of unnecessary nodes

This is my grammar file,

https://github.com/oovm/lrpeg-test/blob/master/projects/lrpeg/src/ygg.peg

This is my parsing output of x=0

https://github.com/oovm/lrpeg-test/blob/95b032c9ca0fd46c39ea31edf480d41eaecdec1b/projects/lrpeg/tests/assign.yaml#L23-L34

I don't understand why there are two Terminals here, and the statement nodes I need are wrapped in children.

In my understanding, Terminal is ε or string or regex, it should have no children.

Oct 20 '21 09:10 oovm

You are right, this looks broken.

Oct 20 '21 11:10 seanyoung

(statement IGNORE)* becomes a node in tree, which is incorrectly labelled Terminal. I've just pushed a change which labels them List (amongst others). Let me know how this works for you.

Oct 20 '21 16:10 seanyoung

I'm just translating pest

program  = {SOI ~ statement* ~ EOI}
vs
program  <- IGNORE (statement IGNORE)* EOI;

// IGNORE means anything that can be skipped
IGNORE <- space* / newline* / comment?

a ~ b   =>  a IGNORE b
a ~ b?  =>  a IGNORE b?
a ~ b*  =>  a IGNORE (b IGNORE)*
a ~ b+  => a ~ b ~ b*

From the results, pest did not generate additional nodes

Oct 20 '21 16:10 oovm

Okay, it makes sense, after I traverse it once and flatten it, the result is correct

Oct 20 '21 16:10 oovm

lrpeg does generate too many nodes. Lots of them do not have useful information.

Does pest have a way of marking a rule/line as "do not generate nodes for this" or it is clever in some other way?

Oct 20 '21 17:10 seanyoung

In my opinion, whether it is useful or not needs to be determined according to the purpose. My classification is like this

Useless: Hard to think of usage
- ε, EOI
Ignored: Formatting needs to use these semantics
- comment, space, newline
Unnamed(Weak semantics): macros need to use these semantics
- keywords, brackets, operators
Effective semantics:
- others

According to this classification, my filter looks like this

pub fn flatten(node: Node) -> Node {
    let mut buffer = vec![];
    for node in node.children {
        flatten_rec(node, &mut buffer)
    }
    Node {
        rule: node.rule,
        start: node.start,
        end: node.end,
        children: buffer,
        alternative: node.alternative,
    }
}

pub fn flatten_rec(node: Node, buffer: &mut Vec<Node>) {
    match node.rule {
        // flatten these nodes
        Rule::Any | Rule::List => {
            for node in node.children {
                flatten_rec(node, buffer)
            }
        }
        // not important
        Rule::EOI => {}
        #[cfg(feature = "no-ignored")]
        Rule::IGNORE => {}
        #[cfg(not(feature = "no-ignored"))]
        Rule::IGNORE if node.start == node.end => {}
        #[cfg(feature = "no-unnamed")]
        Rule::Terminal => {}
        #[cfg(not(feature = "no-unnamed"))]
        Rule::Terminal if node.start == node.end => {}
        _ => buffer.push(flatten(node)),
    }
}

Oct 20 '21 17:10 oovm

How can the parser generator decided which nodes to create and which not to create nodes for?

Oct 23 '21 07:10 seanyoung

We could take inspiration from pest and do not create nodes for rules which start with an underscore.

Oct 23 '21 08:10 seanyoung

It sounds like a feasible design, but

If the node is hidden, who will hold the label and alternative attached to the node.

eg: what's the result of 1+2 under rule:

expr <- _expr0;
_expr0 <- 
    add:/ <lhs:_expr0> ("+"/"-") <rhs:_expr1>
        / _expr1;
expr1 <- 
    mul:/ <lhs:_expr1> ("*"/"/") <rhs:_expr2>
        / _expr2;
_expr2 <- 
    pow:/ <lhs:_expr3> "^" <rhs:_expr2>
        / _expr3;
_expr3 <- num:/num;

Oct 27 '21 01:10 oovm

So the idea is that if a node is hidden, then it will inherit its (non-hidden) children. So lhs and rhs are in the parse tree, even though _expr0 is not.

Now that does leave the question about the alternative though..

Oct 27 '21 07:10 seanyoung

lrpeg lrpeg copied to clipboard

Suppress output of unnecessary nodes

lrpeg
lrpeg copied to clipboard