lark icon indicating copy to clipboard operation
lark copied to clipboard

Disallow :|

Open j6k4m8 opened this issue 3 years ago • 10 comments
trafficstars

Some terms to make this more searchable:

Disallow the colon and pipe operators next to each other. definition operator and vertical bar adjacent :|

(Please forgive me if this has already been reported; it is very challenging to search GitHub issues for single-character non-letter operators!)

Describe the bug

I just spent a good deal of time debugging what turned out to be a rogue definition line like this:

...
my_rule: | part_1 | part_2
...

in other words, there was a rule definition, immediately followed by a vertical pipe:

:|

My suggestion is to disallow this syntax, because the result was a completely unpredictable parse, since sometimes my tree would parse correctly and sometimes my_rule would be empty. Perhaps it's possible there are times when this is desireable(?) I can't imagine when that would be... but perhaps it's worth warning the user at least, so it's clear that such a "weird" rule exists! (I imagine this is not uncommon, especially when users refactor their grammars.)

(HUGE kudos for the v_args decorator, which made this much easier to track down!)

j6k4m8 avatar Jan 12 '22 23:01 j6k4m8

I agree there is some potential for confusion there.

I've already suggested before that we switch to an explicit empty rule, like (), instead of allowing implicit ones. And of course the same can be achieved by doing my_rule: part_1? | part_2 or other variations on this theme.

Unfortunately, I don't know how to bring about such a change right now, because it might break a lot of existing grammars. Perhaps I should have used v1.0 to do it, but what's done is done.

I'm not sure if adding a warning is the right reaction to this. But I'll try to keep an open mind.

Perhaps it's possible there are times when this is desireable

I think it's quite common, although usually it would be written as part_1 | part_2 | which is perhaps clearer?

HUGE kudos for the v_args decorator

Thanks. Mind if I ask how you used it to track it down?

erezsh avatar Jan 13 '22 09:01 erezsh

Understood, and thank you for the quick response @erezsh! Maybe nothing actionable here then — feel free to close! (Either way, I hope the issue might be googleable for others now!)

I've only used Lark in a small handful of projects so far now, and I'm not sure I've ever really managed to write what I'd consider a bullet-proof Transformer anyway, so it's very likely my opinions here are moot :)

v_args helped me track down the problem to a matter of the incorrect number of arguments being passed to my rule in the Transformer: I knew then that I was making it all the way to the correct rule, but NOT passing the correct subtree (and was, in fact, passing an empty subtree). I had experienced the same issue previously when I had something like this:

rule        : part_1
            | part_1 optional_part_2 

(simplified example just for explanation purposes, I realize one can use ? for this here)


@v_args(inline=True)
class MyTransformer(Transformer):

    def rule(self, part_1, optional_part_2):
        ...

Which would throw a wrong-number-of-args exception if it matched part_1 alone. (A "duh" moment, but super helpful to track down!)


While I'm kudos'ing, I also want to say a BIG thanks for the online IDE, which shortened my development cycle considerably!!

j6k4m8 avatar Jan 13 '22 13:01 j6k4m8

We can leave it open for a while, and give other users a chance to respond. We can always close it later.

Maybe it's actionable in the sense of adding an explicit_empty_rule flag to Lark. But since it will have to be off by default for now, it probably wouldn't have prevented you from tripping on this.

Glad you liked the online IDE! It can still be improved in many ways, but I'm glad it's already helping as it is.

erezsh avatar Jan 13 '22 14:01 erezsh

I'm not sure if adding a warning is the right reaction to this

I think it would be okay to issue a warning if implicit empty rule is used, so it won't break old grammars, but will prevent from doing it accidentally. Adding an explicit empty rule to avoid warnings (and I think not ()) would be nice in that case.

evtn avatar Jan 14 '22 07:01 evtn

What would you use to signify empty? I thought () is pretty elegant, and it's what Python uses for empty tuple.

erezsh avatar Jan 14 '22 09:01 erezsh

I do like (), for exactly that Pythonic "emptiness" mnemonic reason. // might also be good — "the regular expression that matches the empty string" — which I think is disallowed in grammars right now, so that might be a good way to ensure backwards compat?

Or of course, something more explicit:

%import common.EMPTY

j6k4m8 avatar Jan 14 '22 13:01 j6k4m8

These are actually 3 different constructs

  • The empty terminal, which is disallowed because how do you match ""* ? (// is used for comments :)
  • The empty rule, which you can create with my_empty:
  • The empty derivation, which you stumbled upon with a: b|, but can be created in many different ways.

The difference between the 2nd and 3rd is that an empty rule will be "parsed" and create an empty tree node (like Tree('my_empty', [])), while the empty derivation gets optimized away during preprocessing, and doesn't even get sent to the parser.

erezsh avatar Jan 14 '22 13:01 erezsh

🤦 right right, comments! silly mistake :)

Although,

rule: opt1 | opt2 | // 

magically already works ;)

Of the two options, I think it makes sense to create a tree node so that it can be handled by the developer, rather than be totally transparent?

j6k4m8 avatar Jan 14 '22 17:01 j6k4m8

Funny!

I think it makes sense to create a tree node

For that you can use the 2nd form. But there is a performance penalty for sending it to the parser, instead of optimizing it away in the preprocess. And another penalty for creating a tree node. It's small, but it can accumulate, especially since empty derivations are more common than it might seem at first. (not to mention the inconvenience)

erezsh avatar Jan 14 '22 17:01 erezsh

What would you use to signify empty? I thought () is pretty elegant, and it's what Python uses for empty tuple.

I guess I don't have a better proposal for an empty rule. Although my grammar generator is perfectly compatible with () right now.

On second thought, I think it would be the best way to do that.

evtn avatar Jan 17 '22 14:01 evtn