language icon indicating copy to clipboard operation
language copied to clipboard

Parsing ambiguity between relationalPattern and `|`, `&` and `as`

Open stereotype441 opened this issue 2 years ago • 4 comments

The patterns spec makes the following grammar definitions:

pattern               ::= logicalOrPattern
logicalOrPattern      ::= logicalAndPattern ( '|' logicalOrPattern )?
logicalAndPattern     ::= relationalPattern ( '&' logicalAndPattern )?
relationalPattern     ::= ( equalityOperator | relationalOperator) relationalExpression
                        | unaryPattern

While the language spec defines:

relationalExpression ::= bitwiseOrExpression (typeTest | typeCast | relationalOperator bitwiseOrExpression)?
                       | 'super' relationalOperator bitwiseOrExpression
bitwiseOrExpression  ::= bitwiseXorExpression ('|' bitwiseXorExpression)*
                       | 'super' ('|' bitwiseXorExpression)+
bitwiseXorExpression ::= bitwiseAndExpression ('^' bitwiseAndExpression)*
                       | 'super' ('^' bitwiseAndExpression)+
bitwiseAndExpression ::= shiftExpression ('&' shiftExpression)*
                       | 'super' ('&' shiftExpression)+

This means that the pattern == 1 | _ is ambiguous: the | could either be an expression operator (leading to == (1 | _)) or a pattern operator (leading to (== 1) | _).

The patterns == 1 & _ and == 1 as T are similarly ambiguous.

I propose we change the definition of relationalPattern to:

relationalPattern     ::= ( equalityOperator | relationalOperator) shiftExpression
                        | unaryPattern

That way, there is no ambiguity, and patterns like == 1 | _, == 1 & _, and == 1 as T will be interpreted as (== 1) | _, (== 1) & _, and (== 1) as T respectively; I think these interpretations are much more likely to match user intent.

Note that a side effect of this proposal is that patterns like == 1 ^ 2 and == 1 is int (which previously had unambiguous meanings) would become parse errors. Personally I think this is ok.

Also, patterns like < 1 < 2 (which previously was valid, and meant < (1 < 2)) would also become parse errors. I think this is good.

stereotype441 avatar Sep 18 '22 14:09 stereotype441

CC @munificent @leafpetersen @kallentu @natebosch @jakemac53 @mit-mit @lrhn @eernstg

stereotype441 avatar Sep 18 '22 14:09 stereotype441

The expression in the relational pattern should probably have been restricted to what can come after a relational or equality operators (which is bitwiseOrExpression).

Changing relationalPattern to using a shiftExpression should be sufficient. If you really need to using | or & (or ^) in the expression, you can use parentheses.

We are shooting ourselves in the foot a little by using & with lower precedence than < in patterns , where it's the other way around in expressions. The p < 4 & p == 5 pattern associates as (p < 4) & (p == 5), but the same expression associates as (p < (4 & p)) == 5. Is that a problem?

An alternative would be to use || and && for the patterns, like in expressions, so that the expression of the relation pattern can be a bitwiseOrExpression. And it keeps the same precedence as in expressions, even if it is not expressions.

That allows direct access to the |, & and ^ user-defined operators on the operands, which may be useful in some cases. But again, those cases can just use parentheses.

So, either work, the smallest change is to use shiftExpression, the most consistent one is to use &&/|| and bitwiseOrExpression.

lrhn avatar Sep 18 '22 16:09 lrhn

We are shooting ourselves in the foot a little

I'd actually argue that it is a very serious footgun if a pattern and an expression can reasonably be said to be "the same", but they have a completely different structure because of precedence discrepancies.

eernstg avatar Sep 19 '22 07:09 eernstg

We are shooting ourselves in the foot a little

I'd actually argue that it is a very serious footgun if a pattern and an expression can reasonably be said to be "the same", but they have a completely different structure because of precedence discrepancies.

Yeah, I worry about this a little bit too. On a related note, users might also be confused by the precedence between as and | / & in patterns vs expressions:

  var a = b | c as d; // parses as `(b | c) as d`
  if (a case b | c as d) {} // parses as `b | (c as d)`
  var a = b & c as d; // parses as `(b & c) as d`
  if (a case b & c as d) {} // parses as `b & (c as d)`

stereotype441 avatar Sep 19 '22 13:09 stereotype441

Oof, this is a gnarly one. I wonder if this is why C# went with contextual keywords or and and for their pattern disjunction and conjunctions. Given that I borrowed the relational pattern syntax from C#, perhaps we should consider or and and too. But I do really like how this looks:

case 1 | 2 | 3: ...

I'll try to spend some more time thinking about it.

munificent avatar Sep 22 '22 18:09 munificent

How about commas instead?

case 1, 2, 3, 4 : print("Small number"); break;
case 4 | 5: print("This is always 5 since 4 | 5 == 5"); break;

Levi-Lesches avatar Oct 02 '22 02:10 Levi-Lesches

Commas won't work because you can use | at any level of a pattern:

case (int x, Foo() | Bar(), int z):

Changing the | to , here gets confusing. Even parenthesizing won't work, because then it looks like a nested record pattern.

lrhn avatar Oct 02 '22 10:10 lrhn

Even parenthesizing won't work, because then it looks like a nested record pattern.

Oh, but obviously we would never decide on using a syntax that eats up potential future extensions like parenthesized patterns. ;-)

eernstg avatar Oct 02 '22 14:10 eernstg

I guess there's always good ol' "have everything on its own line", like we have today:

case 1: 
case 2:
case 3: 
case 4:
  print("Small number"); break;

case 4 | 5: 
  print("This is always 5 since 4 | 5 == 5"); break;

case (int x, Foo(), int z): 
case (int x, Bar(), int z):
  print("A record"); break;

It gets a bit verbose but with ... and operators like > and < we would need fewer cases to express more possibilities. When we really do have to break things up on their own line, like the last example, I think it actually looks better.

Levi-Lesches avatar Oct 03 '22 05:10 Levi-Lesches

Maybe allow ranges in cases for integer numbers

case 4..5:

case 4..<6:

and operators to be called as methods (like the opposite of infix functions)

case first.&(second):

Wdestroier avatar Oct 03 '22 14:10 Wdestroier

I quite like how | looks at the top level of a case to chain a bunch of patterns together:

switch (n) {
  case 1 | 2 | 3 | 5: print('is fib');
  default: print('not');
}

print(switch (n) {
  1 | 2 | 3 | 5 => print('is fib'),
  _ => print('not')
});

But a number of people have expressed surprise that we use | and & for the logical patterns instead of || and &&. Arguably, the latter make more sense because pattern matching can have side effects and we specifically specify that logical patterns short-circuit. Maybe we should use || and &&?

That would lead to:

switch (n) {
  case 1 || 2 || 3 || 5: print('is fib');
  case >= 10 && <= 100: print('between ten and 100');
  default: print('not');
}

print(switch (n) {
  1 || 2 || 3 || 5 => 'is fib',
  case >= 10 && <= 100: 'between ten and 100',
  _ => print('not')
});

I don't love how || looks, but I do find the range a little easier to read. Thoughts?

munificent avatar Nov 16 '22 01:11 munificent

I don't have any preference between | and ||, so whichever is easier to parse.

natebosch avatar Nov 16 '22 01:11 natebosch

Thanks for the chance to weigh in. I'm personally in camp ||, because I think readability > terseness.

I can imagine some scenarios where we might want the bitwise operator, and some cases where using | for a logical pattern might be confusing, for example:

switch (opCode) {
  // some cases
  case undocumentedCode | 0xFF: print('do something')
  // some other cases
}

Does this look like a bitwise test?

Happy to be overruled by smarter people than me, though :) This isn't American Idol, I'll support wisdom over popularity!

timsneath avatar Nov 16 '22 02:11 timsneath

I prefer && and ||, because they match the semantics of C++.

Wdestroier avatar Nov 16 '22 02:11 Wdestroier

I'd prefer the shortcircuit operators as well, because of the short-circuiting they represent. If we really like | in top-level cases, we can introduce:

  case pattern when test
       | otherPattern when otherTest : ...

as a shorthand for

  case pattern when test:
  case otherPattern when otherTest : ...

instead of relying on "fallthrough" (which isn't fallthrough, it's just multiple case clauses for a single body.)

Probably has all the same problems, or even more because test is likely a boolean expression, so parsing issues will probably make it impossible in practice.

lrhn avatar Nov 16 '22 11:11 lrhn

Personally I'm leaning slightly toward || and && because they make the grammars for patterns and expressions more similar, and I feel like the more similar those grammars are, the less trouble we'll run into in the future when we try to add more syntax to the language.

stereotype441 avatar Nov 16 '22 13:11 stereotype441

I don't have a strong opinion, but I tend to prefer || and &&. @timsneath mentioned readability, and I think this choice helps in that area in a couple of ways:

  • The semantics of pattern matching with a <logicalOrPattern> shares some fundamental structure and semantics with <logicalOrExpression>, and similarly for ...And...: "try several operands" vs. "confirm all operands", and short-cut evaluation. On the other hand, there is no such semantic similarity between bitwise and/or and the logical operators.
  • One important reason why | comes to mind is probably that it is used to separate alternatives in grammars (and function declaration "cases" in some functional languages, and perhaps there's more). However, at least grammars are more remotely related to Dart than <logicalOrPattern> is related to <logicalOrExpression>.
  • Operator precedence: It may be confusing for a reader that || yields the same structure in an expression as | in a pattern, but something very different from | in an expression.

Brevity seems to be the lone reason why we could still choose | and &, otherwise I think || and && is the safe bet.

eernstg avatar Nov 16 '22 15:11 eernstg

OK, seems like everyone loves || and &&, so let's do that. Thanks for all the feedback! That was really helpful.

munificent avatar Nov 21 '22 22:11 munificent