language
language copied to clipboard
Parsing ambiguity between relationalPattern and `|`, `&` and `as`
The patterns spec makes the following grammar definitions:
pattern ::= logicalOrPattern
logicalOrPattern ::= logicalAndPattern ( '|' logicalOrPattern )?
logicalAndPattern ::= relationalPattern ( '&' logicalAndPattern )?
relationalPattern ::= ( equalityOperator | relationalOperator) relationalExpression
| unaryPattern
While the language spec defines:
relationalExpression ::= bitwiseOrExpression (typeTest | typeCast | relationalOperator bitwiseOrExpression)?
| 'super' relationalOperator bitwiseOrExpression
bitwiseOrExpression ::= bitwiseXorExpression ('|' bitwiseXorExpression)*
| 'super' ('|' bitwiseXorExpression)+
bitwiseXorExpression ::= bitwiseAndExpression ('^' bitwiseAndExpression)*
| 'super' ('^' bitwiseAndExpression)+
bitwiseAndExpression ::= shiftExpression ('&' shiftExpression)*
| 'super' ('&' shiftExpression)+
This means that the pattern == 1 | _
is ambiguous: the |
could either be an expression operator (leading to == (1 | _)
) or a pattern operator (leading to (== 1) | _
).
The patterns == 1 & _
and == 1 as T
are similarly ambiguous.
I propose we change the definition of relationalPattern
to:
relationalPattern ::= ( equalityOperator | relationalOperator) shiftExpression
| unaryPattern
That way, there is no ambiguity, and patterns like == 1 | _
, == 1 & _
, and == 1 as T
will be interpreted as (== 1) | _
, (== 1) & _
, and (== 1) as T
respectively; I think these interpretations are much more likely to match user intent.
Note that a side effect of this proposal is that patterns like == 1 ^ 2
and == 1 is int
(which previously had unambiguous meanings) would become parse errors. Personally I think this is ok.
Also, patterns like < 1 < 2
(which previously was valid, and meant < (1 < 2)
) would also become parse errors. I think this is good.
CC @munificent @leafpetersen @kallentu @natebosch @jakemac53 @mit-mit @lrhn @eernstg
The expression in the relational pattern should probably have been restricted to what can come after a relational or equality operators (which is bitwiseOrExpression
).
Changing relationalPattern
to using a shiftExpression
should be sufficient. If you really need to using |
or &
(or ^
) in the expression, you can use parentheses.
We are shooting ourselves in the foot a little by using &
with lower precedence than <
in patterns , where it's the other way around in expressions. The p < 4 & p == 5
pattern associates as (p < 4) & (p == 5)
, but the same expression associates as (p < (4 & p)) == 5
. Is that a problem?
An alternative would be to use ||
and &&
for the patterns, like in expressions, so that the expression of the relation pattern can be a bitwiseOrExpression
. And it keeps the same precedence as in expressions, even if it is not expressions.
That allows direct access to the |
, &
and ^
user-defined operators on the operands, which may be useful in some cases. But again, those cases can just use parentheses.
So, either work, the smallest change is to use shiftExpression
, the most consistent one is to use &&
/||
and bitwiseOrExpression
.
We are shooting ourselves in the foot a little
I'd actually argue that it is a very serious footgun if a pattern and an expression can reasonably be said to be "the same", but they have a completely different structure because of precedence discrepancies.
We are shooting ourselves in the foot a little
I'd actually argue that it is a very serious footgun if a pattern and an expression can reasonably be said to be "the same", but they have a completely different structure because of precedence discrepancies.
Yeah, I worry about this a little bit too. On a related note, users might also be confused by the precedence between as
and |
/ &
in patterns vs expressions:
var a = b | c as d; // parses as `(b | c) as d`
if (a case b | c as d) {} // parses as `b | (c as d)`
var a = b & c as d; // parses as `(b & c) as d`
if (a case b & c as d) {} // parses as `b & (c as d)`
Oof, this is a gnarly one. I wonder if this is why C# went with contextual keywords or
and and
for their pattern disjunction and conjunctions. Given that I borrowed the relational pattern syntax from C#, perhaps we should consider or
and and
too. But I do really like how this looks:
case 1 | 2 | 3: ...
I'll try to spend some more time thinking about it.
How about commas instead?
case 1, 2, 3, 4 : print("Small number"); break;
case 4 | 5: print("This is always 5 since 4 | 5 == 5"); break;
Commas won't work because you can use |
at any level of a pattern:
case (int x, Foo() | Bar(), int z):
Changing the |
to ,
here gets confusing. Even parenthesizing won't work, because then it looks like a nested record pattern.
Even parenthesizing won't work, because then it looks like a nested record pattern.
Oh, but obviously we would never decide on using a syntax that eats up potential future extensions like parenthesized patterns. ;-)
I guess there's always good ol' "have everything on its own line", like we have today:
case 1:
case 2:
case 3:
case 4:
print("Small number"); break;
case 4 | 5:
print("This is always 5 since 4 | 5 == 5"); break;
case (int x, Foo(), int z):
case (int x, Bar(), int z):
print("A record"); break;
It gets a bit verbose but with ...
and operators like >
and <
we would need fewer case
s to express more possibilities. When we really do have to break things up on their own line, like the last example, I think it actually looks better.
Maybe allow ranges in cases for integer numbers
case 4..5:
case 4..<6:
and operators to be called as methods (like the opposite of infix functions)
case first.&(second):
I quite like how |
looks at the top level of a case to chain a bunch of patterns together:
switch (n) {
case 1 | 2 | 3 | 5: print('is fib');
default: print('not');
}
print(switch (n) {
1 | 2 | 3 | 5 => print('is fib'),
_ => print('not')
});
But a number of people have expressed surprise that we use |
and &
for the logical patterns instead of ||
and &&
. Arguably, the latter make more sense because pattern matching can have side effects and we specifically specify that logical patterns short-circuit. Maybe we should use ||
and &&
?
That would lead to:
switch (n) {
case 1 || 2 || 3 || 5: print('is fib');
case >= 10 && <= 100: print('between ten and 100');
default: print('not');
}
print(switch (n) {
1 || 2 || 3 || 5 => 'is fib',
case >= 10 && <= 100: 'between ten and 100',
_ => print('not')
});
I don't love how ||
looks, but I do find the range a little easier to read. Thoughts?
I don't have any preference between |
and ||
, so whichever is easier to parse.
Thanks for the chance to weigh in. I'm personally in camp ||
, because I think readability > terseness.
I can imagine some scenarios where we might want the bitwise operator, and some cases where using |
for a logical pattern might be confusing, for example:
switch (opCode) {
// some cases
case undocumentedCode | 0xFF: print('do something')
// some other cases
}
Does this look like a bitwise test?
Happy to be overruled by smarter people than me, though :) This isn't American Idol, I'll support wisdom over popularity!
I prefer &&
and ||
, because they match the semantics of C++.
I'd prefer the shortcircuit operators as well, because of the short-circuiting they represent.
If we really like |
in top-level cases, we can introduce:
case pattern when test
| otherPattern when otherTest : ...
as a shorthand for
case pattern when test:
case otherPattern when otherTest : ...
instead of relying on "fallthrough" (which isn't fallthrough, it's just multiple case clauses for a single body.)
Probably has all the same problems, or even more because test
is likely a boolean expression, so parsing issues will probably make it impossible in practice.
Personally I'm leaning slightly toward ||
and &&
because they make the grammars for patterns and expressions more similar, and I feel like the more similar those grammars are, the less trouble we'll run into in the future when we try to add more syntax to the language.
I don't have a strong opinion, but I tend to prefer ||
and &&
. @timsneath mentioned readability, and I think this choice helps in that area in a couple of ways:
- The semantics of pattern matching with a
<logicalOrPattern>
shares some fundamental structure and semantics with<logicalOrExpression>
, and similarly for...And...
: "try several operands" vs. "confirm all operands", and short-cut evaluation. On the other hand, there is no such semantic similarity between bitwise and/or and the logical operators. - One important reason why
|
comes to mind is probably that it is used to separate alternatives in grammars (and function declaration "cases" in some functional languages, and perhaps there's more). However, at least grammars are more remotely related to Dart than<logicalOrPattern>
is related to<logicalOrExpression>
. - Operator precedence: It may be confusing for a reader that
||
yields the same structure in an expression as|
in a pattern, but something very different from|
in an expression.
Brevity seems to be the lone reason why we could still choose |
and &
, otherwise I think ||
and &&
is the safe bet.
OK, seems like everyone loves ||
and &&
, so let's do that. Thanks for all the feedback! That was really helpful.