language
language copied to clipboard
Does the new pattern proposal allow user defined equality?
[Updated: This was an overly terse summary, which caused some confusion. I've elaborated (hopefully more clearly) here]
The current rules for switches require that constants in switches come from classes that do not override equality. The new pattern proposal adds a relational pattern == constant which is defined to call the equality operator on the receiver (and also relational operators similarly).
- Is
== constantintended to work when the receiver overridesoperator ==? - Does the proposal eliminate the current error for existing switches?
cc @lrhn @eernstg @munificent @stereotype441 @natebosch @jakemac53
- Is
== constantintended to work when the receiver overridesoperator ==?
Yes, I don't see why not.
- Does the proposal eliminate the current error for existing switches?
I think it should, yes. Any reason why not?
We just had an issue (https://github.com/dart-lang/sdk/issues/49585) about the performance of the current kind of switch statement. Apparently, some backends don't optimize those as much as I would have expected, but the main point I want to make here is that we could make certain switch statements (including the ones we have today) work considerably faster than a linear scan which is capable of running arbitrary code at each case.
Will it be sufficient to just recognize the fast ones and compile them using jump tables and such? Or will developers then have to walk a tightrope in order to make sure that they get the fast translation, and they'll always have to worry about even the tiniest change that might implicitly turn their fast switch into a slow one?
We could make the difference explicit in the code (and thus given developers control) by keeping today's switch unchanged. This kind of switch could then be used when the more powerful matching features are not needed, but high performance is crucial.
They could then use the new syntax when they want to use pattern matching, which is probably going to imply a linear scan (because it's difficult to ensure a predictable semantics otherwise), and they can know for sure that they stick to the highly optimizable kind of switch when they use the current syntax.
Of course, it probably takes a very long discussion to find that new syntax, but I believe that it should be rather easy to disambiguate an arbitrary keyword (it doesn't have to be a reserved word, probably not even a built-in identifier), so we might well be able to use, say, match.
match (<expression>) {
case ... // Case is a reserved word, so this is not a local function declaration.
}
We just had an issue (dart-lang/sdk#49585) about the performance of the current kind of switch statement.
I'm not surprised that that issue relates to implementing a VM's instruction dispatcher. That seems to disproportionately be the kind of code where switch performance matters. Outside of those and lexers, I don't see switch performance come up very much.
Will it be sufficient to just recognize the fast ones and compile them using jump tables and such?
I think so. In the example, the compiler already knows that it's switching over an integer and what the exact integer values of each case is. That should be enough to know that it can compile the switch to a jump table, or at least a binary search. All of that is equally true if we extend switches to support patterns. The compiler will still be able to tell if a switch is switching on an integer type and if all the cases are literal or named constant cases and what those constant values are.
If you start using more interesting patterns in there, it might be hard to compile them efficiently. In particular, if you use extractor, list, or map patterns then it can call user-defined methods. Or if you switch on a non-sealed type (i.e. not int, string, double, bool). But as long as you switch on primitive types, enums, and/or records containing those, I think the compiler can reliably tell that no side effects can occur and can compile to something faster than a linear scan.
If I'm missing something and that's not the case, I'd definitely like to know. My hope is that switches on primitive and enum types are as optimizable with patterns as they are today and that likewise matching on records should be pretty optimizable by the compiler.
When the compiler does have to resort linear scan (with caching #2107), I think that's no slower than the code users would write if we didn't have pattern matching switches, so that's OK.
so we might well be able to use, say,
match.
I did consider that, but I think it's a bad idea to pick a contextual keyword that's the same name as the name of a type in the core library. :-/
When the compiler does have to resort linear scan (with caching #2107), I think that's no slower than the code users would write if we didn't have pattern matching switches, so that's OK.
I don't think the main concern would be with linear scan, but with whether or not switching on a set of constants requires actually calling the (potentially) user defined == operator or not, and whether we can avoid doing so in all of the cases where we currently do not do so (that is, that we can make the generality added here "pay as you go"). Concretely, consider this currently legal program:
class A {
const A();
}
class B {
const B();
}
class C {
bool operator ==(Object other) {
print("hello");
return false;
}
}
const Object a = A();
const Object b = B();
void main() {
Object o = C();
switch (o) {
case a: print("A"); break;
case b: print("B"); break;
default: break;
}
}
This program never calls any equality method (or at least does not need to), since:
- It is an error to switch over constants which override primitive equality.
- The semantics are defined to not call the equality method on the scrutinee (here 'o') but on the case constant (here
aandb).
In other words, the semantics are set up such that this can be compiled to simply check identity.
As specified in the proposal, I believe that the following program would behave quite differently:
class A {
const A();
}
class B {
const B();
}
class C {
bool operator ==(Object other) {
print("hello");
return false;
}
}
const Object a = A();
const Object b = B();
void main() {
Object o = C();
switch (o) {
case == a: print("A"); break;
case == b: print("B"); break;
default: break;
}
}
I believe that as specified, this program will print "hello" twice, and then break, since the semantics are defined to call equality in the other order: o == a and o == b. Is that not correct?
So the questions I was trying to get at are:
- Do we intend the two versions of the program listed above to be different?
- If so, are we sure that we can still compile existing switches efficiently?
I think that so long as we are not proposing to change the direction of the equality check for existing constant patterns, the answer to the second question is "yes", but @rakudrama should probably check me on that.
- Does the proposal eliminate the current error for existing switches?
I think it should, yes. Any reason why not?
I have some concern that we are removing a performance rail. Currently, simple switches over constants are ensured to be implementable relatively performantly. Removing this error means that you can fairly radically change the performance profile of a switch by overriding operator == on one of the constants. It is (I believe, see above) still pay as you go, so it's maybe not terrible, but it does worry me slightly.
I believe that as specified, this program will print "hello" twice, and then break, since the semantics are defined to call equality in the other order:
o == aando == b. Is that not correct?
Yeah, that is correct. That's an interesting example!
So the questions I was trying to get at are:
- Do we intend the two versions of the program listed above to be different?
I don't know if I'd say "intend". I didn't wake up one morning and design the proposal specifically around this example. :) But I do think the different behavior naturally falls out of design choices that otherwise make sense, and it doesn't seem particularly problematic to me that the behavior is different, unless there's something I'm missing.
It is technically a breaking change. I would be quite surprised if any significant amount of existing Dart code was affected by the change.
- If so, are we sure that we can still compile existing switches efficiently?
I believe so. I think the existing switches where performance matters are ones where the cases and scrutinee are all the same type and that type is int, String, or an enum (that doesn't override ==). When that's true, we know that the == method being called is one defined by the core library and not overridden by user code, so the compiler can assume it doesn't have side effects and knows how it behaves.
If we were to ever make int non-sealed, then we'd have big problems. Unless I'm missing something, I think patterns still let us compile switches on integer and enum types quickly. If all the cases are ints or enums but the scrutinee is not, then it gets harder because you could be calling a user-defined ==. I'm fairly confident that the switches where performance matters are also ones where the scrutinee and cases have a homogeneous type.
I have some concern that we are removing a performance rail. Currently, simple switches over constants are ensured to be implementable relatively performantly.
I mean, the other way to look at it is that the old language team kneecapped switches—you can't even switch on doubles!—in order to provide that rail... and even then the implementation teams didn't bother to take advantage of it and users didn't care enough to bother them about it. If that rail actually mattered, we'd have been having this discussion in 2012 and not 2022. :)
It is technically a breaking change. I would be quite surprised if any significant amount of existing Dart code was affected by the change.
I don't understand: if the two programs above are different (which you say above that they are), then there's no breaking change, right?
I think the existing switches where performance matters are ones where the cases and scrutinee are all the same type and that type is
int,String, or an enum (that doesn't override==). When that's true, we know that the==method being called is one defined by the core library and not overridden by user code,
This is only true if the equality called is not the one on the scrutinee.
and even then the implementation teams didn't bother to take advantage of it
No, they absolutely do. I do not believe any implementation (probably not even DDC) actually calls the equality method to evaluate switches.
Thinking about this a bit more from the user perspective. We have a choice around how to interpret literal/constant patterns:
- We can interpret them as checking identity
- We can interpret them as calling the equality method on the constant
- We can interpret them as calling the equality method on the scrutinee
Current Dart can be thought of as doing either of the first two, since they are equivalent given the constraint on switches, but if we drop that constraint they become potentially different.
It seems weird to me that you can have a constant pattern like 1 and have it be matched by something completely and arbitrarily different. That pushes me towards one of the first two interpretations. That's inconsistent with the semantics of == 1, but that seems broadly ok to me. It's a different pattern. But... it does mean that there is no way to take a literal pattern like 1 and name the literal while still preserving the existing behavior. So maybe there should be an identity pattern? Or a reverse equality pattern: a == is the pattern which checks a == o for scrutinee o?
Current Dart can be thought of as doing either of the first two, since they are equivalent given the constraint on switches, but if we drop that constraint they become potentially different.
I don't think that current Dart switch can be thought of as checking identity. I can construct a String instance which is not identical to a constant String, but does match a switch case with that constant String. (At least on the VM, the browser seems to canonicalize some strings at runtime that the VM doesn't)
I don't think that current Dart switch can be thought of as checking identity. I can construct a String instance which is not
identicalto a constant String, but does match a switch case with that constant String.
Fair enough. I don't think that changes the substance of my question though.
and even then the implementation teams didn't bother to take advantage of it
Compiling a Dart switch statement to a JavaScript switch statement is taking advantage of it. Otherwise we would have to generate an if-then-else chain.
dart2js goes further and converts a switch on enums to a switch on the enum index. Some JavaScript VMs can compile switches with integer case values in a compact range to a jump table (example).
So Dart on the web can be said to sometimes implement switch statements on enums using jump tables.
I don't think that current Dart switch can be thought of as checking identity. I can construct a String instance which is not
identicalto a constant String, but does match a switch case with that constant String. (At least on the VM, the browser seems to canonicalize some strings at runtime that the VM doesn't)
JavaScript defines === and Object.is as returning true for strings that have the same sequence of code units. This makes it impossible to have two Strings that are Dart == but not identical, even if they are separate 'primitive' objects residing in different places in the heap. There is just no way to tell they are separate objects and the JavaScript engine is free to canonicalize them any way it chooses.
In effect, Dart on the web uses identical(caseValue, scrutinee), but identical is expanded to include String objects that have the same contents.
If records are identical by having identical field values, it will feel very strange for the VM to persist in having Strings that are equal but not identical.
I think we're moving in and out of several topics here, so let me restate the parts that I'm a little bit worried about:
- Some switch statements currently allow for an optimized translation (e.g., jump tables for enums).
- We should preserve the ability to optimize those switch statements, and we should be really careful about introducing breaking changes to the dynamic semantics of existing code.
- Pattern matching is inherently a linear scan (if any steps can have side effects, which is "always" in practice, because there are so many ways to make the matching process run arbitrary code).
- Developers should be in control when they make the choice between fast switch statements and pattern matching, preferably in a way which is visible (as opposed to "you can only know which variant you have in this switch statement/expression if you can ask the analyzer").
I know, Dart is not the language where switch statements are used to implement state machines (e.g., lexers) every day, but I still think that we should preserve the ability to have a performant construct using the switch syntax, and I think it makes sense to use a different syntax to specify the new pattern matching constructs (even the tiniest difference would suffice, we can still see it).
Current Dart switch case checking is "primitive equality" checking on "the same type", which means identical except for certain known platform types. Because of that we know, for certain, that == is symmetric and without side effects. We can do any number of optimizations to get to the equality, including using identical for all non-platform types.
(Did we remove the requirement that the cases must have the same type as the switched expression? If so, symmetry is no longer promised, because the switch expression value can have a user defined ==.)
With patterns, we allow user-defined == methods in general, so we no longer know that == is symmetric, without side effects, or even consistent over time. Whether case a: and case == a: should be the same or not ... I'd probably make them so, using the == a behavior for both. (That shouldn't matter for any existing switch, if it's guaranteed symmetric and without side effects.)
I can argue for both directions.
o == asounds like what you'd do if you are inspecting the valueo. Also, it makes it easier to cache results of common pattern subexpressions if all the invocations are on the same object. It's good object orientation to ask an unknown object about itself, instead of asking other objects about it. Low-level, you can optimize calls too.==by caching the method lookup.a == omakes the pattern objecta, the object you are in complete control of when writing the switch, be the one to decide which check to make. You know the==implementation at the switch position, and can potentially inline it. (At least whenais constant, which it still needs to be, right?). There is no way the foreign object being switched on can hack its way around the switch by using a cleverly written==method. Puts the developer in charge of which equality checks will be made, so they can control performance.
I prefer the latter, but the == a pattern cannot reasonably be interpreted any other way than as the former, and if we only get to have one strategy for equality, it'll have to be it.
I don't really have any facts to add to the arguments presented above, but just for the record here's some opinions 😄:
Personally I'm a fan of @lrhn's o == a approach because I want case a: and case == a: to be equivalent. I think that in the majority of situations it's not going to make any user-visible difference, but in the situations where it does make a difference, I think users are going to assume o == a semantics, so we should follow that.
As for performance, I think there are two separate cases:
- When switching on an int, or an enum that doesn't override
operator==, I expect the compiler to create a jump table (or something similar), and I don't think that's a problem, because ints and enums are sealed types, so that's safe for the compiler to do. - When switching on anything else, I don't expect performance to be any better than a naive desugaring into an if/else chain, in which case I think
o == asemantics is fine; if the code is so performance critical that the user needs to do something cleverer than that, they're welcome to manually desugar it and then tweak it to do whatever is best.
As for @eernstg's idea of introducing a new syntax for the new matching construct, I'm not a huge fan. I think that as we're adding new features to Dart, we should strive as much as possible to steer towards the language that we would have wanted if those features had been present from day one. So that new users don't have to understand the history of Dart to make sense of the language design. I personally think that if we had had patterns from day one, we would just have a single common syntax for switch and match.
I feel like there's quite a bit of confusion in this thread - I'd like to really ask folks to take the time to read through the original post and follow ups carefully, because I think there's been a lot of misinterpretation of what the core issues are (and in fairness, I'm not sure I summarized them well in my initial issue, though I think my first follow up comment does a better job).
Let me try to get this back on track by separating out a few different issues. I'll start by replying to a few specific comments that I think get at some of the current confusion, and then try to summarize as clearly as possible what the core questions are here.
@lrhn
I'd probably make them so, using the
== abehavior for both. (That shouldn't matter for any existing switch, if it's guaranteed symmetric and without side effects.)
I'm not sure I understand what you are saying here, but if I understand you correctly, this isn't right. My first example here is an existing switch, which runs on existing implementations, and which will produce a different result if we change the semantics as you suggest here, and which will also not admit the same implementation strategy that we currently use.
@stereotype441
- When switching on an int, or an enum that doesn't override
operator==, I expect the compiler to create a jump table (or something similar), and I don't think that's a problem, because ints and enums are sealed types, so that's safe for the compiler to do.
That is not the case if we use the o == a semantics. The reason is that the equality operator which is called is not the one on the sealed type, but on the scrutinee, which can be any arbitrary object whatsoever.
There's been a lot of discussion of "jump tables" here, and I would like us to get away from that, because it's largely a red herring. The core of the issue is whether or not the compiler can reliably know that primitive equality is the right equality to use or not when matching a constant. That's it. It's true that in the case that if it does, in some cases it could then choose to use a jump table, but even if it doesn't use a jump table, there's still a huge difference between a sequence of conditional identity checks and a sequence of (in the worst case) mega-morphic virtual method calls to an equality operator.
Restatement of the issue:
To re-summarize the issue, let's start by thinking about the following small example:
class A {
const A();
}
class C {
bool operator ==(Object other) {
print("hello");
return true;
}
}
const Object a = A();
void main() {
Object o = C();
switch (o) {
case a: print("A"); break;
default: print("other"); break;
}
}
This is a valid Dart program right now, which prints "other" and then exits. It does so because switch cases are matched by calling equality on the constant being matched. That is, the test done here is a == o (not the other way around).
Point the first
Currently this proposal preserves the semantics of this switch, to wit:
Literal or constant: The pattern matches if o == v evaluates to true where o is the pattern's value.
Several folks have suggested in comments here that we should change this to use the v ==o semantics. That's a thing we can do, but the first interesting fact about that is that doing so changes the semantics of existing programs. That is, if we did so, this program would print "hello", then print "A", then exit.
Point the second
The current semantics of switches combined with an additional restriction on switch constants guarantees that switches like the one above can use primitive equality (essentially identity) to test for a match against the constant, instead of doing actual method calls. This is not (directly) about jump tables. To elaborate, the code above is, per the spec, essentially equivalent to the following code:
void main() {
Object o = C();
if (a == o) print("A");
else print("other");
}
and further there is a static restriction that says that a must have primitive equality, which says that the code above is equivalent to:
void main() {
Object o = C();
if (identical(a, o)) print("A");
else print("other");
}
As @rakudrama observes above, this also admits other implementation strategies when targeting JS (e.g. compiling to a JS switch statement).
Even if we preserve the existing semantics for constant matches (that is, use the constant as the receiver), there is an impact on our ability to generate good code if we drop the restriction requiring primitive equality. That is, the following program, which is currently not legal, becomes legal:
class A {
bool operator ==(Object other) {
print("Not previously allowed");
return identical(this, other);
}
const A();
}
class C {
bool operator ==(Object other) {
print("hello");
return true;
}
}
const Object a = A();
void main() {
Object o = C();
switch (o) {
case a: print("A"); break;
default: print("other"); break;
}
}
Even if we don't change the receiver of the equality method, this program cannot be compiled to use identical anymore, since the equality of the receiver is no longer primitive equality.
I do believe that if we make just this change (that is, drop the restriction to primitive equality), all existing switches can continue to be compiled as they currently are, since the compiler can always statically see whether or not the constant in question overrides primitive equality.
In other words, just removing this restriction is pay as you go WRT existing switches, but does remove a performance rail. @rakudrama do you agree with this summary, or am I missing anything?
Point the third
This proposal also adds a new kind of constant pattern of the form == a, which is currently specified as being matched using the semantics o == a, where o is the scrutinee.
This has the following implications:
- As currently specified, this means that the patterns
case aandcase == aare different, both semantically, and in terms of how they can be compiled.- Semantically, because different code is run, which can return different results
- From a compilation standpoint in that the first has a statically predictable equality method, whereas the second is an unknown virtual call in general.
- As currently specified in this proposal, if a constant case of the form
case amatches, we know that the scrutinee is in fact "the same" object asa, but if a constant case of the formcase == amatches, we don't know anything about the scrutinee except that its==method returnstruewhen passedaas an argument.- If we additionally relax the restriction on primitive equality for constants, then
case ano longer means that the scrutinee is the same object in general, but rather just that the equality onareturned true.
- If we additionally relax the restriction on primitive equality for constants, then
Commentary
I am highly skeptical about breaking the performance profile of existing switches. This strongly implies to me that we need to preserve the semantics of case a as a == o rather than the other way around.
I feel less strongly about whether or not we should allow users to redefine equality on constants. It doesn't feel very useful to me, and it feels surprising to users, and it has bad performance implications. On the other hand, it's pay-as-you-go, and nobody makes you override equality on your constants.
It feels a bit off to me that the semantics of case a and case == a are different. Not terrible, but a bit off. Perhaps that suggests using a different syntax?
It feels a bit off to me that there is no way to take case c where c is a constant, give c a name, and preserve semantics (because in general case c and case == n where const n = c, may call different equalities). Perhaps that suggests that if we keep the == c semantics, there should also be an "identical" pattern? e.g. case identical n?
In other words, just removing this restriction is pay as you go WRT existing switches, but does remove a performance rail. @rakudrama do you agree with this summary, or am I missing anything?
I agree with this summary.
If we change case a: to mean o == a, we can still compile efficiently when the static type of the switch expression has primitive equality.
The current formulation dates back to Dart 1, which didn't really have static types to serve this purpose.
Perhaps we can keep the performance rail as follows:
- If the case constant does not have primitive equality, you must use the
case == aform. - If all of the case clauses are of the form
case a, then the static type of the switch expression must has primitive quality.
If a library you are importing removes primitive equality from the constant or static type of the switch expression then you will have to change the switch statement in some way or guard its use.
Sorry, yes, there are a few things tangled up in this and a couple of mistakes on my part.
Optimization
Earlier, I said:
I believe so. I think the existing switches where performance matters are ones where the cases and scrutinee are all the same type and that type is
int,String, or an enum (that doesn't override==). When that's true, we know that the==method being called is one defined by the core library and not overridden by user code, so the compiler can assume it doesn't have side effects and knows how it behaves.
(Emphasis added.) I still stand by that. Regardless of whether constant patterns are compiled to scrutinee == constant or constant == scrutinee or something else involving identity... if we know the types of both the scrutinee and the constant, and that type is a sealed built-in type, then the compiler knows exactly how that == behaves.
It is true that you could have switch cases that are ints or enums and a scrutinee that is some other user-defined type that defines == in some weird way. If you do that, then, yes, the compiler can't optimize it easily. But my contention is that those are not what performance critical switches look like in the wild. I believe that every switch case where performance matters will be one where the cases and scrutinee all have the same type and it's a primitive type or an enum.
Maybe I'm wrong and there are people doing switches on weird user-defined non-enum scrutinee types where they really want the compiler to give them a jump table, but I'd be surprised.
Semantics
That still leaves the question of what should the semantics be in cases where the choice of scrutinee == constant, constant == scrutinee, or something else is user visible. For that... uh... I kind of screwed up.
I never considered that Dart allows you to have a value in a switch that doesn't implement primitive equality. It only asymmetrically requires the case constants to use primitive equality. The example you have above makes sense in retrospect but is not one I considered at all.
I'm not sure why I chose constant == scrutinee as the semantics for constant patterns. I think I may have gone back and forth a few times. Maybe Lasse suggested I do it that way specifically because of this compatibility reason?
When I much more recently added == and other relational operator patterns, I did scrutinee op constant. I think that order is necessary for the comparison operators. It would be extremely confusing if we did constant op scrutinee:
switch (1) {
case > 2: print("WAT");
}
This would print "WAT". We could change the relational pattern syntax, but I think the syntax works pretty well (and is what C# does). For that syntax to make sense, scrutinee op constant is what we want. And I think that's the "natural" order for those operations since the scrutinee expression comes first in a switch.
That's why == constant patterns use scrutinee == constant: to be consistent with the relational operators.
But I totally overlooked that it's inconsistent with named constant and literal patterns, which use constant == scrutinee. As you note, if we change this to scrutinee == constant for named constant and literal patterns, that can change the semantics of existing programs.
I would be pretty surprised if there were many places in real code that are affected by this, but it is technically breaking.
What to do
I'm not personally worried about having named constant and literal patterns call == on their operands. I think in the cases where performance matters we know the types of both sides and those types are language-defined and sealed so the compiler knows how == behaves.
For the other cases, I think we need to do something. Using constant == scrutinee for named constant and literal patterns and scrutinee == constant for == patterns would make for a delighful chapter in Dart Puzzlers, but is not an elegant choice.
I can think of a few options:
-
Use
scrutinee op constantfor comparison operators andconstant op scrutineefor==patterns. That makes comparisons do what they should and makes==consistent with named constant and literal patterns. But it feels weird to me to have==not be consistent with comparisons. -
Eliminate relational patterns entirely. I added the
== constantpattern (and by extension the other relational operators) as an escape hatch for matching on bare identifier constants because the latest version of the proposal treats all simple identifiers in patterns as variable binders. We could not do that and to some kind of explicitly variable pattern syntax. Then we don't need==and could eliminate it entirely along with the other relational operators.Then we'd keep using
constant == scrutineefor named constant and literal patterns and be consistent with the current language.But the relational operator patterns are actually pretty nice and useful for matching on numeric ranges. I think they're a good feature.
-
Always use
scrutinee == constant. Make named constant, literal patterns, and==patterns all desugar to the same structure. This is internally consistent. It's a breaking change, but I suspect the breakage would be small. With some effort, we could measure this. -
Use a different syntax for pattern-based switches. Erik has suggested this. It nicely sidesteps all of the issues, but it leaves a mostly deadweight feature in the language. I find Paul's argument that Dart should look like it had been designed entirely simultaneously and not make user-visible its evolution really compelling. I don't want to have two multi-way branching constructs: the old bad one and the new good one.
Also, pragmatically, it's hard to come up with a good syntax or keyword for it. Using
matchas a contextual keyword would be pretty annoying in a language that hasMatchin its core library and wherematchis thus an extremely common local variable name. -
Use some built-in notion of "value matching" that does not invoke a user-defined
==at all. We could come up with our own semantics for what it means for a value to match. Probably something like:- If not the exact same runtime type, then don't match.
- If the runtime type is a primitive type, enum, or record, then match if equal.
- Otherwise, match if
identical()?
But, I think it's pretty confusing to have
==patterns that don't call==. And I think it's potentially useful to call user-defined==operators. It makes switches and pattern matching more programmable and I almost always think that's a good thing. -
Define a new "value matches" method that objects can implement. This would let user-defined types specify how value matching behaves independently from how
==behaves. The default implementation of this in Object would behave such that it's consistent with how switches behave today. Since no Dart code in the wild can be implementing this non-existent method yet, it wouldn't be a breaking change.
If it were up to me, I'd probably say use scrutinee == constant and take the breaking change. It seems like the simplest, most explainable semantics. If we were designing Dart from scratch today, I think that's what we'd do. But this is definitely not an area where I feel like I have a strong opinion or good insight so I'd like to hear more from others.
If it were up to me, I'd probably say use
scrutinee == constantand take the breaking change. It seems like the simplest, most explainable semantics. If we were designing Dart from scratch today, I think that's what we'd do. But this is definitely not an area where I feel like I have a strong opinion or good insight so I'd like to hear more from others.
FWIW, this is my preference as well.
If we change case a: to mean
o == a, we can still compile efficiently when the static type of the switch expression has primitive equality.
I don't see how. The equality called is the one on the runtime type of o, not the static type. Subclasses of classes with primitive equality can have non-primitive equality.
Using
constant == scrutineefor named constant and literal patterns andscrutinee == constantfor==patterns would make for a delighful chapter in Dart Puzzlers, but is not an elegant choice.
Could you elaborate on the issues with this approach? Assuming I'm following all this right, it seems like that's a non-breaking design where case == a: can work as advertised and consistently with other comparison operators, while case a: can retain its existing behavior and still be optimized when a has primitive equality. @lrhn's point above is also fairly appealing - that constant == scrutinee uses a known implementation of == instead of letting an outside scrutinee decide which branch it fits into best.
There'd be a subtle functional difference between the two in cases where an unconventional == operator is in play, but I'm curious why that's less desirable than the other potential trade-offs being considered.
@munificent
if we know the types of both the scrutinee and the constant, and that type is a sealed built-in type, then the compiler knows exactly how that
==behaves.
True. FWIW, I only had to click a few results in code search to find an example of code where the static type of the scrutinee was Object but the constants were all Strings though. Not sure how common this is, but if we go the route of changing this, we'd better do some validation first.
Maybe I'm wrong and there are people doing switches on weird user-defined non-enum scrutinee types where they really want the compiler to give them a jump table, but I'd be surprised.
Now you're just trolling me... :)
For the other cases, I think we need to do something. Using
constant == scrutineefor named constant and literal patterns andscrutinee == constantfor==patterns would make for a delighful chapter in Dart Puzzlers, but is not an elegant choice
I'm actually not convinced of that. I'm actually pretty highly confident that there is more likely to be an entry in Dart Puzzlers that looks like switch (thingOfTypeDynamic) case 1: thingOfTypeDynamic.isEven and asks why this is returning true. That is, I think it is arguably really sketchy as a language design that you can have a literal pattern which matches something which is in all shapes and form not that literal. I think the user expectation of case 1: is not "thing which compares equal to 1 when its equality method is called", but .... "1." Sure, you can argue... don't override equality to do weird things. But if you don't override equality to do weird things, then the inconsistency is irrelevant. In other words, the only cases that matter here are places where equality is not symmetric.
So from my standpoint, I'm currently of the opinion that the most sensible thing is to keep the inconsistency. Both patterns then do exactly what they look like. case a matches exactly a, and case == a matches exactly things which compare equal to a using the obvious order. My only concern is that there isn't a way to match exactly a when it's a named thing, but that's probably ok. 99% of the time, using == a is just fine and in the remainder you can do something like case _ if identical(thing, a) or just use an ordinary conditional, or whatever.
Here's another way of thinking about it that might be helpful. We do not currently define the semantics of if (b) ... to be the same as if (b == true) ... Those are entirely different pieces of code. I don't think anyone finds that to be a "puzzler", do they? Conditionals check whether the argument really truly is the value true. Equality is... a method. That does... whatever the hell it wants.
Currently, we can say that the runtime semantics of if (b) stmts1 else stmts2 is:
switch(b) {
case true: stmts1
case false: stmts2
default: throw "Condition arguments must be booleans"
}
That feels like a natural equivalence to me, and not at all surprising. It does seems surprising to me if those pieces of code are not equivalent. But if we change the order of evaluation on literal patterns, they are. The latter becomes closer to (but not exactly equivalent to) the if (b == true) semantics (it's actually technically more like if (b == true) stmts1 else if (b == false) stmts2 else throw...
I'm actually not convinced of that. I'm actually pretty highly confident that there is more likely to be an entry in Dart Puzzlers that looks like
switch (thingOfTypeDynamic) case 1: thingOfTypeDynamic.isEvenand asks why this is returning true. That is, I think it is arguably really sketchy as a language design that you can have a literal pattern which matches something which is in all shapes and form not that literal. I think the user expectation ofcase 1:is not "thing which compares equal to 1 when its equality method is called", but .... "1." Sure, you can argue... don't override equality to do weird things. But if you don't override equality to do weird things, then the inconsistency is irrelevant. In other words, the only cases that matter here are places where equality is not symmetric.So from my standpoint, I'm currently of the opinion that the most sensible thing is to keep the inconsistency. Both patterns then do exactly what they look like.
case amatches exactlya, andcase == amatches exactly things which compare equal toausing the obvious order. My only concern is that there isn't a way to match exactlyawhen it's a named thing, but that's probably ok. 99% of the time, using== ais just fine and in the remainder you can do something likecase _ if identical(thing, a)or just use an ordinary conditional, or whatever.
Ok, good point. I find this argument pretty compelling.
@leafpetersen wrote:
It does seems surprising to me if those pieces of code are not equivalent.
and (in this comment):
I have some concern that we are removing a performance rail.
My comment here arises from exactly the same concern. In particular, @munificent wrote here:
Does the proposal eliminate the current error for existing switches?
I think it should, yes. Any reason why not?
And the reason could be that
switch(b) {
case true: stmts1
case false: stmts2
default: throw "Condition arguments must be booleans"
}
would have to invoke typeof(b).==:
The latter becomes closer to ..
if (b == true) stmts1 else ...
Presumably, we reach that conclusion because the old style case case e: would be interpreted to mean case == e:, and the latter uses scrutinee == e to perform the test.
This could still be optimized in various ways if the static type of b is bool, but we could also have:
enum E { a, b, c }
void main() {
Object? o = ...;
switch (o) {
case E.a: stms1
case E.b: stms2
default: stms3
}
}
and in that case we would definitely have to use a linear scan, e.g., because typeof(o).== may have side effects.
So, in order to avoid the breaking change and preserve the optimizability of the current switch statements, I think we'd have to determine the match of an old-style case case e: by evaluating e == scrutinee rather than scrutinee == e.
We might also want to use that choice of receiver for case == e:, and perhaps use the syntax case e ==:, for consistency, and because the equality test on e may be more predictable than the equality test on the scrutinee.
@leafpetersen This would mean that:
case SomeConstant(): ...
would do as today, and
case == SomeConstant();
would use the == of the scrutinee.
In the former case, we would then still require primitive equality for the case expressions. (We'd probably have to extend primitive equality to records, but requiring each element to recursively have primitive equality. Not sure what we'll do with structs, but I think that as a rule, they should work the same as records.)
@lrhn agreed. And good point that using primitive equality for records rather than identity is probably what we want.
That is, I think it is arguably really sketchy as a language design that you can have a literal pattern which matches something which is in all shapes and form not that literal.
Well, you can have an expression obj == 1 evaluate to true when obj is an object that is most certainly not the integer 1...
We do not currently define the semantics of
if (b) ...to be the same asif (b == true) ... Those are entirely different pieces of code. I don't think anyone finds that to be a "puzzler", do they? Conditionals check whether the argument really truly is the valuetrue.
FWIW, that's not the case in C++ and C#. For example:
public class Foo
{
String value;
Foo(String value)
{
this.value = value;
}
public static implicit operator bool(Foo foo)
{
return foo.value == "yes";
}
public static void Main()
{
if (new Foo("yes"))
{
Console.WriteLine("yes is true");
}
if (new Foo("no"))
{
Console.WriteLine("no is true");
}
else
{
Console.WriteLine("no is false");
}
}
}
This program is valid and prints:
yes is true
no is false
Dart already has an extremely deeply ingrained notation that classes get to define what equivalence means for them, and our core collection types rely on that for everything from List.contains() to deduplication in set literals to map key lookup.
I wouldn't suggest that conditions in Dart just call == true on their expression, but only because there's no way to make that a static error for types that don't want to be "bool-able" since == on Object accepts any type (alas).
I don't see anything intrinsically wrong with switch cases calling == on either the scrutinee or case constant. But I also don't care a lot about this particular issue. If others think it's important for value equivalence tests in patterns to route around user-defined == methods, I'm fine it.
That is, I think it is arguably really sketchy as a language design that you can have a literal pattern which matches something which is in all shapes and form not that literal.
Well, you can have an expression
obj == 1evaluate totruewhenobjis an object that is most certainly not the integer1..
Yep. I find that odd, but it's a basic design choice, and probably the only one that makes sense. And it provides a lot of value to users to be able to define their own operator ==, and once you choose that it's hard to get out of this consequence without building a very different language. But I don't see the same tradeoffs here. I don't really see any value in saying that case 1: means call equality on the scrutinee. I think the 99% case for users is they want to know "is this thing 1`. So why not make that one work? If they want to call equality on the scrutinee, we give them syntax for that.
We do not currently define the semantics of
if (b) ...to be the same asif (b == true) ... Those are entirely FWIW, that's not the case in C++ and C#. For example:
Your example 100% does not contradict what I said. It is true that C# has implicit conversions but the semantics of conditions is not defined to be if (b == true), unless it works very differently from what I understand. Are you claiming that if (o) is valid C# for any object o, with semantics define by the return value of calling the == operator on o with true as an argument? I do not believe that to be true. The fact that you can define an implicit conversion from whatever to bool is irrelevant to the point here.
C++ may be different, C is zero/non-zero. I'm not inclined to take C as a guiding light here though...
I wouldn't suggest that conditions in Dart just call
== trueon their expression, but only because there's no way to make that a static error for types that don't want to be "bool-able" since==on Object accepts any type (alas).
FWIW, we can perfectly well do this. We can add a method to bool (isTrue), and we can define conditions in terms of the bool-able predicate. I would oppose this change without very strong evidence that this is actually providing value in some way.
Dart already has an extremely deeply ingrained notation that classes get to define what equivalence means for them, and our core collection types rely on that for everything from
List.contains()to deduplication in set literals to map key lookup.
Hmm. Sort of. I'm going to object to the word equivalence here. They don't - they get to define what the == operator does, when called with a non-nullable argument. They don't get to define themselves as "equivalent" to null, for example. They also don't get to define what other things say when they are asked the "equivalence" (really ==) question. == is an operational question: "What does this method return on a non-nullable argument".
I don't see anything intrinsically wrong with switch cases calling
==on either the scrutinee or case constant. But I also don't care a lot about this particular issue. If others think it's important for value equivalence tests in patterns to route around user-defined==methods, I'm fine it.
I have seen no arguments for calling == on the scrutinee for constant patterns actually being useful, and I personally think there's a strong "principle of least-surprise" argument against it.
Dart has two natural equivalences: identity and ==.
And a third, unnatural, one that we use for some constants: primitive equality, which is basically "==, but only when it agrees with identity, or we know the class".
We use primitive equality for switch cases, and it works well today. (Well enough, if anything, at least people seem to understand it enough to not get too confused.)
I can see keeping both:
case MyEnum.value:
case 1:
case "arglebargle":
// primitive equality of constant case expression value
case == MyEnum.value|:
case == 1:
case == "arglebargle":
// runtime scrutinee.== behavior
Both make sense.
Sorry for the long essay. See the summary at the end...
My mental model for patterns is that whenever the pattern represents a constant value, regardless of syntax, the underlying behavior is "match if the value is 'the same as' that value". From that viewpoint, 2, Some.constant, == 2, and == Some.constant are all basically the same kind of pattern since they are all matching against values.
Then it's a question of what underlying operation represents "the same as". The language currently uses constant primitive== value, where primitive== is defined to be roughly value semantics for int and strings and object identity for all other types.
The language also makes it a compiler error to have a constant in a switch case whose runtime type has a user-defined operator == method. (It could in principle allow this and simply not call the == method and instead always directly invoke its own defined "primitive==" operation, but it doesn't. I presume to avoid confusion.) In practice, this means that switches are currently only useful for:
- Integers
- Strings
- Booleans
- Null
- Enums
- User-defined classes that have const constructors and are careful to only ever create constant instances
In particular, this excludes doubles and any user-defined type with "value semantics". As soon as you define == to give it the value semantics you want, you lose the ability to switch over it.
Primitive equality on records
Regardless of what else we do around cases and equality, we have to decide what it means to have records in switch cases. Records do define a ==, so they don't inherit primitive == from Object. Consider:
class Constants {
const origin = (0, 0);
}
var zero = 0;
switch ((zero, zero)) {
case Constants.origin: print('matched');
default: print('no match');
}
What does this do? I can think of three answers:
-
Compile error because
==on record types isn't primitive. -
Prints "match" because we specify
==on record types to be primitive and invoke it here, which returns true if the operand (the matched value) is a record with the same shape and==corresponding fields. -
Has unspecified runtime behavior because we specify
==on record types to be primitive but useidentical()here instead of calling==andidentical()is deliberately loosely specified for records.
Option 3 is clearly out. I don't find option 1 compelling. It seems perverse to say that case (0, 0) is fine, but hoisting that record pattern to a syntactically identical constant becomes a compile error. It's clear what the user wants the code do to.
This suggests that we specify that == on records is also primitive and implements value semantics, just like primitive == on String.
Recursing into record fields
Now consider:
class MyThing {
final int value;
const MyThing(this.value);
/// User-defined non-primitive equality:
bool operator==(other) => other is MyThing and value == other.value;
int get hashCode => value.hashCode;
}
class Constants {
const thingPair = (MyThing(1), MyThing(2));
}
var zero = 0;
switch ((MyThing(1), MyThing(2))) {
case Constants.thingPair: print('matched');
default: print('no match');
}
We can define == on records to be primitive. But the implementation of == on records recurses into the fields and calls == on those. Those might not be primitive. Is that allowed?
-
If so, then we have constant cases that can end up calling a user-defined
==. In that case, is there any reason to keep the restriction that==on the outermost constant object must be primitive? -
If not, how do we prevent that? Do we say specify that a record in a constant pattern must only contain fields whose values have primitive equality (recursively)? That's doable I guess (we do know all the field values at compile time), but feels like a strange sort of restriction.
Given those choices, I think the most useful behavior is to remove the restriction on primitive equality and just use ==, whatever it is. This:
- Lets records have the value semantics users expect in switch cases.
- Makes it possible for user-define value types (i.e. classes with
==and value semantics) be usable in cases. I have painfully run into this limitation before and struggled to work around it. - Probably makes working with data classes / structs / views / whatever more seamless with user-defined value-like classes.
- Let's you switch on doubles.
== operands
The next question is which way the operands to == are ordered. Earlier, I said my mental model is that == 1, 1, and SomeConstant.equalToOne are all conceptually the same. I accept that others feel the == 1 is different.
In that case, I think the reasonable thing is (as suggested above):
-
Use
constant == valuefor literal and named constant patterns. This way, we know at compile time exactly which==method will be dispatched to. Compilers will continue to have all of the optimization opportunities they have today, and we can preserve the existing semantics. -
Use
value == constantfor==patterns. It's the only thing that makes sense given the pattern's syntax.
There are still a couple of holes:
-
If the case is a constant with a short name, there's no way to get
constant == value(short of Paul's very clever but kind of ungainly suggestion ofcase x when constant == x). Usingcase == shortNamecalls it in the wrong order. -
There are other constant expressions like const constructor calls and infix operators on numbers that aren't supported at all (again unless you use a guard clause) (#2408).
For those, I think we should do what Lasse (I think) suggested and add an explicit "constant expression pattern". It's const followed by an expression (probably primary, maybe prefix). It would allow stuff like:
switch (obj) {
case const shortName: ...
case const SomeClass.constConstructor(1, 2): ...
case const (1 + 2): ...
}
It's specified to call constant == value. The existing literal and qualified named constant patterns are essentially just syntactic sugar for constant expression patterns that let you elide the const (which I think is worth supporting).
Proposal summary
- Remove the restriction on primitive equality on constants in switch cases (or now constants in patterns).
- Add constant expression patterns that are
'const' <expression>. - Literal, named constant, and constant expression patterns invoke
constant == value. - Explicit
== <expression>patterns invokevalue == constant.
That addresses this issue as well as #2408 and makes it possible to match by value in switch cases for any kind of user-defined equality while still being able to optimize dispatch based on the known concrete == method being called.
How does that sound?
(tl;dr: This does not affect the proposal, it's entirely about primitive equality, which we still need for other things.)
A constant object having "Primitive equality" means that its == operator can be understood and computed at compile-time for constant values. In some cases, the actual == invocation is only allowed for some operands (int.== is only considered a compile-time constant operation if the second operand is an int too, because otherwise we'd need to compare it to doubles).
We use the "primitive equality" definition for other things than case expressions. In particular, we use it for constant set values and map keys.
I really want to allow const {(1, 2), (1, 3), (2, 3)} to be a valid set. That requires (1, 2) to be included in the types that we accept as having primitive equality. Since (int, int) does override Object.==, it needs to be special cased, and the only reasonable definition is to require the fields to recursively have primitive equality. I want const {(a, b), (a, b)} to be an invalid set literal, because (a, b) == (a, b), which I can check at compile-time precisely when both a and b have primitive equality.
It's the only viable definition that allows constant sets of records, so it's basically a given that this is how primitive equality must work for records.
Whether we require primitive equality for case expressions or not is orthogonal to that.
Whether we use == for case comparisons is related, but still an orthogonal choice to how == works.
We currently do:
void main() {
const int two = 2;
const double twod = 2.0;
const num twon = twod;
switch (twon) {
case two: print("yes"); // Prints
}
}
A case twod: ... would be a compile-time error (on the VM) because doubles do not have primitive equality (on the web, the twod constant value is an int too).
So, for the "Primitive equality on records" section, there is an option 4:
- Constant records have primitive equality iff all their fields have primitive equality.
You allude to that solution in the next section, but say that it "feels like a strange sort of restriction."
It's not. If anything, it's the immediate approach if you think of records not as values by themselves (even if they are), but just a collections of individual values that are temporarily grouped. (Which is what I use as my guiding principle when deciding how records should work - records should not get in your way for doing things to records of values that you could do on the individual values, they should just help you do it in parallel).
It means that if you can do a == c && b == d as a constant operation, then you can do (a, b) == (c, d) (and it has the same result).