rhombus-prototype icon indicating copy to clipboard operation
rhombus-prototype copied to clipboard

What do we do about equality?

Open jackfirth opened this issue 4 years ago • 134 comments

Racket newcomers are immediately confronted with four equality operators:

  • equal?, which does what you'd expect most of the time
  • =, which does what you'd expect if you're used to IEEE numbers
  • eq?, which looks like "the magic fast pointer comparison / reference equality operator"
  • eqv?, which just looks bizarre

I think we should simplify this in ~~racket2~~ Rhombus. Here's a hypothetical straw proposal:

  • Keep equal? and =
  • Get rid of eqv?
  • Rename eq? to something like same-object-reference? and tuck it away in the corner of the docs, to make it clearer that you should only use this if you have a very good reason to

Is this feasible? Can we do better?

jackfirth avatar Jul 15 '19 16:07 jackfirth

It might be worth considering a notion of equality that recurs through immutable data structures like equal? does, but uses reference equality on mutable data structures.

Same idea as egal? from Rackjure and Equal Rights for Functional Objects

AlexKnauth avatar Jul 15 '19 16:07 AlexKnauth

There's a syntax part of this, a semantics part, and a library part.

Regarding semantics of eq?, Robby has long advocated the idea that programs should have the same functional correctness if all (eq? x y) is replaced with #f. The idea is that eq? is purely a kind of optimistic optimization where the slow path is required to be present.

jeapostrophe avatar Jul 15 '19 16:07 jeapostrophe

regarding eq? would same? be better? the racket reference uses same to describe eq?;

Return #t if v1 and v2 refer to the same object, #f otherwise.

https://docs.racket-lang.org/reference/booleans.html?q=eq%3F#%28def.%28%28quote.~23~25kernel%29._eq~3f%29%29

spdegabrielle avatar Jul 15 '19 21:07 spdegabrielle

regarding eq? would same? be better? the racket reference uses same to describe eq?;

Return #t if v1 and v2 refer to the same object, #f otherwise.

https://docs.racket-lang.org/reference/booleans.html?q=eq%3F#%28def.%28%28quote.~23~25kernel%29._eq~3f%29%29

The name we pick for eq? ought to be longer / more complex than equal?, to nudge people towards using equal? by default. So while I like using the word "same" for this, I think same-object? (or something similar) would be much clearer than same?.

jackfirth avatar Jul 15 '19 22:07 jackfirth

+1 for same-object?. Much better than same?.

spdegabrielle avatar Jul 15 '19 22:07 spdegabrielle

  1. Why don't we make it consistent: use =? instead of =? We do have things like free-identifier=? already anyway (yes, I'm also up for <?, <=?, etc.).
  2. Is there a case where (equal? a b) and (= a b) don't behave the same when a and b are numbers? If not, why don't we also merge them into one. I.e., in general use =? (which is currently equal?). If you want a contracted version, just prefix it: number=?, string=?, etc.
  3. object-equal? or reference-equal? also works for eq?'s new name. No need to invent a new name (same).

sorawee avatar Jul 16 '19 02:07 sorawee

To answer (2), yes, for (equal? 1 1.0) => #f vs (= 1 1.0) => #t. Exact-integers and floats-that-happen-to-be-integers are different racket values, but are equivalent numbers under =

AlexKnauth avatar Jul 16 '19 05:07 AlexKnauth

My primary use for eq? is with symbols, especially as keys in eq?-based hash-tables. My impression is that there is a notable performance benefit.

LiberalArtist avatar Jul 16 '19 05:07 LiberalArtist

My primary use for eq? is with symbols, especially as keys in eq?-based hash-tables. My impression is that there is a notable performance benefit.

Here, here. eq? (and intern) are in the original Scheme paper, and I'm having a hard time understanding (Jay's account of) Robby's idea.

Is he saying eq? is misused so often that it is effectively useless, or that it is actually useless for some technical reason?

dedbox avatar Jul 16 '19 18:07 dedbox

eq? interferes with optimizations because it exposes pointer identities, so you cannot, for example, create copies of immutable objects. Similarly, it interferes with providing safety because you can tell the difference between an object and its chaperone.

jeapostrophe avatar Jul 16 '19 18:07 jeapostrophe

Something that I think is weird is that an immutable string can be equal? to a mutable string, but an immutable hash can't be equal? to a mutable hash. (I understand the reason at the C implementation, but I think that the difference should not leak to the racket level.)

gus-massa avatar Jul 16 '19 21:07 gus-massa

Mutable objects are a strange case. If I have two vectors v and w, it's very important to know whether or not they refer to the same underlying storage: meaning whether or not vector-set! on v changes the result of vector-ref on w. This is different from knowing if v and w are eq?, because a vector may not be eq? to its own chaperone but they both refer to the same storage. Today, equal? on mutable objects just considers whether they have the same contents - not the same identity (which, due to chaperones, is not the same as just being eq?). I think this is very confusing, and that we should change equal? on mutable objects to only return true if they refer to the same mutable storage. This would communicate that the identity of a mutable object is an important part of its semantics.

jackfirth avatar Jul 17 '19 01:07 jackfirth

As another point in the design space, Pyret has several notions of equality. In summary:

Name Operator Partial Predicate Total Predicate Similar To
Equal Now =~ equal-now equal-now3 equal? (Racket) == (Python, Ruby)
Always Equal == equal-always equal-always3 = (Ocaml)
Identical <=> identical identical3 eq? (Scheme) == (Ocaml) === (JavaScript) is (Python) == (Java)

IIUC, Racket lacks a general "Always Equal" for mutable values: a test to identify that mutations to one value will also mutate the other. Currently, eq? can only say "yes" or "idk": it is foiled by chaperones and impersonators. The default equal? for opaque structs does this, but sacrifices "Equal Now".

LiberalArtist avatar Jul 17 '19 01:07 LiberalArtist

Re @jeapostrophe's comment on eq?: I agree that we should avoid exposing pointer identities (at least in the safe/high-level language) and we should allow the implementation to be aggressive as it wants at optimizing storage for immutable objects. I hope we can find a way to do that without degrading performance for things that are now idiomatic and correct uses of eq?, particularly with symbols.

I have a vague notion that the cost of equal? vs. eq? is especially notable in hash tables. With a statically-visible symbol, equal? can just be optimized to eq?, but AIUI it triggers more expensive double hashing in hash tables.

Another use-case is weak eq?-based hash tables to cache expensive computations that might be repeated: if the hash-table lookup becomes more expensive, that might negate the benefit of using the cache in the first place.

But maybe we will come up with an equality test that avoids the problems with eq? but still has good performance.

LiberalArtist avatar Jul 17 '19 01:07 LiberalArtist

eq? interferes with optimizations because it exposes pointer identities, so you cannot, for example, create copies of immutable objects. Similarly, it interferes with providing safety because you can tell the difference between an object and its chaperone.

For this reason exactly, I think it'd be good to introduce a prop:has-hashable-object-identity property that new user-defined data structures don't have to implement (even if every value up to now has done so).

And, uh, I think this would eventually motivate migrating to yet another list representation, this time using cons cells that are not mutable and do not have object identity. But maybe one step at a time...

With that in mind, this is how I figure the equality functions can be renamed and recontextualized:

  • eq? -> equal-cheap, an equality check that only works on values that can be compared without traversing them. All values that have object identity support this since the object identity is the only part that needs to be checked.
  • (did not exist) -> equal, a new deep equality check that not all values support, but that is actually guaranteed to tell the truth about indistinguishability for values that do. Again, this any value that has object identity can be compared without traversing into it, so this operation is equivalent to eq? in Racket 1, and that's why it doesn't exist yet.
  • equal? -> custom-hashable-equal, which no longer accepts all values, but just those values which have object identity, have support for the new equal, or have prop:equal+hash. The name might tip people off to the gotchas of using this kind of equality, since programmatic behavior in an untyped language is unlikely to enforce algebraic laws.
  • = -> numbers-are-actual-numbers-and-have-been-rounded-to-be-equal. This rather elaborate name clarifies that this operation is only for numbers, that it returns #f if there are any NaNs, and that it isn't comparing real numbers, but only roundings thereof, which may have become arbitrarily imprecise.
  • eqv? -> (retired, possibly renamed to number-content-or-character-content-or-other-value-object-identity-is-equal)

rocketnia avatar Jul 17 '19 04:07 rocketnia

I think in nearly all cases where eq?-based hash tables are used, they're used with quoted symbol keys. In these cases, instead of exposing pointer equality we can just point users to a special purpose data structure that assumes symbol-like keys such as Rebellion's records. This also makes it easier to optimize further than what is possible with a generic hasheq data structure that assumes nothing about its keys.

I really think we should avoid introducing additional equality operators and concentrate on changing or removing the existing ones. I'd much rather change equal? to treat mutable objects with different identity as unequal than introduce a new equality operator with this behavior. If I wanted to compare two mutable vectors on their contents alone, the first approach I'd reach for would be (equal? (vector->immutable-vector vec1) (vector->immutable-vector vec2)) rather than read through the docs on the various equality operators until I'm sure I have the one that does what I want.

jackfirth avatar Jul 17 '19 04:07 jackfirth

For an equality function that uses structural equality on immutable values, but reference equality on mutable values, is chaperone-of? the function that currently does that?

I would propose making equal? more like Rackjure's egal? or Pyret's equal-always, while renaming the existing structural-on-mutable equal? to equal-now?.

However if chaperone-of? already behaves like an "equal-always" predicate, then that might be as simple as renaming chaperone-of? to equal? while renaming the old equal? to equal-now?. Is this right?

If chaperone-of? behaves like an "equal-always", then the only difference I can think of between chaperone-of? and egal? is on streams, where egal? says streams are egal if they have egal elements (potentially diverging on infinite streams), where chaperone-of? would return false.

(Edit: okay it wouldn't quite be that simple, there would also need to be chaperone-of-based hash-tables, renamed to equal-based hash-tables along with it) (Edit 2: apparently chaperone-of? isn't symmetric, so I suppose the new "equal-always" version of equal? should be equivalent to (or (chaperone-of? a b) (chaperone-of? b a)), except it shouldn't have to traverse the data twice) (Edit 3: so just or-flipping on the outside won't be enough, the or-flip needs to happen on every nested place because it could need different directions in different parts of the data) (Edit 4: so just or-fliiping-and-traversing won't be enough, because if x2 and x3 are both chaperones of x1, they're unrelated by chaperone-of? in both directions)

AlexKnauth avatar May 17 '20 00:05 AlexKnauth

In the current Rhombus:

  • = is for variable/object mutation (set!, vector-set!, etc.).
  • == is for number comparison (=)
  • === is for equal?

I think this is not ideal.

  • No other languages use == strictly for number comparison, so it would be very confusing to people who come from other languages.
  • === is not ergonomic. It's used everywhere, and three keystrokes cost too much.
    • = for mutation seems like a good target to be removed, since it's much less frequently used compared to other operations.

My preference is:

  • := for mutation (from Discord, @plane suggests that <- is also another good candidate)
  • = for number equality (from Discord, @soegaard suggests that this could be generalized to eqv?, though I personally think eqv? should not exist)
  • == for equal?
  • === for eq?

sorawee avatar Nov 19 '21 17:11 sorawee

One potential confusion with == as equal? is that 1 is not equal? to 1.0. (That's one reason why == isn't currently equal?.)

Maybe == could be numeric equality when both arguments are numbers and equal? otherwise.

mflatt avatar Nov 19 '21 18:11 mflatt

What if instead of making == mean numeric equality on numbers, we just made numbers read as exact decimals by default? Then 1 == 1.0 would be true even if == means equal?.

jackfirth avatar Nov 26 '21 01:11 jackfirth

@jackfirth What should be the answer of

1 == exact_to_inexact(1)

?

sorawee avatar Nov 26 '21 01:11 sorawee

I'd be fine with that being false, and telling people if they want to mix comparisons of exact and inexact values, they could use this:

is_numerically_equal(1, exact_to_inexact(1))

Exactness-insensitive numerical equality tests make up a very tiny fraction of all equality tests. I'm not too worried about them requiring some explicitness.

jackfirth avatar Nov 26 '21 01:11 jackfirth

Hmm, I'm not too sure that's a good thing, as quite likely a large portion of the users (in particular students) will expect this result to be true. This can lead to serious silent bugs too.

I would be okay with 1 == exact_to_inexact(1) raising an exception instead.

Metaxal avatar Nov 26 '21 09:11 Metaxal

I think of it as a type mismatch. Two things of different disjoint types can't be equal, similar to how some_number == some_string is allowed, but pointless. It's important that it's allowed so that things like x == some_gensym work correctly. But we could lean on static analysis to detect cases where equality tests are guaranteed to be false.

jackfirth avatar Nov 26 '21 10:11 jackfirth

My current(*) personal preference:

  • set, := or <- for assignment
  • = for numeric equality (1 = 1. is true).
  • == for value equality within the same type, and exception raised if types are different
  • equal? for type equality, then value equality
  • object-equal? or ref=? for eq? (these names aren't great, but at least it's intuitive)
  • eqv? does not exist

Roughly based on the principle of least surprise: names should be intuitive enough and not do what you would not expect them to do (which is less stringent than "they should do what you expect them to do").

(*) Ask me next month for different answers.

Metaxal avatar Nov 26 '21 11:11 Metaxal

=, ==, and equal? all having slightly different semantics is not a good state of affairs. I think there should only be one symbolic equality operator and it should have the same meaning as whatever is_equal means in Rhombus (which may or may not be the same as equal? in Racket.) Any other forms of equality should have descriptive names like is_numerically_equal or is_same_object and should not have symbolic operators associated with them. We want equality to have a single clear default meaning, with other notions of equality given names that say how they're different from the default meaning.

jackfirth avatar Nov 26 '21 11:11 jackfirth

What if instead of making == mean numeric equality on numbers, we just made numbers read as exact decimals by default?

That would definitely create surprises, given how much more expensive exact arithmetic is.

Exactness-insensitive numerical equality tests make up a very tiny fraction of all equality tests.

I'm not at all sure this is true, and I think this must depend on what kind of program you're writing.

While there are pitfalls to the convention that . means a floating point number and to conventions for implicit coercions between them, it's a pretty good compromise for the most part between performance and real arithmetic. I haven't seen the combination as a big source of bugs in our libraries. (It is a source of confusion for beginners in practically all languages, and this is one of those places where a different choice may be appropriate in a pedagogical language like BSL.)

More generally, I'm surprised at how often opinions in this thread dismiss the status quo, as if the only reason Racket has =, eq?, and equal? the way they are and in widespread use is because we've been too lazy to try anything else. Equality is tricky. Picking a small number of equality operators and allocating names to encourage the best ones — that sounds tractable and a good idea from our past experience. Trying to boil it down to just one equality operator seems at odds with experience across languages.

mflatt avatar Nov 26 '21 13:11 mflatt

My current(*) perference:

  • == is Racket's equal? (because that's usually what you want, and it's a good choice for match)
  • === is Racket's eq? (because it really is a common operation, however much we might prefer a different universe)
  • .= is Racket's = (because I think traditional number comparison is common enough to merit a compact name)

Meanwhile:

  • := is assignment (because Pascal was my second language)
  • = is not a predefined operator, although it is used in places like providing the default for an optional argument

I still worry that the behavior of == as equal? on numbers will end surprising to some, but it's the least-bad effect I see in various compromises. Also, ending up with Racket's set of equality operators seems like a big benefit.

(*) Ask me again after you ask @Metaxal.

mflatt avatar Nov 26 '21 13:11 mflatt

In Discord (and above comments), @AlexKnauth has been advocating for == for egal? over equal?. egal? is probably not ergonomic unless most things are immutable (strings in particular), but it looks like we are heading in that direction anyway?

sorawee avatar Nov 26 '21 18:11 sorawee

I've been calling it "equal always" to distinguish it from "equal now", and I was thinking of having 2 separate operators for them. So my current preference:

  • =~ for "equal now", which is what Racket's equal? does.
  • == for "equal always", somewhat similar to Racket's chaperone-of? but symmetric.
  • === for "identical", which is what Racket's eq? does.

With == equal always being blessed as the preferred notion of equality, the default for list operations such as member and remove-duplicates, the one used to enforce chaperone compliance, and the one used for key comparison in the most convenient hash-tables / maps / dictionaries / sets.

Meanwhile:

  • := for assignment.
  • = for numeric equality would be nice, but I suppose I'd also like =. or .=. Slightly prefer =. over .= for number comparison.

AlexKnauth avatar Nov 26 '21 19:11 AlexKnauth