Unitful.jl icon indicating copy to clipboard operation
Unitful.jl copied to clipboard

Hashing of `Quantity`s

Open sostock opened this issue 3 years ago • 6 comments

A discussion about the implementation of hash was started in #378. While that issue has a narrower scope (hashing of Quantity{BigFloat} where the BigFloats have the same values (==) but are not identical (===)), hashing should generally be consistent with isequal (see the hash docstring), i.e., hash(1u"m") == hash(100u"cm").

Implementing such a hash method would potentially be very breaking (for example, right now one can have a Dict with both 1u"m" and 100u"cm" as keys), so it is something to consider for a 2.0 release.

Some observations/thoughts related to such a hash implementation:

  • To implement such a hash function, a quantity would have to be converted to a “standard unit” (probably upreferred). However, floating-point quantities might break this, since isequal does not necessarily convert its arguments to upreferred (it uses promote, which only falls back to upreferred for FreeUnits). I can think of two possible solutions to this:
    • Change isequal to always convert to upreferred.
    • Use TwicePrecision (is it sufficient?).
  • The Dates stdlib already implements hashing like this: hash(Second(60)) == hash(Minute(1)). However, they don’t have to deal with floating-point numbers, which makes it easier. If comparison between Quantitys and Periods is added to Unitful (see #331), hashing of Quantitys should also be consistent with hashing of Periods.

sostock avatar Sep 07 '20 08:09 sostock

However, floating-point quantities might break this, since isequal does not necessarily convert its arguments to upreferred (it uses promote, which only falls back to upreferred for FreeUnits). I can think of two possible solutions to this:

That's a good point, but can we construct a realistic scenario where this consideration is a problem?

cstjean avatar Sep 07 '20 15:09 cstjean

However, floating-point quantities might break this, since isequal does not necessarily convert its arguments to upreferred (it uses promote, which only falls back to upreferred for FreeUnits). I can think of two possible solutions to this:

That's a good point, but can we construct a realistic scenario where this consideration is a problem?

Maybe the following: I can define a dictionary d = Dict{typeof(1.0u"cm"), Int}(). Then I push a value which has a different unit: d[1u"inch"] = 1. The key would get converted to u"cm". Now I might not be able to retrieve the value using d[1u"inch"] because the key (after conversion) and 1u"inch" have different hashes.

Edit: In this specific example (1.0u"cm" and 1u"inch"), the key actually cannot be inserted into the Dict because convert(typeof(1.0u"cm"), 1u"inch") is not equal to 1u"inch" (setindex! checks for that). But there could be another example where the quantities are equal but their hashes differ.

sostock avatar Sep 07 '20 17:09 sostock

I can think of two possible solutions to this:

  • Change isequal to always convert to upreferred.
  • Use TwicePrecision (is it sufficient?).

Changing isequal to always convert to upreferred seems not too bad to me. I actually think there is no good reason not to do this. The point of ContextUnits and FixedUnits is to customize promotion behavior, but isequal only uses the promoted value internally, so it should not matter.

The only problem that I see with it is that it is wasteful if both quantities already have the same unit. Maybe we don’t have to do the conversion in this case.

But I think I would also be in favor of using TwicePrecision for all floating-point conversion factors, independently of the hashing issue, if the performance isn’t too bad.

sostock avatar Sep 07 '20 19:09 sostock

Implementing such a hash method would potentially be very breaking (for example, right now one can have a Dict with both 1u"m" and 100u"cm" as keys), so it is something to consider for a 2.0 release.

I'd say that it's a bugfix if it's done...

The only problem that I see with it is that it is wasteful if both quantities already have the same unit. Maybe we don’t have to do the conversion in this case.

(edited because math) I agree that we wouldn't need the conversion for same-unit quantities, which makes it acceptable for performance IMO. I wonder if there are other issues.

cstjean avatar Oct 14 '20 08:10 cstjean

~Wait; the requirement is that x == y implies hash(x) == hash(y), but the converse need not be true. Thus, we don't need upreferred in isequal. We just need upreferred in hash, which seems entirely acceptable to me.~

EDIT: nevermind, with this proposal if x and y are of different units it's possible to have isequal(x, y) && hash(x) != hash(y).

cstjean avatar Oct 14 '20 08:10 cstjean

the requirement is that x == y implies hash(x) == hash(y)

I thought that was isequal, not ==. They're different

giordano avatar Oct 14 '20 08:10 giordano