Unitful.jl
Unitful.jl copied to clipboard
Hashing of `Quantity`s
A discussion about the implementation of hash was started in #378. While that issue has a narrower scope (hashing of Quantity{BigFloat} where the BigFloats have the same values (==) but are not identical (===)), hashing should generally be consistent with isequal (see the hash docstring), i.e., hash(1u"m") == hash(100u"cm").
Implementing such a hash method would potentially be very breaking (for example, right now one can have a Dict with both 1u"m" and 100u"cm" as keys), so it is something to consider for a 2.0 release.
Some observations/thoughts related to such a hash implementation:
- To implement such a
hashfunction, a quantity would have to be converted to a “standard unit” (probablyupreferred). However, floating-point quantities might break this, sinceisequaldoes not necessarily convert its arguments toupreferred(it usespromote, which only falls back toupreferredforFreeUnits). I can think of two possible solutions to this:- Change
isequalto always convert toupreferred. - Use
TwicePrecision(is it sufficient?).
- Change
- The
Datesstdlib already implements hashing like this:hash(Second(60)) == hash(Minute(1)). However, they don’t have to deal with floating-point numbers, which makes it easier. If comparison betweenQuantitys andPeriods is added to Unitful (see #331), hashing ofQuantitys should also be consistent with hashing ofPeriods.
However, floating-point quantities might break this, since isequal does not necessarily convert its arguments to upreferred (it uses promote, which only falls back to upreferred for FreeUnits). I can think of two possible solutions to this:
That's a good point, but can we construct a realistic scenario where this consideration is a problem?
However, floating-point quantities might break this, since
isequaldoes not necessarily convert its arguments toupreferred(it usespromote, which only falls back toupreferredforFreeUnits). I can think of two possible solutions to this:That's a good point, but can we construct a realistic scenario where this consideration is a problem?
Maybe the following: I can define a dictionary d = Dict{typeof(1.0u"cm"), Int}(). Then I push a value which has a different unit: d[1u"inch"] = 1. The key would get converted to u"cm". Now I might not be able to retrieve the value using d[1u"inch"] because the key (after conversion) and 1u"inch" have different hashes.
Edit: In this specific example (1.0u"cm" and 1u"inch"), the key actually cannot be inserted into the Dict because convert(typeof(1.0u"cm"), 1u"inch") is not equal to 1u"inch" (setindex! checks for that). But there could be another example where the quantities are equal but their hashes differ.
I can think of two possible solutions to this:
- Change
isequalto always convert toupreferred.- Use
TwicePrecision(is it sufficient?).
Changing isequal to always convert to upreferred seems not too bad to me. I actually think there is no good reason not to do this. The point of ContextUnits and FixedUnits is to customize promotion behavior, but isequal only uses the promoted value internally, so it should not matter.
The only problem that I see with it is that it is wasteful if both quantities already have the same unit. Maybe we don’t have to do the conversion in this case.
But I think I would also be in favor of using TwicePrecision for all floating-point conversion factors, independently of the hashing issue, if the performance isn’t too bad.
Implementing such a hash method would potentially be very breaking (for example, right now one can have a Dict with both 1u"m" and 100u"cm" as keys), so it is something to consider for a 2.0 release.
I'd say that it's a bugfix if it's done...
The only problem that I see with it is that it is wasteful if both quantities already have the same unit. Maybe we don’t have to do the conversion in this case.
(edited because math) I agree that we wouldn't need the conversion for same-unit quantities, which makes it acceptable for performance IMO. I wonder if there are other issues.
~Wait; the requirement is that x == y implies hash(x) == hash(y), but the converse need not be true. Thus, we don't need upreferred in isequal. We just need upreferred in hash, which seems entirely acceptable to me.~
EDIT: nevermind, with this proposal if x and y are of different units it's possible to have isequal(x, y) && hash(x) != hash(y).
the requirement is that
x == yimplieshash(x) == hash(y)
I thought that was isequal, not ==. They're different