Unitful.jl
Unitful.jl copied to clipboard
Should hashing of `Quantity{BigFloat}` behave differently to those of `BigFloat`'s?
Hashing has a different behavior for "bare" BigFloat's and unit-ed Quantity{BigFloat}'s. For the former, hash is unaffected by BigFloat's data address in memory, stored in its d field; while for the latter, hash do seem to take the data memory address into account, so that equal numerical values hash the same for "bare" numbers, irrespectively of memory location, and can hash differently for same unit-ed Quantity-ies:
Take a, b, and c which are numerically equal to BigFloat(1//7), but a.d === c.d and a.d !== b.d:
julia> [a.d, b.d, c.d] # a and c are the same, but b resides in a different location
3-element Array{Ptr{UInt64},1}:
Ptr{UInt64} @0x00007fc23e2a2798
Ptr{UInt64} @0x00007fc23f92d6d8
Ptr{UInt64} @0x00007fc23e2a2798
julia> a == c == b # They have the same values...
true
julia> a === c !== b # ...but are NOT identical (as === performs bit-level comparison)
true
julia> map(hash, [a, b, c]) # The bare numbers hash the SAME...
3-element Array{UInt64,1}:
0x861a441bf6ee33e9
0x861a441bf6ee33e9
0x861a441bf6ee33e9
julia> map(x -> hash(x * u"m"), [a, b, c]) # ...however, their Quantity-ies do not!
3-element Array{UInt64,1}:
0x6bc1521c2c63173d
0xe8f1b97e24b56c91
0x6bc1521c2c63173d
So, should hashing of Quantity{BigFloat} behave differently to those of BigFloat's?
I see it as inconsistent behavior, that may lead to false negative tests that work just fine for Quantity{Float64}, Quantity{Float32}, etc.
Note that there is no special code for hashing in this package, as far as I can see, but hash(::BigFloat) follows a different code path
I think, the behavior for Quantity{BigFloat} should be fixed, since isequal(x,y) should always imply hash(x) == hash(y).
However, that means hash(1u"m") == hash(100u"cm") should also hold. The Dates stdlib already does this: hash(Second(60)) == hash(Minute(1)). I would be in favor of implementing hash like that (for Unitful 2.0, since it is potentially very breaking), but it seems rather difficult, since every value would have to be converted to a “standard unit” for hashing. Floating-point arithmetic might then lead to examples where hash(x) != hash(y) even though isequal(x,y).
In my humble opinion, both hash and === operate at bit level, and should be able to detect representation differences.
The memory address in which a datum is stored is immaterial to the datum itself — this is what introduces the difference in behavior in BigFloats from the other AbstractFloat concrete subtypes in the first place, indicating that hash(BigFloat) method ignores BigFloat's .d field, which is the case, as hash(::BigFloat) calls hash(::Any), which calls hash(::Real, ::UInt), which decomposes the number into num, pow, den., and finally decompose has specialized methods for each underlying AbstractFloat concrete subtype (in "... julia/base/hashing2.jl").
So whether one 0.1m and other identical 0.1m values are stored in different places in memory (and the underlying DataType happens to store this info in its structure) should be irrelevant for ==, ===, and hash comparisons (recalling that this also follows what is already established for such "bare" numbers).
Now, the case of comparing 0.1m and 10.0cm, they indeed have (i) the same physical value, but (ii) different representations, so it seems to me that such quantities should pass an == equality test, but fail both === and hash equality tests. Otherwise, one can have instances in which unit(a) == unit(b) returns false while a === b returns true! — which is way less puzzling/surprising than two bare BigFloat's failing a.d == b.d while passing a === b.
Evidently the different unit case exceeds the scope of the original question (in which units and representations are kept the same), and seems to be a topic for further debate.
Note that there is no special code for hashing in this package, as far as I can see, but
hash(::BigFloat)follows a different code path
Based on my previous comment and analysis of Julia's "base/hashing2.jl" source, the Unitful.jl package could specialize hash methods by (i) integer hashing either (debate topic) the unit or the dimension, and then (ii) hashing the numerical value with the previous hash as second argument, which falls back to julia's hashing functions:
julia> a = BigFloat("0.1") * u"m"
0.1000000000000000000000000000000000000000000000000000000000000000000000000000002 m
julia> b = BigFloat("0.1") * u"m"
0.1000000000000000000000000000000000000000000000000000000000000000000000000000002 m
julia> a.val.d == b.val.d
false
julia> function myHash(x)
h = UInt('U') # UInt magick value / could be something else
h = hash(unit(x), h)
return hash(x.val, h)
end
myHash (generic function with 1 method)
julia> myHash(a)
0x819956baf0640cc6
julia> myHash(b)
0x819956baf0640cc6
julia> [hash(a), hash(b)] # Current fallback
2-element Array{UInt64,1}:
0x26aa3c69c475a124
0x08b21fe7b9bd6711
julia> [hash(a.val), hash(b.val)] # Bare values's num, pow, den-based hash
2-element Array{UInt64,1}:
0x049e46df46389711
0x049e46df46389711
Now, the case of comparing
0.1mand10.0cm, they indeed have (i) the same physical value, but (ii) different representations, so it seems to me that such quantities should pass an==equality test, but fail both===andhashequality tests.
They will obviously fail ===, but they should not fail hash, since the behavior of hash should be consistent with isequal. The behavior of === is also completely irrelevant to this discussion (and you cannot customize it anyway).
Otherwise, one can have instances in which
unit(a) == unit(b)returnsfalsewhilea === breturnstrue!
No, this is literally impossible (unless one changes unit so that its return value depends on some global state and not just the passed argument).
Sadly, a hash method that is consistent with isequal seems difficult to implement in this case. In my opinion, your proposed hash implementation (or something very similar) should be added for Unitful 1.x.y, and an implementation that conforms to the documented hash behavior is something to consider for Unitful 2.0.
I started a discussion about hashing quantities with different units here: #379.