Distributed.jl icon indicating copy to clipboard operation
Distributed.jl copied to clipboard

remotecall_fetch vulnerable to hash collisions?

Open nalimilan opened this issue 7 years ago • 6 comments

I've bumped into this behavior by accident after defining hash incorrectly for arrays. I'm absolutely not sure it indicates a bug, but I figured I'd better report it since this kind of exceptional situation is rarely tested.

It appears that when hash collisions happen, remotecall_fetch is not able to detect that the contents of an array have been updated. This can easily be reproduced by always returning the same hash for all arrays. This is surprising to me, as I would have expected that differing hashes are sufficient, but not necessary, to consider that two arrays are different, i.e. that isequal would always be called to check for hash collisions.

julia> using Distributed

julia> Base.hash(::AbstractArray, ::UInt) = zero(UInt)

julia> x = [1]
1-element Array{Int64,1}:
 1

julia> remotecall_fetch(()->x, 2)
1-element Array{Int64,1}:
 1

julia> x[1] = 2
2

julia> remotecall_fetch(()->x, 2) # Woops, not updated
1-element Array{Int64,1}:
 1

nalimilan avatar Dec 21 '17 06:12 nalimilan

I wonder if there's some fallback somewhere from when arrays were hashed by identity because they're mutable. That would manifest like this and should definitely be changed, if so.

StefanKarpinski avatar Dec 21 '17 07:12 StefanKarpinski

It happens here where the logic for moving globals is implemented.

cc: @amitmurthy

andreasnoack avatar Dec 21 '17 10:12 andreasnoack

That block of code is an optimization to avoid serializing global objects if unchanged. Relevant especially in the case of largish arrays. Short of removing the optimization and serializing all referenced globals (except consts) each time, I am at a loss how to fix this - we cannot compare against the previous state of the object.

amitmurthy avatar Dec 21 '17 15:12 amitmurthy

Would using object_id make sense here?

nalimilan avatar Dec 21 '17 15:12 nalimilan

No, object_id of x is unchanged in your example above - it is the same object. The global is serialized if either object_id or the hash value of the binding changes between remotecall invocations.

amitmurthy avatar Dec 21 '17 15:12 amitmurthy

Can we do something about this? It really sounds like a recipe to create very rare bugs that will be terribly hard to reproduce for users.

Moreover, if we merge JuliaLang/julia#26022, collisions will become quite likely if you change just a few entries in an array. Cc: @mbauman

nalimilan avatar Jul 14 '18 10:07 nalimilan