DataStructures.jl icon indicating copy to clipboard operation
DataStructures.jl copied to clipboard

Check dict hashing issue

Open kmsquire opened this issue 8 years ago • 9 comments

It's likely that OrderedDicts, as implemented here, are susceptible to JuliaLang/julia#15077. This needs to be tested and handled.

Related: #221

kmsquire avatar Nov 22 '16 17:11 kmsquire

Does #221 make this worse? From the discussion, it seems like the fix will be a more clever rehash! and the testing solution is set and get a ton of keys (rather than trying to deduce the steps that would cause the problem). I'd be happy to contribute the tests ...

phaverty avatar Nov 22 '16 21:11 phaverty

Well, #221 might have to be simplified in order to address this.

I think we can trigger the bug by adding enough keys to get the slot array to the right size (256), and then adding the sequence of keys used in the test for the Julia issue. Or rather, the total number of keys needs to be at least 171 (2/3 of 256).

kmsquire avatar Nov 23 '16 02:11 kmsquire

I encountered a problem that suggests OrderedDict and/or OrderedSet are somehow broken. https://github.com/bicycle1885/Automa.jl/pull/1

bicycle1885 avatar Jan 07 '17 21:01 bicycle1885

To clarify it, I encountered a problem that doesn't happen when using Base.Dict and Base.Set on Julia 0.5 but happens when using DataStructures.OrderedDict and DataStructures.OrderedSet. Tests fail sometimes, not always, say 10-30% of probability. Automa.jl creates many dictionaries and sets that have object keys like NFANode and DFANode. There is no randomness in it except hash values of objects, which is taken from object_id. If ordered dicts and sets were drop-in replacement of usual dicts and sets, this problem won't happen. So I think this is due to DataStructures.jl.

bicycle1885 avatar Jan 07 '17 21:01 bicycle1885

Don't know if this is the same thing, but I'm encountering missing keys accessed directly from the SortedDict. I have a custom concrete struct (Virus{UInt} that is immutable, and isbits. I don't know how to paste the exact data in for a bug report. But it doesn't happen when I repackage in a standard Base.Dict. Looks like this

julia> lt.root_nodes
SortedDict{Virus{UInt64}, UInt64, Base.Order.ForwardOrdering} with 11 entries:
...

julia> (v1,l1) = first(lt.root_nodes)
Virus{UInt64}(0x0000000000000002, 0x00000000000097dd, 0x00000000000097e0, 976.4969064069165, 977.7852658472708) => 0x0000000000000004

julia> haskey(lt.root_nodes,v1)
false #But you just gave this to me!

julia> dtest = Dict(lt.root_nodes...) # repackage in Dict
Dict{Virus{UInt64}, UInt64} with 11 entries:

julia> dtest[v1]
0x0000000000000004 # now it works great

chelate avatar Sep 21 '22 15:09 chelate

This is a different issue. Could you repost it as a new issue in DataStructures.jl. And, if possible, post as much information as you have about what is triggering the bug. Ideally you could post a self-contained code that triggers the bug.

Note that SortedDict will not work properly unless your custom ordering satisfies the usual rules for an ordering: for any two keys a,b, exactly one of the relations a<b, a=b, b<a holds, and the relation < must be transitive.

StephenVavasis avatar Sep 21 '22 15:09 StephenVavasis

ok figured. Sorry for clogging up this delicate deep hashing issue. P.S. I'm extending Base.isless, but will try Base:< Base:>, Base:== instead.

Edit: Yes my custom Base.isless was only partially ordered which lead to missing keys. Works fine when I resolved ties. Thanks @StephenVavasis! Leaving this in case it helps some other clumsy person.

chelate avatar Sep 21 '22 16:09 chelate

I understand that this isn't a problem anymore (https://github.com/JuliaCollections/OrderedCollections.jl/commit/d3facfd32c4a74c51196d79f863e617f3ecce74e)

laborg avatar Oct 04 '23 18:10 laborg

Oh, looking at this a second time I think we can't yet close this.

laborg avatar Oct 04 '23 18:10 laborg