Dictionaries.jl icon indicating copy to clipboard operation
Dictionaries.jl copied to clipboard

Behavior seems to depend on order items added to dictionary

Open danspielman opened this issue 4 years ago • 2 comments

The behavior seems to depend in an odd way on the order in which keys are added in to the dictionary. For example, I would expect that the following two dictionaries would be identical:

julia> d1 = Dictionary(["a","b"],[1;2])
2-element Dictionary{String,Int64}
 "a" │ 1
 "b" │ 2

julia> d2 = Dictionary(["b","a"],[1;2])
2-element Dictionary{String,Int64}
 "b" │ 1
 "a" │ 2

However, d1 .+ d2 gives an error:

julia> d1 .+ d2
ERROR: IndexError("Indices do not match")
Stacktrace:
 [1] _sharetokens at /Users/spielman/.julia/packages/Dictionaries/uSQZZ/src/broadcast.jl:68 [inlined]
 [2] BroadcastedDictionary(::Function, ::Tuple{Dictionary{String,Int64},Dictionary{String,Int64}}) at /Users/spielman/.julia/packages/Dictionaries/uSQZZ/src/broadcast.jl:12
 [3] broadcasted at /Users/spielman/.julia/packages/Dictionaries/uSQZZ/src/broadcast.jl:116 [inlined]
 [4] broadcasted(::Function, ::Dictionary{String,Int64}, ::Dictionary{String,Int64}) at ./broadcast.jl:1238
 [5] top-level scope at REPL[25]:1
 [6] eval(::Module, ::Any) at ./boot.jl:331
 [7] eval_user_input(::Any, ::REPL.REPLBackend) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:86
 [8] run_backend(::REPL.REPLBackend) at /Users/spielman/.julia/packages/Revise/kqqw8/src/Revise.jl:1163
 [9] top-level scope at none:0

Whereas, I do not get that error if I create the dictionary in the other order.

julia> d2 = Dictionary(["a","b"],[1;2])
2-element Dictionary{String,Int64}
 "a" │ 1
 "b" │ 2

julia> d1 .+ d2
2-element Dictionary{String,Int64}
 "a" │ 2
 "b" │ 4

danspielman avatar Jun 15 '20 19:06 danspielman

Yes, that's right - the order of the elements in a Dictionaries.jl dictionary is now an important feature. Take for example ==.

julia> d1 = Dictionary(["a", "b"], [1, 2])
2-element Dictionary{String,Int64}
 "a" │ 1
 "b" │ 2

julia> 

julia> d2 = Dictionary(["b", "a"], [2, 1])
2-element Dictionary{String,Int64}
 "b" │ 2
 "a" │ 1

julia> d1 == d2
false

julia> isdictequal(d1, d2)
true

If you take an example like map(+, d1, d2) then map semantically matches up inputs based on iteration order and returns an iterable with the same ordering. If you were to enforce strictly the preservation of iteration order in Dictionaries.jl then you might end up matching the values to key "a" in d1 and key "b" in d2, which is likely to be a logic error, and it wouldn't be clear what the output container should be, so we forbid it. An advantage of the strict ordering is that operations like map are much faster than you observe with Base.Dict, since the iterative approach requires no hash-based lookups (see the last section in the README).

Luckily you can fix your example easily by reindexing d2 to iterate in the same order as d1, making them more compatible:

julia> d3 = getindices(d2, keys(d1))
2-element Dictionary{String,Int64}
 "a" │ 1
 "b" │ 2

julia> d1 .+ d3
2-element Dictionary{String,Int64}
 "a" │ 2
 "b" │ 4

Equivalently you could use d3 = view(d2, keys(d1)) to avoid an in-memory copy of d2. In these examples d1 and d3 share the same index and co-iteration is really fast, and you perform 3x less hash-based lookups (just once per element of d2, instead of once for d1, once for d2 and once for the output, per element).

That said, while map is generally defined in Base to preserve iteration order, broadcast is all about matching indices up in a more flexible manner, and it might make sense to make d1 .+ d2 work in your example. But as stated, it may be slower, and it is not implemented yet.

I am also working on implementing things like permute! and sortperm and reverse so you can really work with the ordering of your dictionaries.

andyferris avatar Jun 15 '20 22:06 andyferris

By the way, if people using this package have opinions about this approach I would appreciate hearing them! :)

andyferris avatar Jun 15 '20 22:06 andyferris