Dictionaries.jl
Dictionaries.jl copied to clipboard
Behavior seems to depend on order items added to dictionary
The behavior seems to depend in an odd way on the order in which keys are added in to the dictionary. For example, I would expect that the following two dictionaries would be identical:
julia> d1 = Dictionary(["a","b"],[1;2])
2-element Dictionary{String,Int64}
"a" │ 1
"b" │ 2
julia> d2 = Dictionary(["b","a"],[1;2])
2-element Dictionary{String,Int64}
"b" │ 1
"a" │ 2
However, d1 .+ d2 gives an error:
julia> d1 .+ d2
ERROR: IndexError("Indices do not match")
Stacktrace:
[1] _sharetokens at /Users/spielman/.julia/packages/Dictionaries/uSQZZ/src/broadcast.jl:68 [inlined]
[2] BroadcastedDictionary(::Function, ::Tuple{Dictionary{String,Int64},Dictionary{String,Int64}}) at /Users/spielman/.julia/packages/Dictionaries/uSQZZ/src/broadcast.jl:12
[3] broadcasted at /Users/spielman/.julia/packages/Dictionaries/uSQZZ/src/broadcast.jl:116 [inlined]
[4] broadcasted(::Function, ::Dictionary{String,Int64}, ::Dictionary{String,Int64}) at ./broadcast.jl:1238
[5] top-level scope at REPL[25]:1
[6] eval(::Module, ::Any) at ./boot.jl:331
[7] eval_user_input(::Any, ::REPL.REPLBackend) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:86
[8] run_backend(::REPL.REPLBackend) at /Users/spielman/.julia/packages/Revise/kqqw8/src/Revise.jl:1163
[9] top-level scope at none:0
Whereas, I do not get that error if I create the dictionary in the other order.
julia> d2 = Dictionary(["a","b"],[1;2])
2-element Dictionary{String,Int64}
"a" │ 1
"b" │ 2
julia> d1 .+ d2
2-element Dictionary{String,Int64}
"a" │ 2
"b" │ 4
Yes, that's right - the order of the elements in a Dictionaries.jl dictionary is now an important feature. Take for example ==
.
julia> d1 = Dictionary(["a", "b"], [1, 2])
2-element Dictionary{String,Int64}
"a" │ 1
"b" │ 2
julia>
julia> d2 = Dictionary(["b", "a"], [2, 1])
2-element Dictionary{String,Int64}
"b" │ 2
"a" │ 1
julia> d1 == d2
false
julia> isdictequal(d1, d2)
true
If you take an example like map(+, d1, d2)
then map
semantically matches up inputs based on iteration order and returns an iterable with the same ordering. If you were to enforce strictly the preservation of iteration order in Dictionaries.jl then you might end up matching the values to key "a"
in d1
and key "b"
in d2
, which is likely to be a logic error, and it wouldn't be clear what the output container should be, so we forbid it. An advantage of the strict ordering is that operations like map
are much faster than you observe with Base.Dict
, since the iterative approach requires no hash-based lookups (see the last section in the README).
Luckily you can fix your example easily by reindexing d2
to iterate in the same order as d1
, making them more compatible:
julia> d3 = getindices(d2, keys(d1))
2-element Dictionary{String,Int64}
"a" │ 1
"b" │ 2
julia> d1 .+ d3
2-element Dictionary{String,Int64}
"a" │ 2
"b" │ 4
Equivalently you could use d3 = view(d2, keys(d1))
to avoid an in-memory copy of d2
. In these examples d1
and d3
share the same index and co-iteration is really fast, and you perform 3x less hash-based lookups (just once per element of d2
, instead of once for d1
, once for d2
and once for the output, per element).
That said, while map
is generally defined in Base
to preserve iteration order, broadcast
is all about matching indices up in a more flexible manner, and it might make sense to make d1 .+ d2
work in your example. But as stated, it may be slower, and it is not implemented yet.
I am also working on implementing things like permute!
and sortperm
and reverse
so you can really work with the ordering of your dictionaries.
By the way, if people using this package have opinions about this approach I would appreciate hearing them! :)