Dictionaries.jl

Go back to a base `Dict` or some kind of table interface

Open Moelf opened this issue 4 years ago • 7 comments

I'm a happy user of your package. In our line of work we process many independent files to make summary histograms or extract parts of the data. In the end I simply do `reduce((x,y) -> append!.(x,y), results)` to collect the results together without manually tracking the order of things.

However, it's rather difficult if I want to put them into a table or anything similar, because Dictionary doesn't conform to the Table interface (which is expected), and it also can't be converted back to a Dict:

10-element Dictionaries.Dictionary{Symbol, Vector{Float32}}
  :lep1_pt │ Float32[33638.438, 92686.78, 38112.855, 110358.19, 164663.92, 9687…
 :lep1_eta │ Float32[0.45212966, -0.83190763, -1.0084606, 0.20597617, 1.0637895…
 :lep1_phi │ Float32[-1.4313298, -1.067748, -1.569317, 0.66097116, 2.2409034, -…
 :lep1_pid │ Float32[13.0, -11.0, 13.0, -11.0, 13.0, 13.0, -11.0, 11.0, -13.0, …
  :lep2_pt │ Float32[26518.098, 51955.1, 33665.395, 67624.08, 78728.49, 75583.3…
 :lep2_eta │ Float32[-2.23238, -0.5876625, -2.065182, -1.0818828, 2.0865402, 0.…
 :lep2_phi │ Float32[-2.4220688, 2.4145992, 0.50862205, 2.3646975, 0.4607052, 2…
 :lep2_pid │ Float32[-13.0, 11.0, 11.0, -13.0, 13.0, -13.0, -13.0, -13.0, 13.0,…
      :MET │ Float32[64662.047, 39246.2, 90002.63, 121876.12, 41813.074, 125130…
  :mass_4l │ Float32[154441.69, 289828.94, 148317.48, 248640.03, 339248.1, 2372…

julia> Dict(SIGhist)
Dict{Float32, Float32} with 10 entries:
  13.0      => -11.0
  -2.23238  => -0.587663
  64662.0   => 39246.2
  33638.4   => 92686.8
  26518.1   => 51955.1
  0.45213   => -0.831908
  -13.0     => 11.0
  1.54442f5 => 2.89829f5
  -1.43133  => -1.06775
  -2.42207  => 2.4146

What's the recommended workflow?

Moelf, Aug 03 '21 13:08

I guess this could work:

Arrow.write("./blah.arrow", Dict(data.indices.values .=> data.values))

Moelf, Aug 03 '21 13:08

Hi @Moelf,

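The Dict constructor expects to get an iterable of Pairs, or of other iterable things where the first element is the key and the second is the value. Iterating a Dictionary yields only its values, so the first two elements of each vector were taken as a key and a value (which explains your strange result).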

To go from a Dictionary to a Dict use the pairs function, like Dict(pairs(dictionary)).
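A minimal sketch of the difference, assuming Dictionaries.jl is installed. The dictionary `d` and its column names are made up for illustration and are not the data from the question:

```julia
using Dictionaries

# Hypothetical stand-in for the dictionary in the question.
d = Dictionary([:pt, :eta], [Float32[1, 2, 3], Float32[4, 5, 6]])

# Iterating a Dictionary yields its values only, so Dict(d) never sees
# the keys and destructures each vector instead.
first(d)           # the first value, a Vector{Float32}

# pairs(d) iterates key => value pairs, which is what Dict expects.
first(pairs(d))    # the pair with key :pt

plain = Dict(pairs(d))   # a Dict{Symbol, Vector{Float32}}
plain[:eta]              # the vector stored under :eta
```

A `Dict` does not preserve insertion order, so the entries of `plain` may print in a different order than those of `d`.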

Does that help? Perhaps this should be prominently documented...

andyferris, Aug 03 '21 22:08

Also, we should probably think about the Tables.jl interface at some point...

andyferris, Aug 03 '21 23:08

Thanks, using pairs makes sense. It probably should have been specialized by Dictionaries.jl, since I think that's the only sensible outcome.

Moelf, Aug 04 '21 06:08

Unfortunately a specialisation to insert pairs would break Dict(copy(pairs(dictionary))) where you’d expect the copy to have no effect on the output.

It’s also hard to add methods for all AbstractDict, for example.

andyferris, Aug 04 '21 07:08

Not sure I understand:

julia> d = Dictionary([1,2,3], [4,5,6])
3-element Dictionary{Int64, Int64}
 1 │ 4
 2 │ 5
 3 │ 6

julia> copy(pairs(d))
3-element Dictionary{Int64, Pair{Int64, Int64}}
 1 │ 1 => 4
 2 │ 2 => 5
 3 │ 3 => 6

This is the current behavior. I propose adding:

julia> Base.Dict(D::Dictionary) = Dict(pairs(D))

julia> Dict(d)
Dict{Int64, Int64} with 3 entries:
  2 => 5
  3 => 6
  1 => 4

julia> Dict(pairs(d))
Dict{Int64, Int64} with 3 entries:
  2 => 5
  3 => 6
  1 => 4

julia> copy(pairs(d))
3-element Dictionary{Int64, Pair{Int64, Int64}}
 1 │ 1 => 4
 2 │ 2 => 5
 3 │ 3 => 6

I don't see why adding Dict() would break anything.

Edit: Oh, in the case of Dict(copy(pairs(d))), it means we would have to specialize copy too.

Moelf, Aug 04 '21 07:08

> Oh, in the case of Dict(copy(pairs(d))), it means we should have specialized copy too then.

Yes. But we can't specialize a Dict constructor on this: all it sees is a Dictionary. Just as you can do Dict(zip(keys, values)), you can also do things like Dict(Dictionary(keys, zip(keys, values))) and expect it to work the same way. If we had Base.Dict(D::Dictionary) = Dict(pairs(D)), this would be broken :(.
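To make the conflict concrete, here is a small sketch with a made-up dictionary `wrapped` whose values are themselves pairs (the name and contents are illustrative only). It compares what `Dict` does today with what the proposed method would return, without actually defining that method:

```julia
using Dictionaries

# Hypothetical dictionary whose *values* happen to be pairs.
wrapped = Dictionary([1, 2, 3], [10 => "a", 20 => "b", 30 => "c"])

# Current behavior: Dict iterates the values, so each stored pair
# becomes an entry of the result (keys 10, 20, 30).
from_values = Dict(wrapped)

# What Base.Dict(D::Dictionary) = Dict(pairs(D)) would return instead:
# the outer keys win and the stored pairs become the values (keys 1, 2, 3).
from_pairs = Dict(pairs(wrapped))
```

The two results differ, so adding the method would silently change the meaning of code that relies on the first form.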

There's also the fact that, while we might theoretically try to specialize (::Type{<:AbstractDict})(::AbstractDictionary), in practice this would lead to ambiguity errors. Even if we patch those up for Base, they will reappear on `using OrderedCollections` or `using DataStructures`.

At the end of the day the only clean choice is to let users write pairs as necessary.

andyferris, Aug 05 '21 05:08