Access underlying `Dict` of `DefaultDict`

Open goretkin opened this issue 3 years ago • 4 comments

I think it would be useful to have a documented way to access the underlying Dict of a DefaultDict. The following seems to rely on internals:

julia> dd
DefaultDict{Any,Array{Int64,1},DataType} with 3 entries:
  0 => [3, 6, 9]
  2 => [2, 5, 8]
  1 => [1, 4, 7, 10]

julia> dd.d.d
Dict{Any,Array{Int64,1}} with 3 entries:
  0 => [3, 6, 9]
  2 => [2, 5, 8]
  1 => [1, 4, 7, 10]

goretkin, Nov 19 '20 18:11

This is probably a valid thing for convert(Dict, ::DefaultDict) to do following the logic of https://docs.julialang.org/en/v1/manual/conversion-and-promotion/#Mutable-collections

Though I am still not entirely sure of what this is useful for.
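For comparison, a plain `Dict` can already be obtained without touching the `.d.d` fields by copying through the `Dict` constructor, which accepts any iterable of key-value pairs. This is only a minimal sketch: it builds a new `Dict` rather than returning the underlying one, and it is not a documented DataStructures.jl accessor. The variable names are illustrative.

```julia
using DataStructures: DefaultDict

# Same construction as in the issue: missing keys get an empty Vector{Int}.
dd = DefaultDict{Any, Vector{Int}}(Vector{Int})
for x in 1:10
    push!(dd[x % 3], x)
end

# Dict(...) iterates the key => value pairs of dd and builds a new Dict.
# The copy is shallow: the value vectors are shared with dd.
plain = Dict(dd)

# plain behaves like any other Dict: looking up a missing key with
# plain[99] throws a KeyError, whereas dd[99] would insert a default.
```

Whether a copy is acceptable depends on the caller; for large tables the extra allocation may matter.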

oxinabox, Nov 19 '20 21:11

Let me know what you think about this. I often use DefaultDict because the behavior is convenient in a very localized place, and I do not want that behavior elsewhere. For example:

using DataStructures: DefaultDict
"""
    Generalize `filter`, like `DataFrames.groupby`

# Examples
\```jldoctest
julia> filter_key(k -> k % 3, 1:10)
Dict{Any,Array{Int64,1}} with 3 entries:
  0 => [3, 6, 9]
  2 => [2, 5, 8]
  1 => [1, 4, 7, 10]

julia> filter_key(iseven, 1:10)
Dict{Any,Array{Int64,1}} with 2 entries:
  false => [1, 3, 5, 7, 9]
  true  => [2, 4, 6, 8, 10]
\```
"""
function filter_key(key, itr)
    T = eltype(itr)
    out = DefaultDict{Any, Vector{T}}(Vector{T})
    for x in itr
        push!(out[key(x)], x)
    end
    return out.d.d # TODO https://github.com/JuliaCollections/DataStructures.jl/issues/705
end

It would probably be more correct (more defensive) to return something immutable in that case (immutable keys and values, or at least an immutable Dict), but barring that, I can at least be defensive by returning a Dict. Put another way, my use of a DefaultDict for convenience is merely an implementation detail of filter_key: although a DefaultDict otherwise follows the AbstractDict interface, it never throws a KeyError, and that is the behavior I do not want callers to see.

goretkin, Nov 19 '20 22:11

> I often use DefaultDict because the behavior is convenient in a very localized place, and I do not want that behavior elsewhere

I tend to use a plain Dict and get/get! (as in: get!(dict, key, default) and get(()->default, dict, key)) in those circumstances

function filter_key(key, itr)
    T = eltype(itr)
    out = Dict{Any, Vector{T}}()
    for x in itr
        col = get!(()->Vector{T}(), out, key(x))  # group under key(x), not x itself
        push!(col, x)
    end
    return out
end
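The difference between the two calls mentioned above can be seen in a small Base-only sketch: `get!` stores the default under the missing key, while `get` only returns it. The variable names here are illustrative.

```julia
groups = Dict{String, Vector{Int}}()

# get! inserts the default under the key when it is missing, then returns it,
# so pushing to the result mutates the vector stored in the Dict.
v = get!(() -> Int[], groups, "a")
push!(v, 1)

# get returns the default without storing it; the Dict is left unchanged,
# and the pushed value lives only in the temporary vector w.
w = get(() -> Int[], groups, "b")
push!(w, 2)

has_a = haskey(groups, "a")
has_b = haskey(groups, "b")
```

Passing the default as a zero-argument function avoids allocating a fresh vector on every lookup when the key is already present.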

oxinabox, Nov 19 '20 22:11

If I'm understanding correctly, that idea can be used to obviate the need for DefaultDict altogether if you're willing to use get! in place of getindex. Said differently, I see the entire point of DefaultDict as delegating all methods transparently, except for the one transformation you just described.

You bring up an excellent alternative, in any case.
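To make the "delegate everything except getindex" reading concrete, here is a minimal sketch of such a wrapper. `TinyDefaultDict` is a hypothetical type written for illustration only; it is not the DataStructures.jl implementation and forwards just a handful of methods.

```julia
# Hypothetical wrapper, NOT the DataStructures.jl DefaultDict.
struct TinyDefaultDict{K, V, F} <: AbstractDict{K, V}
    default::F       # zero-argument function producing a default value
    d::Dict{K, V}    # underlying storage
end

TinyDefaultDict{K, V}(default::F) where {K, V, F} =
    TinyDefaultDict{K, V, F}(default, Dict{K, V}())

# The one non-transparent method: a missing key is filled in via get!.
Base.getindex(t::TinyDefaultDict, key) = get!(t.default, t.d, key)

# Everything else is forwarded unchanged to the wrapped Dict.
Base.setindex!(t::TinyDefaultDict, v, key) = (t.d[key] = v; t)
Base.get(t::TinyDefaultDict, key, default) = get(t.d, key, default)
Base.haskey(t::TinyDefaultDict, key) = haskey(t.d, key)
Base.length(t::TinyDefaultDict) = length(t.d)
Base.iterate(t::TinyDefaultDict, state...) = iterate(t.d, state...)

t = TinyDefaultDict{Any, Vector{Int}}(() -> Int[])
for x in 1:10
    push!(t[x % 3], x)
end
```

With this sketch, indexing a missing key such as `t[:other]` inserts an empty vector and grows `length(t)`, which is exactly the behavior that a plain `Dict` with explicit `get!` calls keeps local to the call site.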

goretkin, Nov 19 '20 22:11