DimensionalData.jl

`Pair`s constructor for `DimArray`

Open kapple19 opened this issue 5 months ago • 7 comments

It would help if I could construct a DimArray with syntax that shows which input values are associated with which output values, without running the code.

FWIW, I've been thinking about a Dimensional generalisation of the following syntax.

julia> DimArray(Dim{:Name}, "A" => 1, "B" => 2)
┌ 2-element DimArray{Int64, 1} ┐
├──────────────────────────────┴─────────────────── dims ┐
  ↓ Name Categorical{String} ["A", "B"] ForwardOrdered
└────────────────────────────────────────────────────────┘
 "A"  1
 "B"  2

which I obtain with

function DimArray(
    dim::Type{<:Dimension},
    pairs::Pair{D, V}...;
    kw...
) where {D, V}
    DimArray(
        V[pair.second for pair in pairs],
        D[pair.first for pair in pairs] |> dim;
        kw...
    )
end

I'm thinking of something like

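function DimArray(
    # one dimension type per element of each key tuple
    dims::NTuple{N, Type{<:Dimension}},
    # keys are N-tuples whose elements may differ in type, e.g. ("a", 1)
    pairs::Pair{<:NTuple{N, Any}}...;
    kw...
) where {N}
    # something here I have yet to figure out
end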

that gets called like

DimArray(
    (
        Dim{:Letter},
        Dim{:Integer}
    ),
    ("a", 1) => 21,
    ("a", 2) => 22,
    ("b", 1) => 23,
    ("b", 2) => 24;
    name = :Numbers
)

for an arbitrary number of dimensions.
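One possible way to fill in that placeholder is sketched below, in plain Julia without DimensionalData so that it stays self-contained. The helper name `dense_from_pairs` is hypothetical and not part of any package. It assumes labels along each dimension are ordered by first appearance, and it rejects duplicate or missing key combinations.

```julia
# Hypothetical helper (not part of DimensionalData): turn tuple-keyed pairs
# into a dense array plus one label vector per dimension.
function dense_from_pairs(pairs::Pair{<:Tuple}...)
    isempty(pairs) && throw(ArgumentError("at least one pair is required"))
    N = length(first(first(pairs)))
    all(p -> length(first(p)) == N, pairs) ||
        throw(ArgumentError("all keys must have $N elements"))

    # Labels along each dimension, in order of first appearance (an assumption).
    labels = ntuple(i -> unique([first(p)[i] for p in pairs]), N)
    sizes = map(length, labels)

    # Element type wide enough to hold every value.
    V = mapreduce(p -> typeof(last(p)), promote_type, pairs)
    data = Array{V}(undef, sizes)
    filled = falses(sizes)
    for (key, value) in pairs
        idx = ntuple(i -> findfirst(isequal(key[i]), labels[i]), N)
        filled[idx...] && throw(ArgumentError("duplicate key $key"))
        data[idx...] = value
        filled[idx...] = true
    end
    all(filled) || throw(ArgumentError("missing values for some key combinations"))
    return data, labels
end

data, labels = dense_from_pairs(
    ("a", 1) => 21,
    ("a", 2) => 22,
    ("b", 1) => 23,
    ("b", 2) => 24,
)
```

With DimensionalData loaded, the result could then be wrapped as `DimArray(data, (Dim{:Letter}(labels[1]), Dim{:Integer}(labels[2])); name = :Numbers)`.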

I may open a pull request at some point in the future, just thought I'd gauge everyone's opinions on this syntax and functionality.

kapple19 • Aug 08 '25 04:08

Could be nice for DimVector, but I'm not sure it will make sense for more dimensions?

rafaqz • Aug 08 '25 04:08

What wouldn't make sense for more dimensions? I had a gut feeling that might be the case, but I haven't investigated enough to figure it out.

For multidimensional data, I would end up devising a way to merge DimVectors into higher dimensions, so automating that in a DimArray constructor method that also checks whether I've missed any values would be nice.

kapple19 • Aug 08 '25 06:08

cat already mostly works like that - it will check that dimensions and their values match.

I think what you're suggesting is fundamentally one-dimensional: basically, it's mapping dictionaries to DimArrays. But I think that's cool; a DimVector is similar to a dictionary. We could allow any AbstractDictionary in the constructor, and AbstractVector{<:Pair}. They could all turn into Categorical lookups by default.

rafaqz • Aug 08 '25 08:08

Yeah fair enough. I guess I haven't figured out the cleanest way to use cat yet for all my use cases. It felt workaroundy to create a collection of DimArrays then concatenate them.

But behaving similar to a dictionary would be perfect, yes. I found myself writing wrappers that received a dictionary then converted the dictionary into a DimArray.

kapple19 • Aug 09 '25 02:08

I missed information in the issue description. The methods that need to be defined:

function DimensionalData.DimArray(
    dim::Type{<:DimensionalData.Dimension},
    pairs::Pair{D, V}...;
    kw...
) where {D, V}
    DimArray(
        V[pair.second for pair in pairs],
        D[pair.first for pair in pairs] |> dim;
        kw...
    )
end

function DimensionalData.DimArray(
    dim::Type{<:DimensionalData.Dimension},
    pairs::Base.Generator;
    kw...
)
    # collect the generator, then forward to the splatted-pairs method above
    DimArray(dim, collect(pairs)...; kw...)
end

Do you want them as DimVector methods? Or are you fine with them as DimArray methods?

kapple19 • Aug 15 '25 05:08

I guess both should work. And dim should probably be the second argument? It could also be a constructed but empty dimension, not only a type.

Any AbstractDict or AbstractVector{<:Pair} could also be valid input.
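A minimal sketch of what that could look like, with the dimension as the second argument. The function name `dimvector_from_pairs` is hypothetical; the thread proposes adding these as `DimArray` / `DimVector` methods instead. It only handles a dimension type (not a constructed but empty dimension), and sorting dictionary entries by key is an assumption, since `Dict` iteration order is unspecified.

```julia
using DimensionalData
using DimensionalData: Dimension

# Hypothetical name; not an existing DimensionalData function.
function dimvector_from_pairs(pairs::AbstractVector{<:Pair}, dim::Type{<:Dimension}; kw...)
    # Same construction as the splatted method posted in this issue:
    # values become the data, keys become the dimension's lookup values.
    return DimArray(last.(pairs), dim(first.(pairs)); kw...)
end

function dimvector_from_pairs(dict::AbstractDict, dim::Type{<:Dimension}; kw...)
    # Dict iteration order is unspecified, so sort by key (an assumption).
    return dimvector_from_pairs(sort!(collect(dict); by = first), dim; kw...)
end

A = dimvector_from_pairs(["A" => 1, "B" => 2], Dim{:Name})
B = dimvector_from_pairs(Dict("B" => 2, "A" => 1), Dim{:Name}; name = :Numbers)
```

Taking a vector of pairs rather than a splat also avoids creating one positional argument per pair at each call site.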

rafaqz • Aug 15 '25 08:08

Well the first signature has a pairs splat. I guess it would be better if it was an array of Pairs for performance purposes.

And yeah, an explicit method for converting a Dict into a DimArray makes sense.

kapple19 • Aug 15 '25 10:08