DimensionalData.jl icon indicating copy to clipboard operation
DimensionalData.jl copied to clipboard

Utilities for managing deeply nested data?

Open kapple19 opened this issue 1 year ago • 9 comments

TL; DR

I'm looking to simplify code for managing nested DimArrays and DimStacks. E.g. feeding nested data to AlgebraOfGraphics. Maybe an add_dims function? I've prototyped one that I'm happy to PR:

function add_dims(dimarray, newdims...)
    alldims = (dims(dimarray)..., newdims...)
    return DimArray(
        reshape(
            dimarray |> parent,
            alldims .|> length
        ),
        alldims;
        name = dimarray.name,
        metadata = dimarray.metadata,
        refdims = dimarray.refdims
    )
end

Motivating Example

My work typically involves exploring nested data structures that are best not laid out as a long-format data table. A (non-obvious) example is visualised in the following plot:

image

where the titles are bearings [deg], the horizontal axis is sound speed [m/s], the vertical axis is depth [m] and the colour is range [m].

And this is still a simplified version of the type of data I manage probably half my days at work.

The data plotted above is generated by the following code:

##
using DimensionalData
using CairoMakie, AlgebraOfGraphics

using DimensionalData: dims

##
function munk_profile(z::Real; ϵ = 7.37e-3)
    z̃ = 2(z/1300 - 1)
    return 1500(
        1 + ϵ * (
            z̃ - 1 + exp(-z̃)
        )
    )
end

rand_munk_profile(z::Real) = munk_profile(z) + 5rand()

##
dimprofile(z, c) = DimArray(c, Dim{:z}(z); name = :c)
dimslice(r, zc) = DimArray(zc, Dim{:r}(r); name = :slice)
dimenvironment(θ, rzc) = DimArray(rzc, Dim{:θ}(θ); name = :environment)

rand_depths() = [0 : 10 : 50; 5e3rand(rand(20:30))] |> sort! |> unique!
rand_ranges() = [0; 10e3rand(rand(2:5))] |> sort! |> unique!

rand_dimprofile() = rand_depths() |> zs -> dimprofile(zs, rand_munk_profile.(zs))
rand_dimslice() = rand_ranges() |> rs -> dimslice(rs, [rand_dimprofile() for _ in rs])
rand_environment() = (45 |> step -> 0 : step : 360-step) |> θs -> dimenvironment(θs, [rand_dimslice() for _ in θs])

env = rand_environment()

I can show the nested structures of env by:

julia> dims(env)
(↓ θ Sampled{Int64} 0:45:315 ForwardOrdered Regular Points)

julia> dims(env[1])
(↓ r Sampled{Float64} [0.0, 3597.4471629977843, 5534.263563590665, 7028.512437622645] ForwardOrdered Irregular Points)

julia> dims(env[1][1])
(↓ z Sampled{Float64} [0.0, 10.0, …, 4977.663346971497, 4990.064674258991] ForwardOrdered Irregular Points)

Each bearing Dim{:θ} has a collection of ranges Dim{:r} whose values aren't shared between each θ. Likewise each Dim{:r} has a collection of depth-sound speed Dim{:z}-Dim{:c} pairs whose z values aren't shared between each r.

(In other words, vector data I guess 😅 but with structures resembling raster data, DimensionalData is the best I've found so far for managing such data.)

I've played around with ways to efficiently and flexibly visualise the above in different ways. My current preference involves the auxiliary function add_dims I've defined above, which behaves like reshape but for DimArrays.

The idea for add_dims is that when you access a DimArray inside another DimArray, that inner DimArray doesn't hold information on the Dim(s) you used to call it.

I'll define the compass bearing relationships:

lat_cols = ["N"; ""; "S"]
lon_rows = ["W" "" "E"]
compass_dirs = lat_cols .* lon_rows
compass_bearings = 0 : 45 : 360-45
compass_grid = Dict(
    θ => 2 .+ (
        -round(Int, cosd(θ), RoundFromZero),
        round(Int, sind(θ), RoundFromZero),
    )
    for θ in compass_bearings
)
compass_directions = Dict(
    θ => compass_dirs[compass_grid[θ]...]
    for θ in compass_bearings
)

and then plot like so:

draw(
    sum(
        add_dims(
            env[θ = At(θ)][r = At(r)],
            Dim{:θ}([θ]),
            Dim{:r}([r])
        ) |> data
        for θ in 45 |> step -> 0 : step : 360-step
        for r in dims(env[θ = At(θ)], :r) |> reverse
    )
    # * mapping(:c, :z, color = :r, layout = :θ => (θ -> compass_directions[θ]) => presorted)
    * mapping(:c, :z, color = :r, layout = :θ => nonnumeric)
    * visual(Lines),
    scales(;
        Color = (; colormap = Reverse(:blues)),
        Layout = (; palette = [compass_grid[θ] for θ in compass_bearings])
    );
    axis = (; yreversed = true)
)

Okay so, what's your question?

My question is, does anyone know any better ways to manage/handle this data, including interaction with AlgebraOfGraphics? The above is still only a simplified example of the type of data I handle about half by work days. Re AoG I'll be looking into pagination next as well, excited to see what it can do.

If you guys like add_dims I'm happy to PR it with docs and example of course.

I'm partially open to alternatives. I'd like to avoid pregrouped, it makes my soul hurt and in my opinion goes against the whole declarative syntax idea AoG was made for (but I still understand why pregrouped exists).

Are there other packages of storage structures that work better on this type of vector data but allows dimensional names and indexing, including grammar-of-graphics interoperability like what AoG provides?

kapple19 avatar Nov 15 '24 23:11 kapple19

This makes me think refdims should also be table columns (just the same value repeated), so you could be setting refdims instead of reshaping.

It's pretty much what they are meant for - tracking where in a larger array a slice comes from.

We could even add a DimNestedArray object that holds other DimArray/DimStack that updates refdims automatically on indexing?

rafaqz avatar Nov 16 '24 09:11 rafaqz

Oh I see. I'll check out refdims then.

This makes me think refdims should also be table columns

Are you saying it doesn't implement the Tables.jl interface so it won't work out-of-the-box with AoG? I'm about to check it out now and see what it can do.

We could even add a DimNestedArray object that holds other DimArray/DimStack that updates refdims automatically on indexing?

I see. From my superficial understanding, DimArray already has the features for nesting multiple layers, but you have an idea for a new object that would work better?

The automatic refdims update upon indexing would indeed simplify the syntax for passing to AoG.data.

I also found eachslice.

kapple19 avatar Nov 17 '24 09:11 kapple19

+1 for treating refdims as table columns. I have a completely unrelated usecase where this would also make my life a lot easier

tiemvanderdeure avatar Nov 25 '24 15:11 tiemvanderdeure

We could even add a DimNestedArray object that holds other DimArray/DimStack that updates refdims automatically on indexing?

One thing DimNestedArray would do differently/on top of normal DimArray/DimStack is that each nested layer doesn't have conflicting Dims?

This realisation of mine negates my above thought that:

From my superficial understanding, DimArray already has the features for nesting multiple layers

On top of your suggestion that accessing layers updates refdims.

kapple19 avatar Nov 27 '24 03:11 kapple19

Ok I've made a new issue to add refdims as tabel columns https://github.com/rafaqz/DimensionalData.jl/issues/884

rafaqz avatar Dec 13 '24 22:12 rafaqz

Would it be possible to handle your data as a DimTree? See #900

felixcremer avatar Mar 22 '25 06:03 felixcremer

I've tinkered around a little, and I'm not sure how to use it.

In my example above, for the definitions of dimprofile, dimslice, and dimenvironment, I wrap the DimArray in a DimTree?

Is there a way I can print a tree-diagram (a la AbstractTrees.jl or other) of the nested layers and their dimensions in a DimTree?

kapple19 avatar Mar 29 '25 00:03 kapple19

This probably wants https://github.com/rafaqz/DimensionalData.jl/issues/956 to enforce structure down tree branches, along with maybe some form of refdims support in the tree.

asinghvi17 avatar Mar 29 '25 02:03 asinghvi17

I guess the idea here is to be able to keep dims at different levels of a tree? Sort of like a less structured RasterSeries at each tree level that might have DimTrees or regular DimArrays as nodes?

So you have a tree where a path looks something like:

toplevel -> Theta(45) -> R(0) yields a DimArray with dimensions Z and C, or maybe they're in a MergedLookup

so that would be tree[Theta(At(45)), R(At(0))] for example

this would probably be a special kind of tree node, and we can use the dimtree infrastructure to propagate the queries...each such node would then have dimensions associated with it, and the "query object" ((Theta(At(45)), R(At(0)))) would descend down the tree while losing dimensions at each node on the tree.

Then this has a very easy hook to use tree based acceleration under a GiST (generalized search tree) like interface...

asinghvi17 avatar Apr 08 '25 01:04 asinghvi17

Hi! I started looking at this again, the DimTree interface is appealing.

The following code

##
using DimensionalData
using CairoMakie, AlgebraOfGraphics

using DimensionalData: dims

##
function munk_profile(z::Real; ϵ = 7.37e-3)
    z̃ = 2(z/1300 - 1)
    return 1500(
        1 + ϵ * (
            z̃ - 1 + exp(-z̃)
        )
    )
end

rand_munk_profile(z::Real) = munk_profile(z) + 5rand()

##
rand_depths() = [
    0 : 10 : 50;
    5e3rand(rand(20:30))
] |> sort! |> unique! |> Dim{:z}

c = DimArray(
    rand_munk_profile(z)
    for z in rand_depths();
    name = :c
)

DimTree(c)

triggered this error

┌ DimTree ┐
├─────────┴──────────────────────────────────────────────────────────────────── dims ┐
  ↓ z Sampled{Float64} [0.0, …, 4917.793599313537] ForwardOrdered Irregular Points
├──────────────────────────────────────────────────────────────────────────── layers ┤
  :c eltype: Float64 dims: z size: 27
ERROR: MethodError: no method matching isless(::Tuple{Int64, Int64}, ::Int64)
The function `isless` exists, but no method is defined for this combination of argument types.

It errors during printing the DimTree data.

I understand it's experimental right now, just giving a heads up.

kapple19 avatar Jun 18 '25 09:06 kapple19

Oh, the docstring example errors too.

julia> xdim, ydim = X(1:10), Y(1:15)
(↓ X 1:10,
→ Y 1:15)

julia> z1, z2 = Z([:a, :b, :c]), Z([:d, :e, :f])
(↓ Z [:a, …, :c],
→ Z [:d, …, :f])

julia> a = rand(xdim, ydim)
┌ 10×15 DimArray{Float64, 2} ┐
├────────────────────────────┴────────────────────── dims ┐
  ↓ X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
  → Y Sampled{Int64} 1:15 ForwardOrdered Regular Points
└─────────────────────────────────────────────────────────┘
  ↓ →  1         2         3         4          …  12          13         14          15
  1    0.431139  0.766344  0.680733  0.434004       0.972547    0.689369   0.644629    0.282081    
  2    0.494937  0.979805  0.800735  0.0370863      0.953885    0.441732   0.109628    0.75322     
  3    0.30938   0.827543  0.689148  0.988115       0.100356    0.941429   0.038943    0.742074    
  4    0.92676   0.584531  0.267452  0.92015        0.836046    0.564318   0.527189    0.273874    
  5    0.625766  0.232331  0.25695   0.0899371  …   0.884136    0.921556   0.850157    0.221585    
  6    0.914972  0.867678  0.367152  0.755524       0.600778    0.707885   0.0545637   0.78925     
  7    0.580514  0.159453  0.225751  0.803873       0.958556    0.307145   0.380439    0.846797    
  8    0.870401  0.372077  0.949768  0.247944       0.47449     0.902576   0.529998    0.331135    
  9    0.745896  0.297128  0.785453  0.563281       0.0464147   0.72461    0.730443    0.815666    
 10    0.947063  0.469179  0.43807   0.929097   …   0.27696     0.749187   0.227356    0.906712    

julia> b = rand(Float32, xdim, ydim)
┌ 10×15 DimArray{Float32, 2} ┐
├────────────────────────────┴────────────────────── dims ┐
  ↓ X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
  → Y Sampled{Int64} 1:15 ForwardOrdered Regular Points
└─────────────────────────────────────────────────────────┘
  ↓ →  1          2          3           …  12          13          14         15
  1    0.267717   0.745973   0.100425        0.49723     0.764769    0.874368   0.435644
  2    0.919661   0.613082   0.94317         0.843127    0.730652    0.959973   0.348006
  3    0.739104   0.878826   0.686476        0.46639     0.0133945   0.752119   0.317827
  4    0.328293   0.515415   0.00122887      0.0621238   0.984851    0.196984   0.447085
  5    0.356225   0.0125846  0.612853    …   0.619211    0.133317    0.321907   0.47704
  6    0.10013    0.0521478  0.346144        0.124377    0.864417    0.930086   0.750962
  7    0.0498002  0.62154    0.483711        0.402199    0.883385    0.38481    0.909172
  8    0.463682   0.817018   0.690233        0.126902    0.266677    0.21443    0.397974
  9    0.952143   0.580393   0.648435        0.637117    0.452452    0.834434   0.392239
 10    0.0254184  0.649666   0.516964    …   0.659007    0.401939    0.245411   0.155369

julia> c = rand(Int, xdim, ydim, z1)
┌ 10×15×3 DimArray{Int64, 3} ┐
├────────────────────────────┴────────────────────── dims ┐
  ↓ X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
  → Y Sampled{Int64} 1:15 ForwardOrdered Regular Points,
  ↗ Z Categorical{Symbol} [:a, …, :c] ForwardOrdered
└─────────────────────────────────────────────────────────┘
[:, :, 1]
  ↓ →                     1                     2  …                    14                    15
  1     2549365937309670703  -6352002122223040015      6342427699707729702  -2253747691512256973   
  2    -1197647964048946273  -1935810263962669460      8026367113688938701   1229693043013662783   
  3     4791067143135695674  -7111192877224573306     -7358256935777330162  -5852549565869906783   
  4     6302568973059507805   4962251561090407136      7606254116867676543   -837627562824835692   
  5    -2742276420471260526    420527978904158625  …   7496103867188108710   6668817884371272833   
  6    -7871085629967451081  -7064319582985812535     -3165997854501172729   9120277185986966208   
  7     5365222329554739944  -4268522555772210960     -8575651024653549601  -4762554439637736682   
  8    -2702968716692365849  -4037560465555650357      7654210709672564666   9054112777211902810   
  9     -421380135397797000   1169493608650937711      6106646718539258075   2985432485145621102   
 10     6837181553379967769   8739406315085957075  …  -3795614266173972674   5627263396102455935   

julia> d = rand(Int, xdim, z2)
┌ 10×3 DimArray{Int64, 2} ┐
├─────────────────────────┴───────────────────────── dims ┐
  ↓ X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
  → Z Categorical{Symbol} [:d, …, :f] ForwardOrdered
└─────────────────────────────────────────────────────────┘
  ↓ →                      :d                      :e                      :f
  1     7883012906896310549    -4344312286815458549     2429617596433190752
  2     7090703238402082320    -6781249200401307153    -5655021069320437425
  3    -3294876943945941717     6041020194356684468    -7406671700993365082
  4    -2857108328772281211     2840197897401601340     7962435146062545998
  5     4741665711403784749     -523215411855494075    -1958236821240171908
  6     7729262015907641464     4383881756925715368    -8548806962099150419
  7      718860843568791252     5954049029442234358     2343432942649020299
  8    -3462865642004764327      -24093399781691095    -4201374711055325926
  9    -4596837663219574717     4535514199529305803     3490418870694116477
 10    -5356313519293442383     7616664750720194257    -6362815497152645189

julia> DimTree(a, b)
┌ DimTree ┐
├─────────┴───────────────────────────────────────── dims ┐
  ↓ X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
  → Y Sampled{Int64} 1:15 ForwardOrdered Regular Points
├───────────────────────────────────────────────── layers ┤
  : eltype: Float32 dims: X, Y size: 10×15
Error showing value of type DimTree:

SYSTEM (REPL): showing an error caused an error
ERROR: MethodError: no method matching isless(::Tuple{Int64, Int64}, ::Int64)
The function `isless` exists, but no method is defined for this combination of argument types.

kapple19 avatar Jun 18 '25 09:06 kapple19

Feel free to fix the docstring ;)

rafaqz avatar Jun 19 '25 02:06 rafaqz

Also can you post the whole error output? (Always!!)

rafaqz avatar Jun 19 '25 02:06 rafaqz

My bad.

julia> DimTree(a, b)
┌ DimTree ┐
├─────────┴───────────────────────────────────────── dims ┐
  ↓ X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
  → Y Sampled{Int64} 1:15 ForwardOrdered Regular Points
├───────────────────────────────────────────────── layers ┤
  : eltype: Float32 dims: X, Y size: 10×15
Error showing value of type DimTree:

SYSTEM (REPL): showing an error caused an error
ERROR: MethodError: no method matching isless(::Tuple{Int64, Int64}, ::Int64)
The function `isless` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  isless(::Missing, ::Any)
   @ Base missing.jl:87
  isless(::Any, ::Missing)
   @ Base missing.jl:88
  isless(::Tuple, ::Tuple{})
   @ Base tuple.jl:621
  ...

Stacktrace:
  [1] print_response(errio::IO, response::Any, backend::Union{…}, show_value::Bool, have_color::Bool, specialdisplay::Union{…})
    @ REPL C:\Users\T0185914\.julia\juliaup\julia-1.11.5+0.x64.w64.mingw32\share\julia\stdlib\v1.11\REPL\src\REPL.jl:446
  [2] (::REPL.var"#70#71"{REPL.LineEditREPL, Pair{Any, Bool}, Bool, Bool})(io::Any)
    @ REPL C:\Users\T0185914\.julia\juliaup\julia-1.11.5+0.x64.w64.mingw32\share\julia\stdlib\v1.11\REPL\src\REPL.jl:405
  [3] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
    @ REPL C:\Users\T0185914\.julia\juliaup\julia-1.11.5+0.x64.w64.mingw32\share\julia\stdlib\v1.11\REPL\src\REPL.jl:678
  [4] print_response(repl::REPL.AbstractREPL, response::Any, show_value::Bool, have_color::Bool)
    @ REPL C:\Users\T0185914\.julia\juliaup\julia-1.11.5+0.x64.w64.mingw32\share\julia\stdlib\v1.11\REPL\src\REPL.jl:403
  [5] (::REPL.var"#do_respond#100"{…})(s::REPL.LineEdit.MIState, buf::Any, ok::Bool)
    @ REPL C:\Users\T0185914\.julia\juliaup\julia-1.11.5+0.x64.w64.mingw32\share\julia\stdlib\v1.11\REPL\src\REPL.jl:1035
  [6] (::VSCodeServer.var"#103#106"{REPL.var"#do_respond#100"{…}})(mi::REPL.LineEdit.MIState, buf::IOBuffer, ok::Bool)
    @ VSCodeServer c:\Users\T0185914\.vscode\extensions\julialang.language-julia-1.146.2\scripts\packages\VSCodeServer\src\repl.jl:122
  [7] #invokelatest#2
    @ .\essentials.jl:1055 [inlined]
  [8] invokelatest
    @ .\essentials.jl:1052 [inlined]
  [9] run_interface(terminal::REPL.Terminals.TextTerminal, m::REPL.LineEdit.ModalInterface, s::REPL.LineEdit.MIState)
    @ REPL.LineEdit C:\Users\T0185914\.julia\juliaup\julia-1.11.5+0.x64.w64.mingw32\share\julia\stdlib\v1.11\REPL\src\LineEdit.jl:2755
 [10] run_frontend(repl::REPL.LineEditREPL, backend::REPL.REPLBackendRef)
    @ REPL C:\Users\T0185914\.julia\juliaup\julia-1.11.5+0.x64.w64.mingw32\share\julia\stdlib\v1.11\REPL\src\REPL.jl:1506
 [11] (::REPL.var"#79#85"{REPL.LineEditREPL, REPL.REPLBackendRef})()
    @ REPL C:\Users\T0185914\.julia\juliaup\julia-1.11.5+0.x64.w64.mingw32\share\julia\stdlib\v1.11\REPL\src\REPL.jl:497
Some type information was truncated. Use `show(err)` to see complete types.

I checked where it errors within DimensionalData and it's in src/stack/show.jl somewhere between right before the return of print_layers_block and its output of its first call in show_main. Annotated source code with # errors before here! and # errors after here!:

function show_main(io, mime, stack::AbstractDimStack)
    displaywidth = displaysize(io)[2]
    iobuf = IOBuffer()
    blockwidth, _ = print_layers_block(iobuf, mime, stack; blockwidth=0, displaywidth)

    # errors before here!

    lines, blockwidth, displaywidth, separatorwidth, istop = print_top(io, mime, stack; blockwidth, displaywidth)
    blockwidth, separatorwidth = print_layers_block(io, mime, stack; 
        blockwidth, displaywidth, separatorwidth, istop
    )
    _, blockwidth, istop = print_metadata_block(io, mime, metadata(stack); 
        blockwidth=min(displaywidth-2, blockwidth), displaywidth, separatorwidth, istop
    )
end
function print_layers_block(io, mime, stack; 
    blockwidth, displaywidth, separatorwidth=blockwidth, istop=false
)
    layers = DD.layers(stack)
    newblockwidth = blockwidth
    keylen = _keylen(Base.keys(layers))
    for key in keys(layers)
        mxbw = max(newblockwidth, length(sprint(print_layer, stack, key, keylen)))
        newblockwidth = min(displaywidth - 2, mxbw)
    end
    newblockwidth = print_block_separator(io, "layers", separatorwidth, newblockwidth; istop)
    println(io)
    for key in keys(layers)
        print_layer(io, stack, key, keylen)
    end

    # errors after here!

    return newblockwidth, newblockwidth
end

I don't fully understand the error, a REPL thing it seems.

kapple19 avatar Jun 20 '25 23:06 kapple19

With the latest DimensionalData at v0.29.21, changing my above convenience function definitions to

dimprofile(z, c) = DimArray(c, Dim{:z}(z); name = :c) |> DimTree
dimslice(r, zc) = DimArray(zc, Dim{:r}(r); name = :slice) |> DimTree

the DimTree display is fixed

julia> env = rand_environment()
┌ 8-element DimArray{DimTree, 1} environment ┐
├────────────────────────────────────────────┴────────── dims ┐
  ↓ θ Sampled{Int64} 0:45:315 ForwardOrdered Regular Points
└─────────────────────────────────────────────────────────────┘
   0  …  DimTree(
# output truncated... very long

And indexing is working.

env[θ = At(0)][:slice][r = At(0)][:c][z = At(0.0)]

I'm not sure how to work the tree[Theta(At(45)), R(At(0))] syntax suggested by @asinghvi17, but I'll figure it out.

I'm happy to close this issue given the above functionality. I'll be trying it out with AlgebraOfGraphics soon, and can open any issues if I find any.

Thanks so much! I fricken love this package, it's made my life so much easier. Keep up the great work!

kapple19 avatar Aug 15 '25 05:08 kapple19