DimensionalData.jl
DimensionalData.jl copied to clipboard
Utilities for managing deeply nested data?
TL; DR
I'm looking to simplify code for managing nested DimArrays and DimStacks.
E.g. feeding nested data to AlgebraOfGraphics.
Maybe an add_dims function? I've prototyped one that I'm happy to PR:
function add_dims(dimarray, newdims...)
alldims = (dims(dimarray)..., newdims...)
return DimArray(
reshape(
dimarray |> parent,
alldims .|> length
),
alldims;
name = dimarray.name,
metadata = dimarray.metadata,
refdims = dimarray.refdims
)
end
Motivating Example
My work typically involves exploring nested data structures that are best not laid out as a long-format data table. A (non-obvious) example is visualised in the following plot:
where the titles are bearings [deg], the horizontal axis is sound speed [m/s], the vertical axis is depth [m] and the colour is range [m].
And this is still a simplified version of the type of data I manage probably half my days at work.
The data plotted above is generated by the following code:
##
using DimensionalData
using CairoMakie, AlgebraOfGraphics
using DimensionalData: dims
##
function munk_profile(z::Real; ϵ = 7.37e-3)
z̃ = 2(z/1300 - 1)
return 1500(
1 + ϵ * (
z̃ - 1 + exp(-z̃)
)
)
end
rand_munk_profile(z::Real) = munk_profile(z) + 5rand()
##
dimprofile(z, c) = DimArray(c, Dim{:z}(z); name = :c)
dimslice(r, zc) = DimArray(zc, Dim{:r}(r); name = :slice)
dimenvironment(θ, rzc) = DimArray(rzc, Dim{:θ}(θ); name = :environment)
rand_depths() = [0 : 10 : 50; 5e3rand(rand(20:30))] |> sort! |> unique!
rand_ranges() = [0; 10e3rand(rand(2:5))] |> sort! |> unique!
rand_dimprofile() = rand_depths() |> zs -> dimprofile(zs, rand_munk_profile.(zs))
rand_dimslice() = rand_ranges() |> rs -> dimslice(rs, [rand_dimprofile() for _ in rs])
rand_environment() = (45 |> step -> 0 : step : 360-step) |> θs -> dimenvironment(θs, [rand_dimslice() for _ in θs])
env = rand_environment()
I can show the nested structures of env by:
julia> dims(env)
(↓ θ Sampled{Int64} 0:45:315 ForwardOrdered Regular Points)
julia> dims(env[1])
(↓ r Sampled{Float64} [0.0, 3597.4471629977843, 5534.263563590665, 7028.512437622645] ForwardOrdered Irregular Points)
julia> dims(env[1][1])
(↓ z Sampled{Float64} [0.0, 10.0, …, 4977.663346971497, 4990.064674258991] ForwardOrdered Irregular Points)
Each bearing Dim{:θ} has a collection of ranges Dim{:r} whose values aren't shared between each θ.
Likewise each Dim{:r} has a collection of depth-sound speed Dim{:z}-Dim{:c} pairs whose z values aren't shared between each r.
(In other words, vector data I guess 😅 but with structures resembling raster data, DimensionalData is the best I've found so far for managing such data.)
I've played around with ways to efficiently and flexibly visualise the above in different ways.
My current preference involves the auxiliary function add_dims I've defined above, which behaves like reshape but for DimArrays.
The idea for add_dims is that when you access a DimArray inside another DimArray, that inner DimArray doesn't hold information on the Dim(s) you used to call it.
I'll define the compass bearing relationships:
lat_cols = ["N"; ""; "S"]
lon_rows = ["W" "" "E"]
compass_dirs = lat_cols .* lon_rows
compass_bearings = 0 : 45 : 360-45
compass_grid = Dict(
θ => 2 .+ (
-round(Int, cosd(θ), RoundFromZero),
round(Int, sind(θ), RoundFromZero),
)
for θ in compass_bearings
)
compass_directions = Dict(
θ => compass_dirs[compass_grid[θ]...]
for θ in compass_bearings
)
and then plot like so:
draw(
sum(
add_dims(
env[θ = At(θ)][r = At(r)],
Dim{:θ}([θ]),
Dim{:r}([r])
) |> data
for θ in 45 |> step -> 0 : step : 360-step
for r in dims(env[θ = At(θ)], :r) |> reverse
)
# * mapping(:c, :z, color = :r, layout = :θ => (θ -> compass_directions[θ]) => presorted)
* mapping(:c, :z, color = :r, layout = :θ => nonnumeric)
* visual(Lines),
scales(;
Color = (; colormap = Reverse(:blues)),
Layout = (; palette = [compass_grid[θ] for θ in compass_bearings])
);
axis = (; yreversed = true)
)
Okay so, what's your question?
My question is, does anyone know any better ways to manage/handle this data, including interaction with AlgebraOfGraphics?
The above is still only a simplified example of the type of data I handle about half by work days.
Re AoG I'll be looking into pagination next as well, excited to see what it can do.
If you guys like add_dims I'm happy to PR it with docs and example of course.
I'm partially open to alternatives. I'd like to avoid pregrouped, it makes my soul hurt and in my opinion goes against the whole declarative syntax idea AoG was made for (but I still understand why pregrouped exists).
Are there other packages of storage structures that work better on this type of vector data but allows dimensional names and indexing, including grammar-of-graphics interoperability like what AoG provides?
This makes me think refdims should also be table columns (just the same value repeated), so you could be setting refdims instead of reshaping.
It's pretty much what they are meant for - tracking where in a larger array a slice comes from.
We could even add a DimNestedArray object that holds other DimArray/DimStack that updates refdims automatically on indexing?
Oh I see. I'll check out refdims then.
This makes me think
refdimsshould also be table columns
Are you saying it doesn't implement the Tables.jl interface so it won't work out-of-the-box with AoG? I'm about to check it out now and see what it can do.
We could even add a DimNestedArray object that holds other DimArray/DimStack that updates
refdimsautomatically on indexing?
I see. From my superficial understanding, DimArray already has the features for nesting multiple layers, but you have an idea for a new object that would work better?
The automatic refdims update upon indexing would indeed simplify the syntax for passing to AoG.data.
I also found eachslice.
+1 for treating refdims as table columns. I have a completely unrelated usecase where this would also make my life a lot easier
We could even add a DimNestedArray object that holds other DimArray/DimStack that updates
refdimsautomatically on indexing?
One thing DimNestedArray would do differently/on top of normal DimArray/DimStack is that each nested layer doesn't have conflicting Dims?
This realisation of mine negates my above thought that:
From my superficial understanding,
DimArrayalready has the features for nesting multiple layers
On top of your suggestion that accessing layers updates refdims.
Ok I've made a new issue to add refdims as tabel columns https://github.com/rafaqz/DimensionalData.jl/issues/884
Would it be possible to handle your data as a DimTree? See #900
I've tinkered around a little, and I'm not sure how to use it.
In my example above, for the definitions of dimprofile, dimslice, and dimenvironment, I wrap the DimArray in a DimTree?
Is there a way I can print a tree-diagram (a la AbstractTrees.jl or other) of the nested layers and their dimensions in a DimTree?
This probably wants https://github.com/rafaqz/DimensionalData.jl/issues/956 to enforce structure down tree branches, along with maybe some form of refdims support in the tree.
I guess the idea here is to be able to keep dims at different levels of a tree? Sort of like a less structured RasterSeries at each tree level that might have DimTrees or regular DimArrays as nodes?
So you have a tree where a path looks something like:
toplevel -> Theta(45) -> R(0) yields a DimArray with dimensions Z and C, or maybe they're in a MergedLookup
so that would be tree[Theta(At(45)), R(At(0))] for example
this would probably be a special kind of tree node, and we can use the dimtree infrastructure to propagate the queries...each such node would then have dimensions associated with it, and the "query object" ((Theta(At(45)), R(At(0)))) would descend down the tree while losing dimensions at each node on the tree.
Then this has a very easy hook to use tree based acceleration under a GiST (generalized search tree) like interface...
Hi! I started looking at this again, the DimTree interface is appealing.
The following code
##
using DimensionalData
using CairoMakie, AlgebraOfGraphics
using DimensionalData: dims
##
function munk_profile(z::Real; ϵ = 7.37e-3)
z̃ = 2(z/1300 - 1)
return 1500(
1 + ϵ * (
z̃ - 1 + exp(-z̃)
)
)
end
rand_munk_profile(z::Real) = munk_profile(z) + 5rand()
##
rand_depths() = [
0 : 10 : 50;
5e3rand(rand(20:30))
] |> sort! |> unique! |> Dim{:z}
c = DimArray(
rand_munk_profile(z)
for z in rand_depths();
name = :c
)
DimTree(c)
triggered this error
┌ DimTree ┐
├─────────┴──────────────────────────────────────────────────────────────────── dims ┐
↓ z Sampled{Float64} [0.0, …, 4917.793599313537] ForwardOrdered Irregular Points
├──────────────────────────────────────────────────────────────────────────── layers ┤
:c eltype: Float64 dims: z size: 27
ERROR: MethodError: no method matching isless(::Tuple{Int64, Int64}, ::Int64)
The function `isless` exists, but no method is defined for this combination of argument types.
It errors during printing the DimTree data.
I understand it's experimental right now, just giving a heads up.
Oh, the docstring example errors too.
julia> xdim, ydim = X(1:10), Y(1:15)
(↓ X 1:10,
→ Y 1:15)
julia> z1, z2 = Z([:a, :b, :c]), Z([:d, :e, :f])
(↓ Z [:a, …, :c],
→ Z [:d, …, :f])
julia> a = rand(xdim, ydim)
┌ 10×15 DimArray{Float64, 2} ┐
├────────────────────────────┴────────────────────── dims ┐
↓ X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
→ Y Sampled{Int64} 1:15 ForwardOrdered Regular Points
└─────────────────────────────────────────────────────────┘
↓ → 1 2 3 4 … 12 13 14 15
1 0.431139 0.766344 0.680733 0.434004 0.972547 0.689369 0.644629 0.282081
2 0.494937 0.979805 0.800735 0.0370863 0.953885 0.441732 0.109628 0.75322
3 0.30938 0.827543 0.689148 0.988115 0.100356 0.941429 0.038943 0.742074
4 0.92676 0.584531 0.267452 0.92015 0.836046 0.564318 0.527189 0.273874
5 0.625766 0.232331 0.25695 0.0899371 … 0.884136 0.921556 0.850157 0.221585
6 0.914972 0.867678 0.367152 0.755524 0.600778 0.707885 0.0545637 0.78925
7 0.580514 0.159453 0.225751 0.803873 0.958556 0.307145 0.380439 0.846797
8 0.870401 0.372077 0.949768 0.247944 0.47449 0.902576 0.529998 0.331135
9 0.745896 0.297128 0.785453 0.563281 0.0464147 0.72461 0.730443 0.815666
10 0.947063 0.469179 0.43807 0.929097 … 0.27696 0.749187 0.227356 0.906712
julia> b = rand(Float32, xdim, ydim)
┌ 10×15 DimArray{Float32, 2} ┐
├────────────────────────────┴────────────────────── dims ┐
↓ X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
→ Y Sampled{Int64} 1:15 ForwardOrdered Regular Points
└─────────────────────────────────────────────────────────┘
↓ → 1 2 3 … 12 13 14 15
1 0.267717 0.745973 0.100425 0.49723 0.764769 0.874368 0.435644
2 0.919661 0.613082 0.94317 0.843127 0.730652 0.959973 0.348006
3 0.739104 0.878826 0.686476 0.46639 0.0133945 0.752119 0.317827
4 0.328293 0.515415 0.00122887 0.0621238 0.984851 0.196984 0.447085
5 0.356225 0.0125846 0.612853 … 0.619211 0.133317 0.321907 0.47704
6 0.10013 0.0521478 0.346144 0.124377 0.864417 0.930086 0.750962
7 0.0498002 0.62154 0.483711 0.402199 0.883385 0.38481 0.909172
8 0.463682 0.817018 0.690233 0.126902 0.266677 0.21443 0.397974
9 0.952143 0.580393 0.648435 0.637117 0.452452 0.834434 0.392239
10 0.0254184 0.649666 0.516964 … 0.659007 0.401939 0.245411 0.155369
julia> c = rand(Int, xdim, ydim, z1)
┌ 10×15×3 DimArray{Int64, 3} ┐
├────────────────────────────┴────────────────────── dims ┐
↓ X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
→ Y Sampled{Int64} 1:15 ForwardOrdered Regular Points,
↗ Z Categorical{Symbol} [:a, …, :c] ForwardOrdered
└─────────────────────────────────────────────────────────┘
[:, :, 1]
↓ → 1 2 … 14 15
1 2549365937309670703 -6352002122223040015 6342427699707729702 -2253747691512256973
2 -1197647964048946273 -1935810263962669460 8026367113688938701 1229693043013662783
3 4791067143135695674 -7111192877224573306 -7358256935777330162 -5852549565869906783
4 6302568973059507805 4962251561090407136 7606254116867676543 -837627562824835692
5 -2742276420471260526 420527978904158625 … 7496103867188108710 6668817884371272833
6 -7871085629967451081 -7064319582985812535 -3165997854501172729 9120277185986966208
7 5365222329554739944 -4268522555772210960 -8575651024653549601 -4762554439637736682
8 -2702968716692365849 -4037560465555650357 7654210709672564666 9054112777211902810
9 -421380135397797000 1169493608650937711 6106646718539258075 2985432485145621102
10 6837181553379967769 8739406315085957075 … -3795614266173972674 5627263396102455935
julia> d = rand(Int, xdim, z2)
┌ 10×3 DimArray{Int64, 2} ┐
├─────────────────────────┴───────────────────────── dims ┐
↓ X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
→ Z Categorical{Symbol} [:d, …, :f] ForwardOrdered
└─────────────────────────────────────────────────────────┘
↓ → :d :e :f
1 7883012906896310549 -4344312286815458549 2429617596433190752
2 7090703238402082320 -6781249200401307153 -5655021069320437425
3 -3294876943945941717 6041020194356684468 -7406671700993365082
4 -2857108328772281211 2840197897401601340 7962435146062545998
5 4741665711403784749 -523215411855494075 -1958236821240171908
6 7729262015907641464 4383881756925715368 -8548806962099150419
7 718860843568791252 5954049029442234358 2343432942649020299
8 -3462865642004764327 -24093399781691095 -4201374711055325926
9 -4596837663219574717 4535514199529305803 3490418870694116477
10 -5356313519293442383 7616664750720194257 -6362815497152645189
julia> DimTree(a, b)
┌ DimTree ┐
├─────────┴───────────────────────────────────────── dims ┐
↓ X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
→ Y Sampled{Int64} 1:15 ForwardOrdered Regular Points
├───────────────────────────────────────────────── layers ┤
: eltype: Float32 dims: X, Y size: 10×15
Error showing value of type DimTree:
SYSTEM (REPL): showing an error caused an error
ERROR: MethodError: no method matching isless(::Tuple{Int64, Int64}, ::Int64)
The function `isless` exists, but no method is defined for this combination of argument types.
Feel free to fix the docstring ;)
Also can you post the whole error output? (Always!!)
My bad.
julia> DimTree(a, b)
┌ DimTree ┐
├─────────┴───────────────────────────────────────── dims ┐
↓ X Sampled{Int64} 1:10 ForwardOrdered Regular Points,
→ Y Sampled{Int64} 1:15 ForwardOrdered Regular Points
├───────────────────────────────────────────────── layers ┤
: eltype: Float32 dims: X, Y size: 10×15
Error showing value of type DimTree:
SYSTEM (REPL): showing an error caused an error
ERROR: MethodError: no method matching isless(::Tuple{Int64, Int64}, ::Int64)
The function `isless` exists, but no method is defined for this combination of argument types.
Closest candidates are:
isless(::Missing, ::Any)
@ Base missing.jl:87
isless(::Any, ::Missing)
@ Base missing.jl:88
isless(::Tuple, ::Tuple{})
@ Base tuple.jl:621
...
Stacktrace:
[1] print_response(errio::IO, response::Any, backend::Union{…}, show_value::Bool, have_color::Bool, specialdisplay::Union{…})
@ REPL C:\Users\T0185914\.julia\juliaup\julia-1.11.5+0.x64.w64.mingw32\share\julia\stdlib\v1.11\REPL\src\REPL.jl:446
[2] (::REPL.var"#70#71"{REPL.LineEditREPL, Pair{Any, Bool}, Bool, Bool})(io::Any)
@ REPL C:\Users\T0185914\.julia\juliaup\julia-1.11.5+0.x64.w64.mingw32\share\julia\stdlib\v1.11\REPL\src\REPL.jl:405
[3] with_repl_linfo(f::Any, repl::REPL.LineEditREPL)
@ REPL C:\Users\T0185914\.julia\juliaup\julia-1.11.5+0.x64.w64.mingw32\share\julia\stdlib\v1.11\REPL\src\REPL.jl:678
[4] print_response(repl::REPL.AbstractREPL, response::Any, show_value::Bool, have_color::Bool)
@ REPL C:\Users\T0185914\.julia\juliaup\julia-1.11.5+0.x64.w64.mingw32\share\julia\stdlib\v1.11\REPL\src\REPL.jl:403
[5] (::REPL.var"#do_respond#100"{…})(s::REPL.LineEdit.MIState, buf::Any, ok::Bool)
@ REPL C:\Users\T0185914\.julia\juliaup\julia-1.11.5+0.x64.w64.mingw32\share\julia\stdlib\v1.11\REPL\src\REPL.jl:1035
[6] (::VSCodeServer.var"#103#106"{REPL.var"#do_respond#100"{…}})(mi::REPL.LineEdit.MIState, buf::IOBuffer, ok::Bool)
@ VSCodeServer c:\Users\T0185914\.vscode\extensions\julialang.language-julia-1.146.2\scripts\packages\VSCodeServer\src\repl.jl:122
[7] #invokelatest#2
@ .\essentials.jl:1055 [inlined]
[8] invokelatest
@ .\essentials.jl:1052 [inlined]
[9] run_interface(terminal::REPL.Terminals.TextTerminal, m::REPL.LineEdit.ModalInterface, s::REPL.LineEdit.MIState)
@ REPL.LineEdit C:\Users\T0185914\.julia\juliaup\julia-1.11.5+0.x64.w64.mingw32\share\julia\stdlib\v1.11\REPL\src\LineEdit.jl:2755
[10] run_frontend(repl::REPL.LineEditREPL, backend::REPL.REPLBackendRef)
@ REPL C:\Users\T0185914\.julia\juliaup\julia-1.11.5+0.x64.w64.mingw32\share\julia\stdlib\v1.11\REPL\src\REPL.jl:1506
[11] (::REPL.var"#79#85"{REPL.LineEditREPL, REPL.REPLBackendRef})()
@ REPL C:\Users\T0185914\.julia\juliaup\julia-1.11.5+0.x64.w64.mingw32\share\julia\stdlib\v1.11\REPL\src\REPL.jl:497
Some type information was truncated. Use `show(err)` to see complete types.
I checked where it errors within DimensionalData and it's in src/stack/show.jl somewhere between right before the return of print_layers_block and its output of its first call in show_main. Annotated source code with # errors before here! and # errors after here!:
function show_main(io, mime, stack::AbstractDimStack)
displaywidth = displaysize(io)[2]
iobuf = IOBuffer()
blockwidth, _ = print_layers_block(iobuf, mime, stack; blockwidth=0, displaywidth)
# errors before here!
lines, blockwidth, displaywidth, separatorwidth, istop = print_top(io, mime, stack; blockwidth, displaywidth)
blockwidth, separatorwidth = print_layers_block(io, mime, stack;
blockwidth, displaywidth, separatorwidth, istop
)
_, blockwidth, istop = print_metadata_block(io, mime, metadata(stack);
blockwidth=min(displaywidth-2, blockwidth), displaywidth, separatorwidth, istop
)
end
function print_layers_block(io, mime, stack;
blockwidth, displaywidth, separatorwidth=blockwidth, istop=false
)
layers = DD.layers(stack)
newblockwidth = blockwidth
keylen = _keylen(Base.keys(layers))
for key in keys(layers)
mxbw = max(newblockwidth, length(sprint(print_layer, stack, key, keylen)))
newblockwidth = min(displaywidth - 2, mxbw)
end
newblockwidth = print_block_separator(io, "layers", separatorwidth, newblockwidth; istop)
println(io)
for key in keys(layers)
print_layer(io, stack, key, keylen)
end
# errors after here!
return newblockwidth, newblockwidth
end
I don't fully understand the error, a REPL thing it seems.
With the latest DimensionalData at v0.29.21,
changing my above convenience function definitions to
dimprofile(z, c) = DimArray(c, Dim{:z}(z); name = :c) |> DimTree
dimslice(r, zc) = DimArray(zc, Dim{:r}(r); name = :slice) |> DimTree
the DimTree display is fixed
julia> env = rand_environment()
┌ 8-element DimArray{DimTree, 1} environment ┐
├────────────────────────────────────────────┴────────── dims ┐
↓ θ Sampled{Int64} 0:45:315 ForwardOrdered Regular Points
└─────────────────────────────────────────────────────────────┘
0 … DimTree(
# output truncated... very long
And indexing is working.
env[θ = At(0)][:slice][r = At(0)][:c][z = At(0.0)]
I'm not sure how to work the tree[Theta(At(45)), R(At(0))] syntax suggested by @asinghvi17, but I'll figure it out.
I'm happy to close this issue given the above functionality.
I'll be trying it out with AlgebraOfGraphics soon, and can open any issues if I find any.
Thanks so much! I fricken love this package, it's made my life so much easier. Keep up the great work!