DimensionalData.jl
Performance of `broadcast_dims` on `DimStack` vs `DimArray`
It appears that `broadcast_dims` over a `DimStack` is significantly slower than over a `DimArray`. Consider the following example:
julia> xs, ys, zs = X(1:70), Y(1:50), Z(1:500)
↓ X 1:70,
→ Y 1:50,
↗ Z 1:500
julia> AB = DimStack((;A=rand(xs,ys,zs), B=rand(xs,ys,zs)))
╭────────────────────╮
│ 70×50×500 DimStack │
├────────────────────┴───────────────────────────── dims ┐
↓ X Sampled{Int64} 1:70 ForwardOrdered Regular Points,
→ Y Sampled{Int64} 1:50 ForwardOrdered Regular Points,
↗ Z Sampled{Int64} 1:500 ForwardOrdered Regular Points
├──────────────────────────────────────────────── layers ┤
:A eltype: Float64 dims: X, Y, Z size: 70×50×500
:B eltype: Float64 dims: X, Y, Z size: 70×50×500
└────────────────────────────────────────────────────────┘
julia> ab = DimStack((;A=rand(zs), B=rand(zs)))
╭──────────────────────╮
│ 500-element DimStack │
├──────────────────────┴─────────────────────────── dims ┐
↓ Z Sampled{Int64} 1:500 ForwardOrdered Regular Points
├──────────────────────────────────────────────── layers ┤
:A eltype: Float64 dims: Z size: 500
:B eltype: Float64 dims: Z size: 500
└────────────────────────────────────────────────────────┘
# 6ms!!
julia> @benchmark broadcast_dims(-, AB, ab)
BenchmarkTools.Trial: 768 samples with 1 evaluation.
Range (min … max): 6.045 ms … 11.359 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 6.587 ms ┊ GC (median): 3.23%
Time (mean ± σ): 6.506 ms ± 336.666 μs ┊ GC (mean ± σ): 3.86% ± 3.73%
▃███▆▂▁▂▁ ▁ ▁▅▄▅▄▂ ▁
▃▃▄▆▇██████████▇▇▆▅▅▅▃▂▂▁▁▁▂▃▂▁▅▆███████████▇█▇▆█▄▃▅▅▅▃▃▃▃▃ ▅
6.05 ms Histogram: frequency by time 7.06 ms <
Memory estimate: 53.42 MiB, allocs estimate: 114.
# baseline: 1ms
julia> @benchmark broadcast_dims(-, AB.A, ab.A)
BenchmarkTools.Trial: 4448 samples with 1 evaluation.
Range (min … max): 955.459 μs … 1.922 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.043 ms ┊ GC (median): 0.00%
Time (mean ± σ): 1.120 ms ± 200.487 μs ┊ GC (mean ± σ): 6.72% ± 11.55%
▃▆▇█▅▄▃
▂▄▇████████▆▅▄▃▃▂▂▂▂▂▂▁▁▂▂▁▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▃▃▃▄▃▄▄▃▃▃▃▃▃▂ ▃
955 μs Histogram: frequency by time 1.68 ms <
Memory estimate: 13.35 MiB, allocs estimate: 35.
If I define my own version of `broadcast_dims`, it runs at the expected speed:
julia> function my_broadcast_dims(func, AB, ab)
           names = keys(AB)
           out = map(names) do name
               broadcast_dims(func, AB[name], ab[name])
           end
           DimStack(NamedTuple{names}(out), dims(AB))
       end
my_broadcast_dims (generic function with 1 method)
# As expected, broadcasting over the two-layer stack takes ~twice as long as over a single DimArray
julia> @benchmark my_broadcast_dims(-, AB, ab)
BenchmarkTools.Trial: 2222 samples with 1 evaluation.
Range (min … max): 1.992 ms … 3.999 ms ┊ GC (min … max): 0.00% … 0.00%
Time (median): 2.152 ms ┊ GC (median): 0.00%
Time (mean ± σ): 2.245 ms ± 221.165 μs ┊ GC (mean ± σ): 4.72% ± 7.53%
▁▂▄▃▅█▆█▆▆▂
▂▃▅▆███████████▇▇▆▆▃▃▃▂▂▁▂▁▁▁▁▁▁▁▁▂▁▁▂▂▃▃▄▄▃▆▄▄▆▅▆▅▅▄▄▃▃▃▃▂ ▄
1.99 ms Histogram: frequency by time 2.77 ms <
Memory estimate: 26.71 MiB, allocs estimate: 84.
Thanks. `@profview` may help. I would make a function that runs it in a loop, like so:
f(AB, ab, n) = for _ in 1:n broadcast_dims(-, AB, ab) end
And then run it 100 times:
using ProfileView
@profview f(AB, ab, 100)
That should highlight the problems. Feel free to PR if you fix it.
Ok, so the problem is that there are a lot of edge cases where your formulation would make the order of arguments change the results, because stack layers don't always carry all of the dimensions of their stack.
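For concreteness, here is a minimal sketch of the kind of stack that triggers those edge cases (the layer names and sizes are made up): the layers carry different subsets of the stack's dimensions, so pairing layers directly never sees the full set of stack dims.

using DimensionalData

# Hypothetical mixed-dimension stack: layer A spans X and Z, layer B only Z.
mixed = DimStack((; A = rand(X(1:4), Z(1:3)),
                    B = rand(Z(1:3))))

dims(mixed)    # (X, Z) -- the stack carries both dimensions
dims(mixed.B)  # (Z,)   -- but layer B is missing X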
What I've done in #767 is check for these cases and avoid the performance cost of using `DimExtensionArray` on everything.
With that I'm getting the expected performance for your case, while still handling the edge cases.
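For intuition only, the optimization could be sketched roughly like this; this is not the code in #767, `sketch_broadcast_dims` is a hypothetical name, and the slow path simply defers to the existing `broadcast_dims`:

using DimensionalData

# Hypothetical sketch: broadcast layer-by-layer when every layer already
# carries its stack's full set of dimensions, and fall back to the general
# (slower) path otherwise.
function sketch_broadcast_dims(f, s1::DimStack, s2::DimStack)
    fastpath = all(keys(s1)) do k
        dims(s1[k]) == dims(s1) && dims(s2[k]) == dims(s2)
    end
    if fastpath
        layers = map(k -> broadcast_dims(f, s1[k], s2[k]), keys(s1))
        DimStack(NamedTuple{keys(s1)}(layers))
    else
        broadcast_dims(f, s1, s2)  # general path handles mixed-dimension layers
    end
end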