DimensionalData.jl icon indicating copy to clipboard operation
DimensionalData.jl copied to clipboard

`findmax` does not work on DimStacks

Open asinghvi17 opened this issue 1 year ago • 5 comments

MWE:

using DimensionalData
A = [1.0 2.0 3.0;
     4.0 5.0 6.0]
x, y, z = X([:a, :b]), Y(10.0:10.0:30.0; metadata=Dict()), Z()
dimz = x, y
da1 = DimArray(A, (x, y); name=:one, metadata=Metadata())
da2 = DimArray(Float32.(2A), (x, y); name=:two)
da3 = DimArray(Int.(3A), (x, y); name=:three)
da4 = DimArray(cat(4A, 5A, 6A, 7A; dims=3), (x, y, z); name=:extradim)

s = DimStack((da1, da2, da3))

maximum(s) # fine
findmax(s) # errors

Error:


julia> maximum(s) # fine
(one = 6.0, two = 12.0f0, three = 18)

julia> findmax(s) # errors
ERROR: MethodError: no method matching isless(::DimMatrix{…}, ::DimMatrix{…})
The function `isless` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  isless(::Missing, ::Any)
   @ Base missing.jl:87
  isless(::Any, ::Missing)
   @ Base missing.jl:88
  isless(::Base.CoreLogging.LogLevel, ::Base.CoreLogging.LogLevel)
   @ Base logging/logging.jl:131
  ...

Stacktrace:
  [1] _rf_findmax(::Tuple{DimMatrix{…}, Symbol}, ::Tuple{DimMatrix{…}, Symbol})
    @ Base ./reduce.jl:907
  [2] BottomRF
    @ ./reduce.jl:86 [inlined]
  [3] MappingRF (repeats 2 times)
    @ ./reduce.jl:100 [inlined]
  [4] _foldl_impl(op::Base.MappingRF{…}, init::Base._InitialValue, itr::Base.Iterators.Zip{…})
    @ Base ./reduce.jl:62
  [5] foldl_impl(op::Base.MappingRF{…}, nt::Base._InitialValue, itr::Base.Iterators.Zip{…})
    @ Base ./reduce.jl:48
  [6] mapfoldl_impl(f::Base.var"#353#354"{…}, op::typeof(Base._rf_findmax), nt::Base._InitialValue, itr::Base.Generator{…})
    @ Base ./reduce.jl:44
  [7] mapfoldl(f::Function, op::Function, itr::Base.Generator{…}; init::Base._InitialValue)
    @ Base ./reduce.jl:175
  [8] mapfoldl
    @ ./reduce.jl:175 [inlined]
  [9] _findmax
    @ ./reduce.jl:906 [inlined]
 [10] findmax(f::Function, domain::DimStack{…})
    @ Base ./reduce.jl:905
 [11] _findmax(a::DimStack{…}, ::Colon)
    @ Base ./reduce.jl:935
 [12] findmax(itr::DimStack{…})
    @ Base ./reduce.jl:934
 [13] top-level scope
    @ REPL[265]:1
Some type information was truncated. Use `show(err)` to see complete types.

If this is intended, we should dispatch on findmax and throw a better error message. If not - how would this work? Returning a namedtuple of selectors for each layer seems a bit arbitrary...

asinghvi17 avatar Nov 12 '24 15:11 asinghvi17

btw, I don't need this, just an edge case I stumbled into.

asinghvi17 avatar Nov 12 '24 15:11 asinghvi17

Oh no now reductions are kind of broken if you expect the new iteration :crying_cat_face:

The NamedTuple/Array crossover is endless pain, every change breaks some other expectation

rafaqz avatar Nov 12 '24 17:11 rafaqz

But doing this over a NameTuple iterator is totally useless, it will just ~~error~~ return a weird result:

julia> maximum(s) # fine
(one = 6.0, two = 12.0f0, three = 18)

Its also generally useless over multi-value objects like arrays, because you just end up with a single object that was chosen as the "maximum" for some arcane reason or other (maximum first value in this case), rather than the maximum of each separate position in the vector.

TLDR we can't have findmax find a single Int and have maximum work like this too, unless findmax returns a NamedTuple of indices in each separate layer. Or, it just looks at the first layer, breaking with how the other reducing methods work.

rafaqz avatar Nov 12 '24 17:11 rafaqz

So this is the problem:

julia> maximum([(a=1, b=200), (a=2, b=1)])
(a = 2, b = 1)

We may need a slightly more formal definition of what a DimStack is and how it operates under iteration and reduction. Its just hard to know what definition is the most useful practically until you try them out for a year

rafaqz avatar Nov 12 '24 17:11 rafaqz

This is what DimStack is currently:

Array of NamedTuple

  • iteration and indexing: one single Array of NamedTuple

NamedTuple of Array

  • reduction: multiple separate arrays of different lengths reduced separately
  • broadcast_dims: separate broadcasts over each layer like reductions
  • merge and Symbol indexing/getproperty: separate layers like a NamedTuple of Array

Both

  • tables: Row-wise A of NT, Col-wise NT of A (because looped to be the same length)

Undefined

  • broadcast: ?

So, still pretty weird. I find sum(dimstack) to be an odd outcome, and different to summing the iterator unless all layers are the same size.

findmax just doesn't really fit in at all.

I always wondered if it was trying to do too much in one object, and should just be two that you can switch between.

rafaqz avatar Nov 13 '24 00:11 rafaqz