DimensionalData.jl
DimensionalData.jl copied to clipboard
DimensionMismatch with `cat`
sorry about another cat issue, but shouldn't it work even if the dimensions are transposed in the underlying array?
julia> using DimensionalData
julia> A = rand(X(2), Y(3))
╭─────────────────────────╮
│ 2×3 DimArray{Float64,2} │
├─────────────────── dims ┤
↓ X, → Y
└─────────────────────────┘
0.638864 0.541596 0.625792
0.885387 0.24923 0.979457
julia> B = rand(Y(3), X(4))
╭─────────────────────────╮
│ 3×4 DimArray{Float64,2} │
├─────────────────── dims ┤
↓ Y, → X
└─────────────────────────┘
0.384553 0.65434 0.108298 0.531854
0.909667 0.847332 0.319774 0.492833
0.653972 0.868346 0.567731 0.408845
julia> cat(A, B, dims=:X)
ERROR: DimensionMismatch: mismatch in dimension 2 (expected 3 got 4)
Stacktrace:
[1] _cs
@ ./abstractarray.jl:1785 [inlined]
[2] _cshp
@ ./abstractarray.jl:1775 [inlined]
[3] _cshp(ndim::Int64, dims::Tuple{Bool}, shape::Tuple{Int64, Int64}, nshape::Tuple{Int64, Int64})
@ Base ./abstractarray.jl:1782
[4] _cat_size_shape
@ ./abstractarray.jl:1761 [inlined]
[5] cat_size_shape
@ ./abstractarray.jl:1759 [inlined]
[6] _cat_t
@ ./abstractarray.jl:1800 [inlined]
[7] _cat(dims::Tuple{Int64}, X1::Matrix{Float64}, X::Matrix{Float64})
@ SparseArrays ~/.julia/juliaup/julia-1.10.2+0.aarch64.apple.darwin14/share/julia/stdlib/v1.10/SparseArrays/src/sparsevector.jl:1227
[8] cat
@ ./abstractarray.jl:1993 [inlined]
[9] _cat(catdims::Tuple{…}, A1::DimMatrix{…}, As::DimMatrix{…})
@ DimensionalData ~/.julia/packages/DimensionalData/vXseP/src/array/methods.jl:335
[10] _cat
@ ~/.julia/packages/DimensionalData/vXseP/src/array/methods.jl:269 [inlined]
[11] #cat#196
@ ~/.julia/packages/DimensionalData/vXseP/src/array/methods.jl:266 [inlined]
[12] top-level scope
@ REPL[5]:1
Some type information was truncated. Use `show(err)` to see complete types.
julia> cat(A, B', dims=:X)
╭─────────────────────────╮
│ 6×3 DimArray{Float64,2} │
├─────────────────── dims ┤
↓ X, → Y
└─────────────────────────┘
0.638864 0.541596 0.625792
0.885387 0.24923 0.979457
0.384553 0.909667 0.653972
0.65434 0.847332 0.868346
0.108298 0.319774 0.567731
0.531854 0.492833 0.408845
julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
CPU: 12 × Apple M2 Max
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
JULIA_PROJECT = @.
JULIA_EDITOR = vi
(jl_ZgjCA7) pkg> st
Status `/private/var/folders/s5/8d629n5d7nsf37f60_91wzr40000gq/T/jl_ZgjCA7/Project.toml`
[0703355e] DimensionalData v0.26.3
That would really be moving away from base julia behavior.
We actually never automatically permute or do anything fancy like that on Base methods so we don't accidentally let through bugs in generic contexts. Like a permuted square DimMatrix would work in cat like its parent Array but the answer will be different because of the auto permutation.
For broadcasts we have our own broadcast_dims that does permute and even reshapes missing dimensions. Someone could write a PR for something similar for cat.
Can you just permute all your arrays to a particular dim order first?
i can just permute and that is what i've just done to fix a bug in my code. but, in my mind one of the nice things about DimArrays is you don't (usually at least i'm just now learning b/c of cat) have to pay attention to the order of the underlying dims. so for example in the code above i can reverse the indexing and it doesn't matter: A[Y(1),X(2)] == A[X(2),Y(1)], where as in base julia the analogous equality, A[1,2] == A[2,1], would be false.
I guess in this case using the named dimension could move us away from the base behaviour as youre already using dims=:X and that will fail on a Array. If you use dims=1 we cant permute.
But what should it do if you use hcat ?
I agree it makes sense to keep base behavior here. It seems like a macro that overrides base behavior and uses dimensions would be useful. I guess a generalization of the current macro for broadcasting.
i like the idea that with dims=:X it does the permute for you and with dims=1 it throws an error if the sizes of the DimArrays don't match. for hcat, i'd suggest that it should behave like cat(..., dims=1) and throw an error.
i've got 2400 lines of code in which i assumed when i wrote it that the order of named dims never mattered. are there any other cases like cat for which it actually does?
According to the docs in Base hcat is equivalent to cat(A...; dims=2) and vcat to cat(A...; dims=1) and therefore I would expect this to error.
@bjarthur basically the dimension order doesnt matter for indexing if you use dimensions.
Where else besides cat do we want auto permutation if you name the dimension? I guess stack .
~And like all functiond that call reduce over a named dimension with multiple DimArrays ?~ oops thats only one array
And I guess the first argument gets to be the one that doesnt permute?
What if a regular Array is included in cat with two DimArrays that need permuting?
Where else besides
catdo we want auto permutation if you name the dimension? I guessstack.
Yes stack seems to be a valid one. I just went through the Base Array function docs and looked for functions that accepted a dims keyword. There was no other function that came to mind where the auto permutation seems relevant. https://docs.julialang.org/en/v1/base/arrays/
What if a regular Array is included in
catwith two DimArrays that need permuting?
I think, that should error, because it is not clear what is the intended behaviour. But do you think that this is a real use case? Normally I am having a list of DimArrays that I want to concatenate together. And as a fix for that one can wrap the normal Array into a DimArray but then the user can exactly specify which dimension has which name.
Yes you're right @felixcremer it seem its only cat and stack where it makes sense to do this. Currently its an error so we can add it at any time and I'm ok with auto permutation as long as the syntax errors on a regular array.
@sethaxen in the long term a macro for broadcast would be nice I was thinking @d A * B + 7 could wrap all of its arguments in a function. That would wrap just AbstractDimArray and AbstractDimStack in a wrapper type to indicate auto permuted/reshaped broadcasts are ok, and then replace itself with @. to create regular broadcasts. Then we can basically do what broadcast_dims does behind the scenes during broadcasting dispatch.
Its also no more characters than regular @. macro broadcasts, which is nice.