DistributedArrays.jl

Matrix-vector multiplication fails when DArray is unevenly distributed

Open wcwitt opened this issue 3 years ago • 0 comments

The following example creates DArrays A and x, then attempts A*x. The only unusual bit is that A is distributed unevenly: worker 2 holds a 2x4 block and worker 3 holds a 3x4 block.

# test.jl

using Distributed, DistributedArrays
@everywhere using Distributed, DistributedArrays

A2 = @spawnat 2 ones((2,4))           # 2x4 array
A3 = @spawnat 3 ones((3,4))           # 3x4 array
A = DArray(reshape([A2,A3], (2,1)))   # 5x4 array

x = dones((4,1))                      # 4x1 array

A*x

Launching the REPL with julia -p 2 and then including test.jl yields

julia> include("test.jl")
ERROR: LoadError: ArgumentError: cuts of the first dimension of the output matrix must match cuts of dimension 1 of the first input matrix
Stacktrace:
 [1] _matmatmul!(C::DArray{Float64, 2, Matrix{Float64}}, A::DArray{Float64, 2, Matrix{Float64}}, B::DArray{Float64, 2, Matrix{Float64}}, α::Int64, β::Int64, tA::Char)
   @ DistributedArrays ~/.julia/packages/DistributedArrays/fEM6l/src/linalg.jl:209
 [2] mul! (repeats 2 times)
   @ ~/.julia/packages/DistributedArrays/fEM6l/src/linalg.jl:262 [inlined]
 [3] *(A::DArray{Float64, 2, Matrix{Float64}}, B::DArray{Float64, 2, Matrix{Float64}})
   @ DistributedArrays ~/.julia/packages/DistributedArrays/fEM6l/src/linalg.jl:279

Looking at the source around /src/linalg.jl:279,

function Base.:*(A::DMatrix, B::AbstractMatrix)
    T = Base.promote_op(_matmul_op, eltype(A), eltype(B))
    C = DArray(I -> Array{T}(undef, map(length, I)),
            (size(A, 1), size(B, 2)),
            procs(A)[:,1:min(size(procs(A), 2), size(procs(B), 2))],
            (size(procs(A), 1), min(size(procs(A), 2), size(procs(B), 2))))
    return mul!(C, A, B)
end

I think the issue may be that creation of the output array C doesn't account for the (admittedly unusual) possibility that A is distributed in uneven chunks.
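To make the suspected mismatch concrete, here is a minimal sketch in plain Julia. It does not call DistributedArrays at all. The helpers `cuts_from_sizes` and `even_cuts` are hypothetical names introduced only for illustration, and the rule that a remainder goes to the leading chunks is an assumption about how a default even split might behave, not something confirmed from the package source.

```julia
# Hypothetical helpers for illustration only; not part of DistributedArrays.jl.

# Cut points implied by explicit chunk sizes, e.g. the 2-row and 3-row blocks of A.
# A chunk i covers the index range cuts[i] : cuts[i+1]-1.
cuts_from_sizes(sizes::Vector{Int}) = cumsum(vcat(1, sizes))

# Cut points for an as-even-as-possible split of n indices into k chunks.
# ASSUMPTION: any remainder is given to the leading chunks.
function even_cuts(n::Int, k::Int)
    base, extra = divrem(n, k)
    sizes = [base + (i <= extra ? 1 : 0) for i in 1:k]
    return cuts_from_sizes(sizes)
end

cuts_A = cuts_from_sizes([2, 3])   # row cuts of A as built in test.jl
cuts_C = even_cuts(5, 2)           # row cuts of C under the assumed even split

println("A rows: ", cuts_A, "  C rows: ", cuts_C, "  match: ", cuts_A == cuts_C)
```

Under that assumption the 5 rows of C are split 3+2 while A is split 2+3, so the row cuts differ even though both arrays have 5 rows on the same two workers. That would be consistent with the ArgumentError above, which compares the cuts of the output's first dimension with those of A.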

wcwitt, Feb 09 '22 10:02