
Very slow matrix-vector multiplication

Open lijas opened this issue 2 years ago • 4 comments

I had some plans of using block arrays with iterative solvers, but it seems that matrix-vector multiplication with block arrays is very slow.


using BlockArrays, SparseArrays


N = 10000
M = 2
b = BlockVector(rand(Float64,N + M), [N, M])
A = BlockArray(sprand(Float64,N + M, N + M, 0.1), [N, M], [N, M])

@time A*b;   # 8.486403 seconds
@time sprand(Float64, N+M, N+M, 0.1)*rand(Float64, N+M);  #0.232274 seconds 

lijas avatar Aug 16 '22 13:08 lijas

This seems to be hitting a fallback method that loops over the blocks. This will be slow for sparse matrices. Perhaps the multiplication can be performed recursively in such cases where the block sizes are compatible.
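
In the meantime, one possible workaround (a sketch, not an official BlockArrays method): since each block here is itself a `SparseMatrixCSC`, reassembling the blocks with `hvcat` yields one flat sparse matrix, so the standard fast sparse matvec applies. Sizes are reduced here for illustration.

```julia
using BlockArrays, SparseArrays, LinearAlgebra

N, M = 1000, 2
A = BlockArray(sprand(Float64, N + M, N + M, 0.1), [N, M], [N, M])
b = rand(Float64, N + M)

# hvcat of sparse blocks produces a single SparseMatrixCSC,
# bypassing the slow block-wise fallback
Aflat = [A[Block(1, 1)] A[Block(1, 2)];
         A[Block(2, 1)] A[Block(2, 2)]]
c = Aflat * b  # standard sparse matrix-vector product
```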

jishnub avatar Sep 12 '22 11:09 jishnub

This gives reasonable results for a 2x2 block system of sparse matrices:

using BlockArrays, LinearAlgebra

function LinearAlgebra.mul!(C::AbstractVector, A::BlockMatrix, B::AbstractVector)
    n = blocklengths(axes(A, 1))  # row-block sizes
    m = blocklengths(axes(A, 2))  # column-block sizes

    LinearAlgebra.mul!(view(C, 1:n[1]), A[Block(1, 1)], view(B, 1:m[1]), 1.0, 0.0)          # c1 = A11*b1
    LinearAlgebra.mul!(view(C, 1:n[1]), A[Block(1, 2)], view(B, (1:m[2]) .+ m[1]), 1.0, 1.0) # c1 += A12*b2

    LinearAlgebra.mul!(view(C, (1:n[2]) .+ n[1]), A[Block(2, 1)], view(B, 1:m[1]), 1.0, 0.0)          # c2 = A21*b1
    LinearAlgebra.mul!(view(C, (1:n[2]) .+ n[1]), A[Block(2, 2)], view(B, (1:m[2]) .+ m[1]), 1.0, 1.0) # c2 += A22*b2

    return C
end

It can probably be generalized to NxN block matrices.
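
One possible generalization (a sketch, assuming `block_mul!` as a hypothetical name rather than overloading `mul!`): loop over all block rows and columns with `blockaxes`, accumulating each block product into the matching view of the output via the 5-argument `mul!`.

```julia
using BlockArrays, LinearAlgebra

# Sketch of a block-wise matvec for an arbitrary blocking.
function block_mul!(C::AbstractVector, A::BlockMatrix, B::AbstractVector)
    fill!(C, zero(eltype(C)))
    for J in blockaxes(A, 2), K in blockaxes(A, 1)
        rows = axes(A, 1)[K]  # global row indices covered by block row K
        cols = axes(A, 2)[J]  # global column indices covered by block column J
        # C[rows] += A[K, J] * B[cols]  (5-arg mul! with α = true, β = true)
        mul!(view(C, rows), A[K, J], view(B, cols), true, true)
    end
    return C
end
```

Each `mul!` call dispatches on the block's own type, so sparse (or otherwise structured) blocks keep their fast kernels.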

lijas avatar Sep 12 '22 13:09 lijas

We could redesign the fallback to support sparse matrices better. I believe the fallback calls something like:

for J = blockaxes(A,2), K = blockcolsupport(A,J)
...
end

That is, it already supports certain sparsity patterns (K will only loop over the non-zero blocks in the column for block-banded matrices). Perhaps we can redesign this as:

for (K,J) = nonzeroblocks(A)
...
end

which will default to the previous behaviour, with no performance penalty.

nonzeroblocks should be renamed to whatever it's called in sparse arrays...
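
A sketch of what the proposed default could look like (`nonzeroblocks` is the hypothetical name from above, not an existing API; `blockaxes` stands in for the column-support query in the fully dense case):

```julia
using BlockArrays

# Hypothetical default: yield every (K, J) block pair in column-major
# order, matching the current fallback's iteration.
nonzeroblocks(A::AbstractMatrix) =
    ((K, J) for J in blockaxes(A, 2) for K in blockaxes(A, 1))
```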

dlfivefifty avatar Sep 13 '22 09:09 dlfivefifty

Errr.... for matrix-vector do we need to make sure we are doing the columns in order?? Maybe it doesn't matter...

dlfivefifty avatar Sep 13 '22 09:09 dlfivefifty

I have a similar-ish problem where the matrices being blocked are Toeplitz. A little contrived problem is e.g.

using ToeplitzMatrices, BlockArrays
N = 10000
vc = 1:N
vr = [1; rand(N-1)]
T = Toeplitz(vc, vr)
F = Matrix(T)
x = ones(size(T,1))
@time T*x; # 0.002933 seconds (59 allocations: 1019.281 KiB)
@time F*x; # 0.012257 seconds (2 allocations: 78.172 KiB)

BlockT = mortar(reshape([T,T,T,T,T],1,5))
BlockF = [F F F F F]
varinfo(r"Block")
# name          size summary
# BlockF   3.725 GiB 10000×50000 Matrix{Float64}
# BlockT 156.750 KiB 1x5-blocked
yblock = BlockVector(ones(Float64,5N), [N,N,N,N,N])
y = ones(5N)
@time BlockT*yblock; # 4.774871 seconds (50.01 k allocations: 3.128 MiB)
@time BlockT*y;      # 0.390425 seconds (3 allocations: 78.203 KiB)
@time BlockF*yblock; # 0.172353 seconds (3 allocations: 78.266 KiB)
@time BlockF*y;      # 0.070361 seconds (2 allocations: 78.172 KiB)

I guess this makes me wonder whether I should even expect this package to perform well when the blocked matrices are structured rather than dense/sparse. I also do not get why using the blocked vector is so much worse than the standard vector.
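
For this particular example there is a cheap manual route (a sketch, sizes reduced for illustration): because every block of the 1x5 block row is the same `T`, the product `[T T T T T] * y` equals `T * (y₁ + y₂ + y₃ + y₄ + y₅)`, so only one fast Toeplitz matvec is needed.

```julia
using ToeplitzMatrices

N = 1000
vc = collect(1.0:N)
vr = [1.0; rand(N - 1)]
T = Toeplitz(vc, vr)
y = ones(5N)

# sum the five length-N chunks of y, then apply T once
ysum = sum(view(y, (1:N) .+ (i - 1) * N) for i in 1:5)
manual = T * ysum
```

This only exploits the repeated block, of course; it doesn't answer the general question of dispatching to structured kernels inside a `BlockMatrix`.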

mipals avatar Feb 23 '23 14:02 mipals