BlockArrays.jl
Very slow matrix-vector multiplication
I had some plans of using block arrays with iterative solvers, but it seems that matrix-vector multiplication with block arrays is very slow.
using BlockArrays, SparseArrays
N = 10000
M = 2
b = BlockVector(rand(Float64,N + M), [N, M])
A = BlockArray(sprand(Float64,N + M, N + M, 0.1), [N, M], [N, M])
@time A*b; # 8.486403 seconds
@time sprand(Float64, N+M, N+M, 0.1)*rand(Float64, N+M); #0.232274 seconds
This seems to be hitting a fallback method that loops over the blocks, which will be slow for sparse matrices. Perhaps the multiplication could be performed recursively in cases where the block sizes are compatible.
This gives reasonable results for a 2x2 block system of sparse matrices:
using LinearAlgebra, BlockArrays

function LinearAlgebra.mul!(C::AbstractVector, A::BlockMatrix, B::AbstractVector)
    n = blocklengths(axes(A, 1))  # row-block sizes
    m = blocklengths(axes(A, 2))  # column-block sizes
    mul!(view(C, 1:n[1]),           A[Block(1,1)], view(B, 1:m[1]),           true, false) # c1  = A11*b1
    mul!(view(C, 1:n[1]),           A[Block(1,2)], view(B, (1:m[2]) .+ m[1]), true, true)  # c1 += A12*b2
    mul!(view(C, (1:n[2]) .+ n[1]), A[Block(2,1)], view(B, 1:m[1]),           true, false) # c2  = A21*b1
    mul!(view(C, (1:n[2]) .+ n[1]), A[Block(2,2)], view(B, (1:m[2]) .+ m[1]), true, true)  # c2 += A22*b2
    return C
end
It can probably be generalized to NxN block matrices.
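For reference, a generalization along those lines could look roughly like the sketch below. This is only an untested sketch: blockwise_mul! is a hypothetical helper name (not part of BlockArrays), and it assumes the sizes of C, A and B are compatible.

using LinearAlgebra, BlockArrays

# Sketch: loop over all blocks so each block's own specialized multiplication
# (sparse, Toeplitz, ...) is used instead of the generic fallback.
function blockwise_mul!(C::AbstractVector, A::BlockMatrix, B::AbstractVector)
    rows = blocklengths(axes(A, 1))   # row-block sizes
    cols = blocklengths(axes(A, 2))   # column-block sizes
    roff = [0; cumsum(rows)]          # row offset of each block row
    coff = [0; cumsum(cols)]          # column offset of each block column
    fill!(C, zero(eltype(C)))
    for j in 1:blocksize(A, 2), i in 1:blocksize(A, 1)
        Ci = view(C, roff[i]+1:roff[i+1])
        Bj = view(B, coff[j]+1:coff[j+1])
        mul!(Ci, A[Block(i, j)], Bj, true, true)  # Ci += Aij * Bj
    end
    return C
end

Something like blockwise_mul!(similar(b), A, b) on the sparse example above should then hit SparseArrays' matrix-vector kernel for every block (not benchmarked).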
We could redesign the fallback to support sparse matrices better. I believe the fallback calls something like:
for J = blockaxes(A,2), K = blockcolsupport(A,J)
    ...
end
That is, it already supports certain sparsity patterns (K will only loop over the non-zero entries in the column for block-banded matrices). Perhaps we can redesign this as:
for (K,J) = nonzeroblocks(A)
    ...
end
which will default to the previous behaviour, with no performance penalty.
nonzeroblocks should be renamed to whatever it's called in sparse arrays...
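For concreteness, a default along those lines could just reproduce the current iteration (only a sketch; nonzeroblocks is the proposed name from above, and blockcolsupport is the function the fallback already uses):

# Sketch of a possible default: same blocks, same column-major order as today.
nonzeroblocks(A) = ((K, J) for J in blockaxes(A, 2) for K in blockcolsupport(A, J))
# A method for block matrices with sparse blocks could instead return only the
# blocks that actually contain stored entries.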
Errr.... for matrix-vector do we need to make sure we are doing the columns in order?? Maybe it doesn't matter...
I have a similar-ish problem where the matrices being blocked are Toeplitz. A slightly contrived example:
using ToeplitzMatrices, BlockArrays
N = 10000
vc = 1:N
vr = [1; rand(N-1)]
T = Toeplitz(vc, vr)
F = Matrix(T)
x = ones(size(T,1))
@time T*x; # 0.002933 seconds (59 allocations: 1019.281 KiB)
@time F*x; # 0.012257 seconds (2 allocations: 78.172 KiB)
BlockT = mortar(reshape([T,T,T,T,T],1,5))
BlockF = [F F F F F]
varinfo(r"Block")
# name size summary
# BlockF 3.725 GiB 10000×50000 Matrix{Float64}
# BlockT 156.750 KiB 1x5-blocked
yblock = BlockVector(ones(Float64,5N), [N,N,N,N,N])
y = ones(5N)
@time BlockT*yblock; # 4.774871 seconds (50.01 k allocations: 3.128 MiB)
@time BlockT*y; # 0.390425 seconds (3 allocations: 78.203 KiB)
@time BlockF*yblock; # 0.172353 seconds (3 allocations: 78.266 KiB)
@time BlockF*y; # 0.070361 seconds (2 allocations: 78.172 KiB)
I guess this makes me wonder whether I should even expect this package to perform well when the blocked matrices are structured rather than dense/sparse. I also do not understand why using the blocked vector is so much slower than using the standard vector.
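For what it's worth, applying the fast Toeplitz multiplication block-by-block on slices of the plain vector (reusing T, y and N from the example above) sidesteps the slow fallback; this is only a rough workaround sketch, not a fix for the package:

# Sketch: multiply each block against the matching slice of y so the
# Toeplitz-specialized mat-vec is used for every block column.
z = T * view(y, 1:N)                    # first block column
for j in 2:5
    z .+= T * view(y, (j-1)*N+1:j*N)    # accumulate the remaining block columns
end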