BlockBandedMatrices.jl icon indicating copy to clipboard operation
BlockBandedMatrices.jl copied to clipboard

Performances of mul!(x, transpose(B), y)

Open matthieugomez opened this issue 6 years ago • 2 comments

Thanks for the great package! I am trying to use BandedBlockBandedMatrix as a Jacobian in NLsolve (following https://github.com/JuliaDiffEq/DifferentialEquations.jl/issues/483). However the performances get worse than for a sparse matrix. I think this comes from this line in NLSolve. To give an example

using LinearAlgebra, SparseArrays, BlockBandedMatrices, BenchmarkTools
x = rand(10000)
y = rand(10000)
J = BandedBlockBandedMatrix(Ones(10000, 10000), (fill(100, 100), fill(100, 100)), (1, 1), (1, 1))
@btime mul!($x, transpose($J), $y)
# 2.528 s (1 allocation: 16 bytes)
sparseJ = sparse(J)
@btime mul!($x, transpose($sparseJ), $y)
#  60.817 μs (1 allocation: 16 bytes)

Is there a way to make this operation faster?

matthieugomez avatar Aug 08 '19 15:08 matthieugomez

If that matrix is a Jacobian, you probably just want to do reverse mode AD instead.

ChrisRackauckas avatar Aug 08 '19 15:08 ChrisRackauckas

Yes I haven't made sure Transpose{T,<:BandedBlockBandedMatrix} implements the necessary routines. This should in principle be possible, though will take some work.

If you are eager, the starting point is to create a BandedBlockBandedRowMajor() memory layout, see: https://github.com/JuliaMatrices/BlockBandedMatrices.jl/blob/24c5d6b3f030a7735ece01b0f918b3d634802957/src/BandedBlockBandedMatrix.jl#L279

Then we'd need to make sure sub-blocks are BandedRowMajor(): https://github.com/JuliaMatrices/BlockBandedMatrices.jl/blob/24c5d6b3f030a7735ece01b0f918b3d634802957/src/BandedBlockBandedMatrix.jl#L456

We probably need a call to @lazymul to make sure mul! lowers to the LazyArrays.jl multiplication routines.

In theory then we're done: mul!(x, transpose(A), y) should lower to

https://github.com/JuliaMatrices/BlockBandedMatrices.jl/blob/24c5d6b3f030a7735ece01b0f918b3d634802957/src/linalg.jl#L33

which then will call BandedMatrices.jl's implementation of BandedRowMajor multiplication. Though there is likely missing functionality that will be hit.

dlfivefifty avatar Aug 08 '19 15:08 dlfivefifty