BandedMatrices.jl
Matrix-Matrix product is slow with diagonal banded matrices
julia> B = BandedMatrix(0=>ones(1000));
julia> D = Diagonal(B);
julia> @btime $B * $B;
118.442 μs (2 allocations: 7.98 KiB)
julia> @btime $D * $D;
761.298 ns (1 allocation: 7.94 KiB)
julia> B * B == D * D
true
This is a significant difference (roughly 150×), and it would be good to identify why the banded path is slow.
Probably need to overload copy(::MulAdd{<:DiagonalLayout,<:BandedColumns})
I'm not sure I understand. This uses
copy(::MulAdd{BandedMatrices.BandedColumns{DenseColumnMajor}, BandedMatrices.BandedColumns{DenseColumnMajor}, ...)
Perhaps some special cases could be added to gbmm! to handle diagonal operands?
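One possible special case, sketched below (diagonal_banded_mul is a hypothetical name, not library code, and _BandedMatrix is an internal constructor): when both operands have bandwidths (0, 0), the product's band data is just the elementwise product of the two stored diagonals.

```julia
using BandedMatrices
import BandedMatrices: bandeddata, _BandedMatrix

# Illustrative fast path (not library code): for two diagonal banded
# matrices the band data is a 1×n row holding the diagonal, so the
# product's data is an elementwise product -- no band traversal needed.
function diagonal_banded_mul(A::BandedMatrix, B::BandedMatrix)
    @assert bandwidths(A) == (0, 0) && bandwidths(B) == (0, 0)
    _BandedMatrix(bandeddata(A) .* bandeddata(B), size(A, 1), 0, 0)
end

B = BandedMatrix(0 => ones(1000))
diagonal_banded_mul(B, B) == B * B
```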
Part of the issue seems to be that BLAS.gbmv! is slow for a diagonal banded matrix.
julia> using BandedMatrices, LinearAlgebra, BenchmarkTools
julia> import BandedMatrices: bandeddata
julia> A = BandedMatrix(0=>rand(1000));
julia> v = Float64[i for i in 1:size(A,2)];
julia> y = zeros(size(A,1));
julia> @btime BLAS.gbmv!('N', size($A,1), bandwidth($A,1), bandwidth($A,2), 1.0, bandeddata($A), $v, 0.0, $y);
6.184 μs (0 allocations: 0 bytes)
julia> @btime mul!($y, $(Diagonal(A)), $v, 1.0, 0.0);
133.336 ns (0 allocations: 0 bytes)
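For comparison, here is a minimal sketch of what a (0, 0)-bandwidth fast path for the mat-vec could look like (diagonal_gbmv! is a hypothetical name, not a library function):

```julia
using BandedMatrices
import BandedMatrices: bandeddata

# Sketch: when bandwidths(A) == (0, 0) the band data is a 1×n row
# containing the diagonal, so y = A*v is an elementwise product,
# avoiding the general band traversal done by gbmv!.
function diagonal_gbmv!(y, A::BandedMatrix, v)
    @assert bandwidths(A) == (0, 0)
    y .= vec(bandeddata(A)) .* v
    return y
end

A = BandedMatrix(0 => rand(1000))
v = Float64[i for i in 1:size(A, 2)]
diagonal_gbmv!(zeros(size(A, 1)), A, v) ≈ A * v
```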
Right multiplication by a Diagonal matrix should also be improved. A * D scales column j of A by D.diag[j], and since column j of the band data holds exactly the stored entries of column j of A, the product reduces to the broadcast A.data .* D.diag':
julia> A = BandedMatrix(-1=>rand(999), 0=>rand(1000), 1=>rand(999));
julia> D = Diagonal(rand(1000));
julia> A * D ≈ BandedMatrices._BandedMatrix(A.data .* D.diag', size(A,1), bandwidths(A)...)
true
julia> @btime $A * $D;
20.770 μs (3 allocations: 23.53 KiB)
julia> @btime BandedMatrices._BandedMatrix($A.data .* $D.diag', size($A,1), bandwidths($A)...);
4.954 μs (3 allocations: 23.53 KiB)
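Left multiplication D * A admits an analogous kernel, though the scaling is by the row each data slot represents rather than a plain column broadcast. A sketch (dmul_left is a hypothetical name; it assumes the storage convention data[u + 1 + i - j, j] == A[i, j]):

```julia
using BandedMatrices, LinearAlgebra
import BandedMatrices: bandeddata, _BandedMatrix

# Sketch: D * A scales A[i, j] by D.diag[i]; in banded storage the slot
# data[k, j] holds A[i, j] with i = k + j - u - 1, so scale it by d[i].
function dmul_left(D::Diagonal, A::BandedMatrix)
    l, u = bandwidths(A)
    m, n = size(A)
    data = copy(bandeddata(A))
    d = D.diag
    for j in 1:n, k in 1:size(data, 1)
        i = k + j - u - 1          # row index this slot stores
        if 1 <= i <= m
            data[k, j] *= d[i]
        end
    end
    _BandedMatrix(data, m, l, u)
end

A = BandedMatrix(-1 => rand(999), 0 => rand(1000), 1 => rand(999))
D = Diagonal(rand(1000))
dmul_left(D, A) ≈ D * A
```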
Edit: This should be fixed by https://github.com/JuliaLinearAlgebra/BandedMatrices.jl/pull/302