BandedMatrices.jl
BandedMatrices.jl copied to clipboard
Matrix-vector product with a transposed/adjoint `BandedMatrix` is slow
julia> B = BandedMatrix(1=>Float64.(1:7))
8×8 BandedMatrix{Float64} with bandwidths (-1, 1):
⋅ 1.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ 2.0 ⋅ ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ 3.0 ⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ 4.0 ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ 5.0 ⋅ ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 6.0 ⋅
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 7.0
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅
julia> v = Float64[i for i in axes(B,2)];
julia> @btime transpose($B) * $v; # uses gbmv
989.600 ns (3 allocations: 304 bytes)
julia> @btime transpose(transpose(v) * $B); # uses default_blasmul!
254.820 ns (3 allocations: 160 bytes)
julia> @btime $B' * v;
1.014 μs (2 allocations: 144 bytes)
julia> @btime ($v' * $B)';
191.123 ns (1 allocation: 128 bytes)
Edit: This behavior seems to switch at larger sizes, so I'm not sure if there's a universal fix
julia> B = BandedMatrix(rand(100, 100));
julia> v = Float64[i for i in axes(B,2)];
julia> @btime $B' * v;
1.755 μs (2 allocations: 912 bytes)
julia> @btime ($v' * $B)';
4.973 μs (1 allocation: 896 bytes)