julia icon indicating copy to clipboard operation
julia copied to clipboard

Avoid materializing arrays in bidiag matmul

Open jishnub opened this issue 1 year ago • 6 comments

Currently, small Bidiagonal/Tridiagonal matrices are materialized in matrix multiplications, but this is wasteful and unnecessary. This PR changes this to use a naive matrix multiplication for small matrices, and fall back to the banded multiplication for larger ones. Multiplication by a Bidiagonal falls back to a banded matrix multiplication for all sizes in the current implementation, and iterates in a cache-friendly manner for the non-Bidiagonal matrix.

In certain cases, the matrices were being materialized if the non-structured matrix was small, even if the structured matrix was large. This is changed as well in this PR.

Some improvements in performance:

julia> B = Bidiagonal(rand(3), rand(2), :U); A = rand(size(B)...); C = similar(A);

julia> @btime mul!($C, $A, $B);
  193.152 ns (6 allocations: 352 bytes) # nightly v"1.12.0-DEV.1034"
  18.826 ns (0 allocations: 0 bytes) # This PR

julia> T = Tridiagonal(rand(99), rand(100), rand(99)); A = rand(2, size(T,2)); C = similar(A);

julia> @btime mul!($C, $A, $T);
  9.398 μs (8 allocations: 79.94 KiB) # nightly
  416.407 ns (0 allocations: 0 bytes) # This PR

julia> B = Bidiagonal(rand(300), rand(299), :U); A = rand(20000, size(B,2)); C = similar(A);

julia> @btime mul!($C, $A, $B);
  33.395 ms (0 allocations: 0 bytes) # nightly
  6.695 ms (0 allocations: 0 bytes) # This PR (cache-friendly)

Closes https://github.com/JuliaLang/julia/pull/55414

jishnub avatar Aug 10 '24 15:08 jishnub

After 2d15c70, the performance of a Bidiagonal * Bidiagonal improves as the destination is accessed in a column-major order.

julia> BL = Bidiagonal([1:400;], [1:399;], :L);

julia> C = similar(BL, size(BL));

julia> @btime mul!($C, $BL, $BL);
  28.031 μs (6 allocations: 6.38 KiB) # nightly
  24.398 μs (0 allocations: 0 bytes) # This PR

jishnub avatar Aug 12 '24 09:08 jishnub

Uff, this was supposed to supersede https://github.com/JuliaLang/julia/pull/55414, I guess?

dkarrasch avatar Aug 25 '24 09:08 dkarrasch

Yes, I should probably have closed that one. I'll rebase this to resolve the conflicts.

jishnub avatar Aug 25 '24 09:08 jishnub

I'll rebase this to resolve the conflicts.

I have done that already. In the resolution, I kept the zero-size early drop-out.

dkarrasch avatar Aug 25 '24 09:08 dkarrasch

Thanks, that was quick!

jishnub avatar Aug 25 '24 09:08 jishnub

Any idea why the apple-darwin addmul tests are failing?

jishnub avatar Aug 25 '24 12:08 jishnub

test failures seem to be timeouts now

jishnub avatar Aug 29 '24 14:08 jishnub

Gentle bump

jishnub avatar Sep 09 '24 07:09 jishnub