SparseArrays.jl

Extra allocations when using generalized `mul!` operation

Open albertomercurio opened this issue 1 year ago • 0 comments

Minimal working example:

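using LinearAlgebra, SparseArrays, BenchmarkTools

N = 3000
A = sprand(ComplexF64, N, N, 1/N)  # sparse, on average one nonzero per column
# A = rand(ComplexF64, N, N)       # dense alternative for comparison
x = randn(N) + 1im*randn(N)
y = randn(N) + 1im*randn(N)
α = 1.0 + 0.0im
β = 0.0 + 0.0im

@benchmark mul!($y, $A, $x, $α, $β)       # 5-argument form, complex α and β
@benchmark mul!($y, $A, $x)               # 3-argument form
@benchmark mul!($y, $A, $x, true, false)  # 5-argument form, Bool α and β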

The first call (5-argument form with complex α and β) gives

BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.763 μs …   9.748 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.207 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.125 μs ± 172.364 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▂                 ▂██▇▅▂           ▁  ▃██▇▅▃▁         ▁    ▃
  ███▅▅▁▁▁▃▁▃▁▃▁▁▄▇█▆███████▇▆▆▅▆▆▇▇██████████████▇▇▆▇▇▆▆█▇▆▆ █
  3.76 μs      Histogram: log(frequency) by time      4.42 μs <

 Memory estimate: 96 bytes, allocs estimate: 2.

The second call (3-argument form) gives

BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.242 μs …   7.697 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.553 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.488 μs ± 151.330 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▁▁          ▁▅▇█▇▅▂   ▁▂▁▁          ▃▆██▇▆▃▁              ▃
  ▆████▆▄▁▅▃▅▄▁▅███████▇▆█████▆▆▇██▇▅▇▇██████████▇▇▇▇▇▇▇▇▆▆▅▆ █
  3.24 μs      Histogram: log(frequency) by time      3.71 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

and the third call (5-argument form with `true, false`) gives

BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.224 μs …   6.931 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.403 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.473 μs ± 131.089 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                  ▇█                   ▄▆▂                     
  ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▆███▄▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▅███▅▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  3.22 μs         Histogram: frequency by time        3.74 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

I have also noticed that the `MulAddMul` struct introduces some type instabilities, so the extra allocations could be related to that.
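To illustrate what I mean by the type instability, here is a minimal sketch. It assumes that `LinearAlgebra.MulAddMul` is available in your Julia version (it is an internal, unexported type and may change), and that its constructor encodes `isone(α)` and `iszero(β)` as type parameters, so the concrete type depends on the runtime values of `α` and `β`. Whether the sparse `mul!` path actually goes through it is a guess on my part.

```julia
using LinearAlgebra

# MulAddMul is internal to LinearAlgebra and not part of the public API;
# this is only meant to show that the resulting type depends on the values.
a = LinearAlgebra.MulAddMul(1.0 + 0.0im, 0.0 + 0.0im)  # α == 1, β == 0
b = LinearAlgebra.MulAddMul(1.0 + 0.0im, 2.0 + 0.0im)  # α == 1, β != 0

println(typeof(a))
println(typeof(b))
println(typeof(a) == typeof(b))
```

If the two printed types differ even though the argument types are identical, the compiler cannot infer the return type of the constructor from the argument types alone, which is the kind of instability that can lead to small heap allocations.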

The allocation size doesn't grow with the size of the matrix, but since I have to call this function many times in my DiffEq integration, I end up with millions of allocations.

The same call with a dense matrix (the commented-out line in the example) works well.
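For a quicker check that doesn't need BenchmarkTools, the allocations can also be measured with `@allocated`. This is only a sketch: the helper names `allocs5` and `allocs3` are made up for this example, and `N` is smaller than in the example above so that the dense copy stays cheap. The actual numbers may depend on the Julia and SparseArrays versions, so no expected output is shown.

```julia
using LinearAlgebra, SparseArrays

# Hypothetical helpers: the first call warms up (compiles) the method,
# so only the second call is measured.
function allocs5(y, A, x, α, β)
    mul!(y, A, x, α, β)
    return @allocated mul!(y, A, x, α, β)
end

function allocs3(y, A, x)
    mul!(y, A, x)
    return @allocated mul!(y, A, x)
end

N = 300                       # smaller than in the example above
A = sprand(ComplexF64, N, N, 1/N)
Ad = Matrix(A)                # dense copy for comparison
x = randn(ComplexF64, N)
y = similar(x)

println("sparse, complex α, β: ", allocs5(y, A, x, 1.0 + 0.0im, 0.0 + 0.0im), " bytes")
println("sparse, 3-arg:        ", allocs3(y, A, x), " bytes")
println("sparse, true, false:  ", allocs5(y, A, x, true, false), " bytes")
println("dense,  complex α, β: ", allocs5(y, Ad, x, 1.0 + 0.0im, 0.0 + 0.0im), " bytes")
```

Wrapping the calls in functions keeps the non-constant globals from contributing allocations of their own.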

albertomercurio, Feb 15 '24 14:02