SparseArrays.jl
Extra allocations when using generalized `mul!` operation
Minimal working example:
```julia
using SparseArrays, LinearAlgebra, BenchmarkTools

N = 3000
A = sprand(ComplexF64, N, N, 1/N)
# A = rand(ComplexF64, N, N)
x = randn(N) + 1im*randn(N)
y = randn(N) + 1im*randn(N)
α = 1.0 + 0.0im
β = 0.0 + 0.0im

@benchmark mul!($y, $A, $x, $α, $β)
@benchmark mul!($y, $A, $x)
@benchmark mul!($y, $A, $x, true, false)
```
The first one gives:

```
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.763 μs …   9.748 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.207 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.125 μs ± 172.364 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▂ ▂██▇▅▂ ▁        ▃██▇▅▃▁ ▁                              ▃
  ███▅▅▁▁▁▃▁▃▁▃▁▁▄▇█▆███████▇▆▆▅▆▆▇▇██████████████▇▇▆▇▇▆▆█▇▆▆ █
  3.76 μs       Histogram: log(frequency) by time      4.42 μs <

 Memory estimate: 96 bytes, allocs estimate: 2.
```
The second one gives:

```
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.242 μs …   7.697 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.553 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.488 μs ± 151.330 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁▁    ▁▅▇█▇▅▂ ▁▂▁▁   ▃▆██▇▆▃▁                              ▃
  ▆████▆▄▁▅▃▅▄▁▅███████▇▆█████▆▆▇██▇▅▇▇██████████▇▇▇▇▇▇▇▇▆▆▅▆ █
  3.24 μs       Histogram: log(frequency) by time      3.71 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
```
and the third one:

```
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.224 μs …   6.931 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.403 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.473 μs ± 131.089 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                  ▇█               ▄▆▂
  ▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▆███▄▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▅███▅▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  3.22 μs        Histogram: frequency by time        3.74 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
```
I have also noticed that the `MulAddMul` struct gives some type instabilities, so the extra allocations could be related to that.
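For reference, this is one way to inspect the instability (a sketch, reusing the variables from the example above; `MulAddMul` is an internal, unexported type in LinearAlgebra, so the exact output depends on the Julia version):

```julia
using LinearAlgebra
using InteractiveUtils  # for @code_warntype outside the REPL

# The 5-argument call does not infer cleanly on affected versions:
@code_warntype mul!(y, A, x, α, β)

# The constructor branches on the *values* of α and β (isone/iszero),
# so its concrete return type cannot be inferred from the argument
# types alone, which shows up as a Union/abstract return type here:
@code_warntype LinearAlgebra.MulAddMul(α, β)
```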
The allocation size doesn't grow with the size of the matrix, but since this function is called many times inside my DiffEq integration, I end up with millions of allocations.
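As a stopgap, peeling off the trivial α = 1, β = 0 case by hand avoids the allocating path. A minimal sketch with a hypothetical helper (`mul_noalloc!` is not part of any package; it relies on the fact that the 3-argument method measured above does not allocate):

```julia
using LinearAlgebra

# Resolve the trivial-scalar case once, outside the generic 5-argument
# path, so the hot loop takes the allocation-free method.
function mul_noalloc!(y, A, x, α, β)
    if isone(α) && iszero(β)
        mul!(y, A, x)          # 0 allocations, as in the second benchmark
    else
        mul!(y, A, x, α, β)    # generic (currently allocating) fallback
    end
    return y
end
```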
In any case, the same operation with a dense matrix works fine and shows no extra allocations.
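The dense comparison can be reproduced with the commented-out line from the MWE (a sketch, same variables as above):

```julia
# Dense case for comparison: the 5-argument mul! dispatches to BLAS
# gemv! here, and does not show the extra allocations reported above.
Ad = rand(ComplexF64, N, N)
@benchmark mul!($y, $Ad, $x, $α, $β)
```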