DiffEqOperators.jl
DiffEqOperators.jl copied to clipboard
Unexpectedly many allocations for in place operations
Firstly, thank you. I'm really enjoying the SciML ecosystem.
I'm noticing an unexpectedly high number of allocations when applying operators in place. Here's a minimal example:
d2x = CenteredDifference{1}(2, 2, 1.0, 5)
bcx = Neumann0BC(1.0, 1)
op = d2x * bcx
u = zeros(5,5)
u[3,3] = 1.0
@btime du = op * u # 99.501 μs (1155 allocations: 95.17 KiB)
du = zeros(5,5)
@btime mul!(du, op, u) # 97.665 μs (1154 allocations: 94.89 KiB)
I was expecting a bigger reduction in allocations during the in-place application.
yes, that looks bad.
I confirm this problem is still there. And it's very strange that it takes such long time for such small, 5 * 5, matrix. And surprisingly, when you do this operation on only one column, the difference is so dramatic. I used a 100*100 matrix.
julia> @btime du = op * u[:,1];
733.333 ns (5 allocations: 1.95 KiB)
julia> @btime du = op * u;
23.092 ms (481702 allocations: 31.38 MiB)
there is one allocation can be reduced by:
julia> dux = @view du[:,1]
julia> @btime mul!(dux, op, u[:,1])
721.519 ns (4 allocations: 1.08 KiB)
Then I define
function f1!(du, u)
m,n = size(u)
@assert (m, n) == size(du)
for i = 1:n
du[:, i] = op * u[:,i]
end
end
This is much faster than do it directly.
julia> @benchmark f1!(du, u)
BenchmarkTools.Trial:
memory estimate: 195.31 KiB
allocs estimate: 500
--------------
minimum time: 63.400 μs (0.00% GC)
median time: 137.200 μs (0.00% GC)
mean time: 1.051 ms (87.11% GC)
maximum time: 680.927 ms (99.98% GC)
--------------
samples: 5230
evals/sample: 1
julia> @benchmark mul!(du, op, u)
BenchmarkTools.Trial:
memory estimate: 31.30 MiB
allocs estimate: 481700
--------------
minimum time: 22.251 ms (0.00% GC)
median time: 28.158 ms (0.00% GC)
mean time: 33.626 ms (17.35% GC)
maximum time: 70.771 ms (0.00% GC)
--------------
samples: 149
evals/sample: 1
But it seems not possible to define mul!(du[:,i], op, u[:,i])
inside the function. So i have to define something like :
function f2!(du, u)
m,n = size(u)
@assert (m, n) == size(du)
dui = zeros(m)
for i = 1:n
mul!(dui, op, u[:,i])
@. du[:, i] = dui
end
end
and results in some improvement:
julia> @benchmark f2!(du, u)
BenchmarkTools.Trial:
memory estimate: 107.12 KiB
allocs estimate: 301
--------------
minimum time: 48.500 μs (0.00% GC)
median time: 80.600 μs (0.00% GC)
mean time: 440.562 μs (81.42% GC)
maximum time: 421.137 ms (99.97% GC)
--------------
samples: 10000
evals/sample: 1
It may be a threading performance issue. We may need to use @turbo
or whatnot.
It may be a threading performance issue. We may need to use
@turbo
or whatnot.
Hi Chris, did you notice the difference in allocations between operating the whole matrix or only one column?