DiffEqOperators.jl icon indicating copy to clipboard operation
DiffEqOperators.jl copied to clipboard

Unexpectedly many allocations for in place operations

Open atiyo opened this issue 3 years ago • 4 comments

Firstly, thank you. I'm really enjoying the SciML ecosystem.

I'm noticing an unexpectedly high number of allocations when applying operators in place. Here's a minimal example:

d2x = CenteredDifference{1}(2, 2, 1.0, 5)
bcx = Neumann0BC(1.0, 1)
op = d2x * bcx
u = zeros(5,5)
u[3,3] = 1.0
@btime du = op * u # 99.501 μs (1155 allocations: 95.17 KiB)
du = zeros(5,5)
@btime mul!(du, op, u) # 97.665 μs (1154 allocations: 94.89 KiB)

I was expecting a bigger reduction in allocations during the in-place application.

atiyo avatar Mar 24 '21 20:03 atiyo

yes, that looks bad.

ChrisRackauckas avatar Mar 24 '21 20:03 ChrisRackauckas

I confirm this problem is still there. And it's very strange that it takes such long time for such small, 5 * 5, matrix. And surprisingly, when you do this operation on only one column, the difference is so dramatic. I used a 100*100 matrix.

julia> @btime du = op * u[:,1];
  733.333 ns (5 allocations: 1.95 KiB)

julia> @btime du = op * u;
  23.092 ms (481702 allocations: 31.38 MiB)

there is one allocation can be reduced by:

julia> dux = @view du[:,1]
julia> @btime mul!(dux, op, u[:,1])
  721.519 ns (4 allocations: 1.08 KiB)

Then I define

function f1!(du, u)
    m,n = size(u)
    @assert (m, n) == size(du)
    for i = 1:n
        du[:, i] = op * u[:,i]
    end
end

This is much faster than do it directly.

julia> @benchmark f1!(du, u)
BenchmarkTools.Trial:
  memory estimate:  195.31 KiB
  allocs estimate:  500
  --------------
  minimum time:     63.400 μs (0.00% GC)
  median time:      137.200 μs (0.00% GC)
  mean time:        1.051 ms (87.11% GC)
  maximum time:     680.927 ms (99.98% GC)
  --------------
  samples:          5230
  evals/sample:     1

julia> @benchmark mul!(du, op, u)
BenchmarkTools.Trial:
  memory estimate:  31.30 MiB
  allocs estimate:  481700
  --------------
  minimum time:     22.251 ms (0.00% GC)
  median time:      28.158 ms (0.00% GC)
  mean time:        33.626 ms (17.35% GC)
  maximum time:     70.771 ms (0.00% GC)
  --------------
  samples:          149
  evals/sample:     1

But it seems not possible to define mul!(du[:,i], op, u[:,i]) inside the function. So i have to define something like :

function f2!(du, u)
    m,n = size(u)
    @assert (m, n) == size(du)
    dui = zeros(m)
    for i = 1:n
        mul!(dui, op, u[:,i])
        @. du[:, i] = dui
    end
end

and results in some improvement:

julia> @benchmark f2!(du, u)
BenchmarkTools.Trial:
  memory estimate:  107.12 KiB
  allocs estimate:  301
  --------------
  minimum time:     48.500 μs (0.00% GC)
  median time:      80.600 μs (0.00% GC)
  mean time:        440.562 μs (81.42% GC)
  maximum time:     421.137 ms (99.97% GC)
  --------------
  samples:          10000
  evals/sample:     1

hurricane007 avatar Jul 07 '21 16:07 hurricane007

It may be a threading performance issue. We may need to use @turbo or whatnot.

ChrisRackauckas avatar Jul 08 '21 10:07 ChrisRackauckas

It may be a threading performance issue. We may need to use @turbo or whatnot.

Hi Chris, did you notice the difference in allocations between operating the whole matrix or only one column?

hurricane007 avatar Jul 08 '21 12:07 hurricane007