ForwardDiff.jl

Compiletime for tensor very slow

Open michel2323 opened this issue 8 years ago • 5 comments

We are prototyping a time integration scheme with a handwritten residual function (50 lines) and Jacobian (100 lines):

    while norm_res > eps
        iteration = iteration + 1
        for iter in eachindex(J)
            J[iter]=0.0
        end
        jac_beuler(x, xold, h, e_fd, p_m, J, Ymat)
        x = x - inv(J)*F
        residual_beuler(x, xold, h, e_fd, p_m, F, Ymat)
        norm_res = norm(F)
    end
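For readers without the original model, here is a minimal, self-contained sketch of the same Newton loop for one backward Euler step. The toy ODE x' = -x^3 and the names `residual!`, `jacobian!` and `newton_step` are assumptions made for illustration; they stand in for `residual_beuler` and `jac_beuler`, whose bodies are not shown in this issue. The sketch solves `J \ F` rather than forming `inv(J)`, which is the usual idiom but not necessarily what the original code needs.

```julia
using LinearAlgebra

# Hypothetical stand-in for residual_beuler: backward Euler residual
# F(x) = x - xold - h*f(x) for the toy ODE f(x) = -x^3 (elementwise).
function residual!(F, x, xold, h)
    @. F = x - xold + h * x^3
    return F
end

# Hypothetical stand-in for jac_beuler: dF/dx is diagonal for this toy ODE.
function jacobian!(J, x, h)
    fill!(J, 0.0)                      # reset J, as the loop above does
    for i in eachindex(x)
        J[i, i] = 1 + 3h * x[i]^2
    end
    return J
end

# One implicit time step: Newton iteration on the residual.
function newton_step(xold, h; tol = 1e-12, maxiter = 50)
    x = copy(xold)
    F = similar(x)
    J = zeros(length(x), length(x))
    residual!(F, x, xold, h)           # residual at the initial guess
    iteration = 0
    while norm(F) > tol && iteration < maxiter
        iteration += 1
        jacobian!(J, x, h)
        x .-= J \ F                    # linear solve instead of inv(J)*F
        residual!(F, x, xold, h)
    end
    return x, iteration
end

x, iters = newton_step([1.0, 2.0], 0.1)
println("converged in ", iters, " Newton iterations: ", x)
```

The `maxiter` guard is an addition of this sketch so that a non-converging step cannot loop forever.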

Computing the Jacobian, Hessian and tensor of one timestep with

    jac_cfg = ForwardDiff.JacobianConfig(integrate_wrapper, x, ForwardDiff.Chunk{1}())
    jac = x -> ForwardDiff.jacobian(integrate_wrapper, x, jac_cfg)
    
    hes_jac = x -> ForwardDiff.jacobian(integrate_wrapper, x)
    hes_cfg = ForwardDiff.JacobianConfig(hes_jac, x, ForwardDiff.Chunk{1}())
    hes = x -> ForwardDiff.jacobian(hes_jac, x, hes_cfg)

    ten_hes = x -> ForwardDiff.jacobian(hes_jac, x)
    ten_cfg = ForwardDiff.JacobianConfig(ten_hes, x, ForwardDiff.Chunk{1}())
    ten = x -> ForwardDiff.jacobian(ten_hes, x, ten_cfg)

results in the following runtime:

  1st Jacobian: 2.629844 seconds (1.13 M allocations: 49.980 MiB, 1.57% gc time)
  2nd Jacobian 0.005359 seconds (14.67 k allocations: 751.859 KiB)
  1st Hessian 26.822772 seconds (20.38 M allocations: 625.502 MiB, 2.31% gc time)
  2nd Hessian 0.151826 seconds (58.35 k allocations: 11.634 MiB, 22.91% gc time)
  1st tensor 6536.218095 seconds (12.29 G allocations: 319.796 GiB, 2.90% gc time)

I have already profiled the code and type-annotated it as much as I could. This is run with `julia -O0`. With these JIT compilation times, I would be happy to sacrifice a bit of runtime performance for faster JIT compilation.

Is there any way to reduce the time for the tensor?

michel2323, Oct 17 '17 14:10

I have the same issue and I'm wondering if there is any solution for this.

misun6312, Feb 06 '18 18:02

ref https://github.com/JuliaDiff/ForwardDiff.jl/issues/266

Playing around with --compile and/or @nospecialize might help. Interpretation in Julia is pretty slow right now, but it still might be faster than the insane compilation times you're hitting here.

It might be worth trying to remove all of ForwardDiff's @inline annotations and see what that does to performance. Those annotations were necessary a long time ago to guarantee performance, but the compiler's inlining heuristic has since gotten more advanced and might do better nowadays.
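As a rough illustration of the `@nospecialize` suggestion, the sketch below marks a function argument so that the compiler is not asked to specialize on its concrete type. The helper name `time_first_and_second_call` is hypothetical, and whether this actually reduces ForwardDiff's compile time for nested Jacobians has not been tested here. The `--compile` suggestion is assumed to refer to Julia's command-line option, for example `julia --compile=min script.jl`.

```julia
# Hypothetical helper: @nospecialize asks the compiler not to generate a
# separate specialization of this function for each concrete type of `f`.
function time_first_and_second_call(@nospecialize(f), x)
    t_first = @elapsed f(x)     # typically includes JIT compilation of f
    t_second = @elapsed f(x)    # reuses the already compiled code
    return t_first, t_second
end

t_first, t_second = time_first_and_second_call(x -> sum(abs2, x), ones(10))
println("first call: ", t_first, " s, second call: ", t_second, " s")
```

The trade-off is that calls to `f` inside the helper go through dynamic dispatch, so the helper itself compiles once but may run slower.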

jrevels, Feb 06 '18 21:02

Is this still an issue? There's no MWE given to test it. #266 does much better on v1.0 though.

ChrisRackauckas, Nov 14 '18 14:11

    using ForwardDiff

    function speelpenning(x)
      res = [1.0]
      for i in x
        res = res*i
      end
      return res
    end

    dim = parse(Int,ARGS[1])
    println("Speelpenning with dim = ", dim)
    @time fjac = x0 -> ForwardDiff.jacobian(speelpenning, x0)
    @time fhes = x0 -> ForwardDiff.jacobian(fhes_jac, x0)
    @time ften = x0 -> ForwardDiff.jacobian(ften_hes, x0)
    x1 = ones(dim)
    @time fjac(x1)
    @time fhes(x1)
    @time ften(x1)

No, it's not resolved. With dim=10 this code takes forever. That's weird for Speelpenning, which is a classic example in AD.
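For reference, the Speelpenning function is the product of all inputs, so its gradient has the closed form d/dx_i prod(x) = prod of all x_j with j != i. The sketch below computes that gradient with prefix and suffix products and can be used to check AD output. The name `speelpenning_gradient` is an assumption of this sketch and is not part of ForwardDiff.

```julia
# Scalar form of the Speelpenning function: the product of all entries.
speelpenning_scalar(x) = prod(x)

# Closed-form gradient: entry i is the product of all x[j] with j != i,
# built from a prefix product (left of i) and a suffix product (right of i).
function speelpenning_gradient(x)
    n = length(x)
    grad = ones(eltype(x), n)
    prefix = one(eltype(x))
    for i in 1:n
        grad[i] *= prefix
        prefix *= x[i]
    end
    suffix = one(eltype(x))
    for i in n:-1:1
        grad[i] *= suffix
        suffix *= x[i]
    end
    return grad
end

g = speelpenning_gradient([1.0, 2.0, 3.0])
println(g)   # [6.0, 3.0, 2.0]
```

Using prefix and suffix products avoids dividing by `x[i]`, so the result stays valid when an entry is zero.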

michel2323, Nov 14 '18 15:11

Here's a corrected version of the code as an MWE:

    using ForwardDiff

    function speelpenning(x)
      res = [one(eltype(x))]
      for i in x
        res .= res.*i
      end
      return res
    end

    dim = 10
    println("Speelpenning with dim = ", dim)
    fjac = x0 -> ForwardDiff.jacobian(speelpenning, x0)
    fhes_jac = x0 -> ForwardDiff.jacobian(speelpenning, x0)
    fhes = x0 -> ForwardDiff.jacobian(fhes_jac, x0)
    ften_hes = x0 -> ForwardDiff.jacobian(fhes_jac, x0)
    ften = x0 -> ForwardDiff.jacobian(ften_hes, x0)
    x1 = ones(dim)
    println("Jac with compile")
    @time fjac(x1)
    println("Jac without compile")
    @time fjac(x1)
    println("Hes with compile")
    @time fhes(x1)
    println("Hes without compile")
    @time fhes(x1)
    println("Ten with compile")
    @time ften(x1)
    println("Ten without compile")
    @time ften(x1)

Which outputs:

    Speelpenning with dim = 10
    Jac with compile
      0.676379 seconds (1.86 M allocations: 96.334 MiB, 3.62% gc time)
    Jac without compile
      0.000071 seconds (8 allocations: 2.344 KiB)
    Hes with compile
      1.022722 seconds (1.72 M allocations: 81.470 MiB, 1.85% gc time)
    Hes without compile
      0.000103 seconds (11 allocations: 23.250 KiB)
    Ten with compile
     22.808301 seconds (9.07 M allocations: 347.316 MiB, 0.78% gc time)
    Ten without compile
      0.000317 seconds (15 allocations: 255.906 KiB)

With -O0:

    Speelpenning with dim = 10
    Jac with compile
      0.553653 seconds (1.86 M allocations: 96.355 MiB, 5.27% gc time)
    Jac without compile
      0.000026 seconds (38 allocations: 4.063 KiB)
    Hes with compile
      0.444710 seconds (1.72 M allocations: 81.481 MiB, 5.49% gc time)
    Hes without compile
      0.000078 seconds (41 allocations: 33.719 KiB)
    Ten with compile
      1.802644 seconds (9.07 M allocations: 347.459 MiB, 8.56% gc time)
    Ten without compile
      0.000361 seconds (45 allocations: 361.531 KiB)
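To make the trade-off in the tensor timings explicit, the small sketch below computes the ratios from the numbers reported in the two runs above. The variable names are chosen for illustration only. The "with compile" figures include one execution, but that is negligible next to the compile time.

```julia
# Tensor timings copied from the two runs above (seconds).
default_opt = (compile = 22.808301, run = 0.000317)   # default optimization level
opt_O0      = (compile = 1.802644,  run = 0.000361)   # julia -O0

compile_speedup  = default_opt.compile / opt_O0.compile
runtime_slowdown = opt_O0.run / default_opt.run

println("first-call speedup with -O0: ", round(compile_speedup, digits = 1), "x")
println("steady-state slowdown with -O0: ", round(runtime_slowdown, digits = 2), "x")
```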

ChrisRackauckas, Nov 14 '18 16:11