ForwardDiff.jl
Compile time for tensor very slow
We are prototyping a time integration scheme using a handwritten residual function (50 lines) and Jacobian (100 lines):
while norm_res > eps
    iteration = iteration + 1
    for iter in eachindex(J)
        J[iter] = 0.0
    end
    jac_beuler(x, xold, h, e_fd, p_m, J, Ymat)
    x = x - inv(J)*F
    residual_beuler(x, xold, h, e_fd, p_m, F, Ymat)
    norm_res = norm(F)
end
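For readers without the original code, here is a minimal self-contained sketch of the same kind of Newton loop. The names `residual!`, `jacobian!` and `beuler_step`, and the test problem x' = -x^3, are assumptions made for illustration and are not the poster's `residual_beuler` / `jac_beuler`. The sketch also solves the linear system with `J \ F` instead of forming `inv(J)`, which is a substitution on my part rather than what the original loop does.

```julia
using LinearAlgebra

# Hypothetical stand-in for the handwritten residual: backward Euler for
# x' = -x.^3 gives F(x) = x - xold + h*x.^3.
function residual!(F, x, xold, h)
    @. F = x - xold + h * x^3
    return F
end

# Hypothetical stand-in for the handwritten Jacobian dF/dx (diagonal here).
function jacobian!(J, x, h)
    fill!(J, 0)
    for i in eachindex(x)
        J[i, i] = 1 + 3 * h * x[i]^2
    end
    return J
end

# One backward Euler step solved by Newton's method.
function beuler_step(xold, h; tol = 1e-12, maxiter = 50)
    x = copy(xold)
    F = similar(x)
    J = zeros(eltype(x), length(x), length(x))
    residual!(F, x, xold, h)
    iteration = 0
    while norm(F) > tol && iteration < maxiter
        iteration += 1
        jacobian!(J, x, h)
        x .-= J \ F              # solve J*dx = F instead of computing inv(J)
        residual!(F, x, xold, h)
    end
    return x, iteration
end

x, iters = beuler_step([1.0, 0.5], 0.1)
```

The buffers take their element type from `x`, so the step is not hard-wired to `Float64`; whether that is enough for nested dual numbers in the real code is not something this sketch establishes.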
Computing the Jacobian, Hessian and tensor of one timestep with
jac_cfg = ForwardDiff.JacobianConfig(integrate_wrapper, x, ForwardDiff.Chunk{1}())
jac = x -> ForwardDiff.jacobian(integrate_wrapper, x, jac_cfg)
hes_jac = x -> ForwardDiff.jacobian(integrate_wrapper, x)
hes_cfg = ForwardDiff.JacobianConfig(hes_jac, x, ForwardDiff.Chunk{1}())
hes = x -> ForwardDiff.jacobian(hes_jac, x, hes_cfg)
ten_hes = x -> ForwardDiff.jacobian(hes_jac, x)
ten_cfg = ForwardDiff.JacobianConfig(ten_hes, x, ForwardDiff.Chunk{1}())
ten = x -> ForwardDiff.jacobian(ten_hes, x, ten_cfg)
results in the following runtime:
1st Jacobian: 2.629844 seconds (1.13 M allocations: 49.980 MiB, 1.57% gc time)
2nd Jacobian: 0.005359 seconds (14.67 k allocations: 751.859 KiB)
1st Hessian: 26.822772 seconds (20.38 M allocations: 625.502 MiB, 2.31% gc time)
2nd Hessian: 0.151826 seconds (58.35 k allocations: 11.634 MiB, 22.91% gc time)
1st tensor: 6536.218095 seconds (12.29 G allocations: 319.796 GiB, 2.90% gc time)
I have already profiled the code and added type annotations as far as I could. This was run with julia -O0. Given these JIT compilation times, I would happily sacrifice some runtime performance for faster compilation.
Is there any way to reduce the time for the tensor?
I have the same issue and I'm wondering if there is any solution for this.
ref https://github.com/JuliaDiff/ForwardDiff.jl/issues/266
Playing around with --compile and/or @nospecialize might help. Interpretation in Julia is pretty slow right now, but it still might be faster than the insane compilation times you're hitting here.
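As a rough illustration of those two knobs: `--compile` and `-O` are real `julia` command-line options, and `@nospecialize` is a Base macro. The function `apply_twice` below is a hypothetical example, and whether either knob actually shortens the nested ForwardDiff compile in this issue is untested here.

```julia
# Command-line side: ask Julia to compile as little as possible and skip
# optimization passes, e.g.
#   julia --compile=min -O0 script.jl

# Code side: @nospecialize hints that the compiler should not generate a
# separate specialization of this method for every concrete type of `f`.
# `apply_twice` is a hypothetical example function, not part of ForwardDiff.
function apply_twice(@nospecialize(f), x)
    return f(f(x))
end

y = apply_twice(v -> 2v, 3)
```

The trade-off is the one described above: calls through a non-specialized argument use dynamic dispatch, so runtime gets slower in exchange for less code being compiled.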
It might be worth trying to remove all of ForwardDiff's @inline annotations to see what that does to performance. Those annotations were necessary a long time ago to guarantee performance, but the compiler's inlining heuristics have since become more advanced and might do better nowadays.
Is this still an issue? There's no MWE given to test it. #266 does much better on v1.0 though.
using ForwardDiff
function speelpenning(x)
    res = [1.0]
    for i in x
        res = res*i
    end
    return res
end
dim = parse(Int, ARGS[1])
println("Speelpenning with dim = ", dim)
@time fjac = x0 -> ForwardDiff.jacobian(speelpenning, x0)
# Inner derivative closures used by the Hessian and tensor below
fhes_jac = x0 -> ForwardDiff.jacobian(speelpenning, x0)
@time fhes = x0 -> ForwardDiff.jacobian(fhes_jac, x0)
ften_hes = x0 -> ForwardDiff.jacobian(fhes_jac, x0)
@time ften = x0 -> ForwardDiff.jacobian(ften_hes, x0)
x1 = ones(dim)
@time fjac(x1)
@time fhes(x1)
@time ften(x1)
No, it's not resolved. With dim=10 this code takes forever. That's odd for Speelpenning, which is a classic example in AD.
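For reference, the Speelpenning function is just the product of its inputs, so its derivatives have simple closed forms that the AD results can be checked against. This is the standard textbook definition, written out here as a convenience:

```latex
\[
f(x) = \prod_{i=1}^{n} x_i, \qquad
\frac{\partial f}{\partial x_j} = \prod_{i \ne j} x_i, \qquad
\frac{\partial^2 f}{\partial x_j \, \partial x_k} =
\begin{cases}
\prod_{i \notin \{j,k\}} x_i & j \ne k, \\
0 & j = k.
\end{cases}
\]
```

At the all-ones input used in this thread, every first derivative equals 1, and the Hessian has ones off the diagonal and zeros on it.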
Here's a corrected version of the code as an MWE:
using ForwardDiff
function speelpenning(x)
    res = [one(eltype(x))]
    for i in x
        res .= res.*i
    end
    return res
end
dim = 10
println("Speelpenning with dim = ", dim)
fjac = x0 -> ForwardDiff.jacobian(speelpenning, x0)
fhes_jac = x0 -> ForwardDiff.jacobian(speelpenning, x0)
fhes = x0 -> ForwardDiff.jacobian(fhes_jac, x0)
ften_hes = x0 -> ForwardDiff.jacobian(fhes_jac, x0)
ften = x0 -> ForwardDiff.jacobian(ften_hes, x0)
x1 = ones(dim)
println("Jac with compile")
@time fjac(x1)
println("Jac without compile")
@time fjac(x1)
println("Hes with compile")
@time fhes(x1)
println("Hes without compile")
@time fhes(x1)
println("Ten with compile")
@time ften(x1)
println("Ten without compile")
@time ften(x1)
Which outputs:
Speelpenning with dim = 10
Jac with compile
0.676379 seconds (1.86 M allocations: 96.334 MiB, 3.62% gc time)
Jac without compile
0.000071 seconds (8 allocations: 2.344 KiB)
Hes with compile
1.022722 seconds (1.72 M allocations: 81.470 MiB, 1.85% gc time)
Hes without compile
0.000103 seconds (11 allocations: 23.250 KiB)
Ten with compile
22.808301 seconds (9.07 M allocations: 347.316 MiB, 0.78% gc time)
Ten without compile
0.000317 seconds (15 allocations: 255.906 KiB)
With -O0:
Speelpenning with dim = 10
Jac with compile
0.553653 seconds (1.86 M allocations: 96.355 MiB, 5.27% gc time)
Jac without compile
0.000026 seconds (38 allocations: 4.063 KiB)
Hes with compile
0.444710 seconds (1.72 M allocations: 81.481 MiB, 5.49% gc time)
Hes without compile
0.000078 seconds (41 allocations: 33.719 KiB)
Ten with compile
1.802644 seconds (9.07 M allocations: 347.459 MiB, 8.56% gc time)
Ten without compile
0.000361 seconds (45 allocations: 361.531 KiB)