BoundaryValueDiffEq.jl

disable precompilation for now

Open KristofferC opened this issue 2 months ago • 0 comments

This package takes a very long time to precompile (~100 s), and most of that is due to the precompile workload. Removing the precompile workload brings it down to ~15 s (and reduces the size of the precompile file from 135 MB to 5 MB).

Unfortunately, the solve methods compiled here specialize on the input functions, which means the compiled code is only valid when those exact functions are used. The precompile workload is therefore nearly useless for reducing latency for a user. As an example, running a workload identical to the one used in the precompile workload:

using BoundaryValueDiffEq

import FastClosures: @closure

begin
    f1! = @closure (du, u, p, t) -> begin
        du[1] = u[2]
        du[2] = 0
    end

    bc1! = @closure (residual, u, p, t) -> begin
        residual[1] = u[1][1] - 5
        residual[2] = u[lastindex(u)][1]
    end
    tspan = (0.0, 5.0)
    u0 = [5.0, -3.5]
    prob = BVProblem(f1!, bc1!, u0, tspan; nlls = Val(false))
    jac_alg = BVPJacobianAlgorithm(AutoForwardDiff(; chunksize = 2))
    alg = MIRK2(; jac_alg)
    @time @eval solve(prob, alg; dt = 0.2)
end

We can see that it takes

5.421249 seconds (5.52 M allocations: 375.658 MiB, 1.99% gc time, 99.98% compilation time)

with precompilation active and

6.549551 seconds (9.12 M allocations: 618.889 MiB, 6.21% gc time, 99.98% compilation time)

with precompilation not active. Some common code can be reused, but most of the compilation has to be done from scratch. For example, the solve function, which is expensive to compile, is recompiled:

precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:dt,), Tuple{Float64}}, typeof(CommonSolve.solve), SciMLBase.BVProblem{Array{Float64, 1}, Tuple{Float64, Float64}, true, false, SciMLBase.NullParameters, SciMLBase.BVPFunction{true, SciMLBase.FullSpecialize, false, Main.var"#1#3", Main.var"#2#4",

The

Main.var"#1#3", Main.var"#2#4"

part in the type shows that this is specialized and compiled for the input closures.
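The effect can be reproduced without the package. The following is a minimal sketch in plain Julia; `run_solver`, `rhs_a`, and `rhs_b` are hypothetical names used only for illustration and are not part of BoundaryValueDiffEq. It shows that two closures with identical bodies still have distinct types, so a method compiled for one signature does not cover the other:

```julia
# Hypothetical stand-in for a solver entry point that takes a user function.
# Julia compiles specializations of it per concrete argument type.
run_solver(f, x) = f(x)

# Two closures with identical bodies still get distinct, compiler-generated types.
rhs_a = x -> 2x
rhs_b = x -> 2x

println(typeof(rhs_a))                    # a generated name containing '#'
println(typeof(rhs_a) == typeof(rhs_b))   # false: the types differ

# The call signatures therefore differ as well, so code compiled for one
# closure cannot be reused for the other.
sig_a = Tuple{typeof(run_solver), typeof(rhs_a), Float64}
sig_b = Tuple{typeof(run_solver), typeof(rhs_b), Float64}
println(sig_a == sig_b)                   # false
```

The closures created in the precompile workload and the closures a user writes are in the same situation as `rhs_a` and `rhs_b` here.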

Until the package has been restructured to allow more of the compiled code to be reused, I don't think it is worth making all users spend this much time generating compiled code that is almost worthless to them.

Alternatively, it could be possible to only precompile one combination of algorithm and problem which would probably compile the common parts that are reused by user code just as well as the current 36 alg + prob combinations that are precompiled.
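A reduced workload along those lines might look like the sketch below. It assumes the workload is written with PrecompileTools.jl (`@setup_workload` / `@compile_workload`) and lives inside the package module; the choice of `MIRK2` with the toy problem from the example above is an arbitrary illustration, not a recommendation of which single combination covers the most shared code.

```julia
# Sketch only: intended to live inside the package module, not in user code.
# Assumes the package's workload is built with PrecompileTools.jl.
using PrecompileTools: @setup_workload, @compile_workload

@setup_workload begin
    # Same toy problem as in the timing example above.
    f1! = (du, u, p, t) -> begin
        du[1] = u[2]
        du[2] = 0
    end
    bc1! = (residual, u, p, t) -> begin
        residual[1] = u[1][1] - 5
        residual[2] = u[lastindex(u)][1]
    end
    tspan = (0.0, 5.0)
    u0 = [5.0, -3.5]
    prob = BVProblem(f1!, bc1!, u0, tspan; nlls = Val(false))
    jac_alg = BVPJacobianAlgorithm(AutoForwardDiff(; chunksize = 2))

    @compile_workload begin
        # A single algorithm/problem pair instead of the full set.
        solve(prob, MIRK2(; jac_alg); dt = 0.2)
    end
end
```

Whether this keeps the ~1 s of shared latency savings seen in the timings above would need to be measured.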

KristofferC, Apr 25 '24 09:04