BoundaryValueDiffEq.jl
disable precompilation for now
This package takes a massive amount of time to precompile (~100s), and most of that is due to the precompile workload. Removing the precompile workload brings it down to 15s (and reduces the size of the precompile file from 135 MB to 5 MB).
Unfortunately, the solve methods compiled here specialize on their input functions, which means the compiled code is only valid if those identical functions are used. As a result, the precompile workload is pretty much useless for reducing latency for a user. As an example, running a workload identical to the one used in the precompile workload:
using BoundaryValueDiffEq
import FastClosures: @closure
begin
    f1! = @closure (du, u, p, t) -> begin
        du[1] = u[2]
        du[2] = 0
    end
    bc1! = @closure (residual, u, p, t) -> begin
        residual[1] = u[1][1] - 5
        residual[2] = u[lastindex(u)][1]
    end
    tspan = (0.0, 5.0)
    u0 = [5.0, -3.5]
    prob = BVProblem(f1!, bc1!, u0, tspan; nlls = Val(false))
    jac_alg = BVPJacobianAlgorithm(AutoForwardDiff(; chunksize = 2))
    alg = MIRK2(; jac_alg)
    @time @eval solve(prob, alg; dt = 0.2)
end
We can see that it takes
5.421249 seconds (5.52 M allocations: 375.658 MiB, 1.99% gc time, 99.98% compilation time)
with precompilation active and
6.549551 seconds (9.12 M allocations: 618.889 MiB, 6.21% gc time, 99.98% compilation time)
with precompilation not active. Some common code can be reused, but most of it has to be compiled from scratch. For example, the expensive solve function is recompiled:
precompile(Tuple{typeof(Core.kwcall), NamedTuple{(:dt,), Tuple{Float64}}, typeof(CommonSolve.solve), SciMLBase.BVProblem{Array{Float64, 1}, Tuple{Float64, Float64}, true, false, SciMLBase.NullParameters, SciMLBase.BVPFunction{true, SciMLBase.FullSpecialize, false, Main.var"#1#3", Main.var"#2#4",
The Main.var"#1#3", Main.var"#2#4" part of the type shows that this method is specialized and compiled for the input closures.
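To illustrate why this matters, here is a minimal sketch using only Base Julia. The helper apply_rhs is a hypothetical stand-in for a solver entry point and is not part of BoundaryValueDiffEq; the point is only that two anonymous functions with identical source still have distinct types, so code specialized for one is not reused for the other.

```julia
# Two anonymous functions with identical source text.
f = (du, u) -> (du[1] = u[2]; nothing)
g = (du, u) -> (du[1] = u[2]; nothing)

# Hypothetical stand-in for a solver entry point (not a BoundaryValueDiffEq API).
# Because rhs! is called inside the body, Julia compiles a separate
# specialization of apply_rhs for each distinct function type passed in.
function apply_rhs(rhs!, u)
    du = zero(u)
    rhs!(du, u)
    return du
end

# Each anonymous function has its own unique type, so these differ.
println(typeof(f) == typeof(g))

# The results agree, but each call runs its own specialization of apply_rhs.
println(apply_rhs(f, [1.0, 2.0]) == apply_rhs(g, [1.0, 2.0]))
```

A closure defined in a user's session is in the same position as g here: its type is never the one the package's precompile workload used, so the closure-specialized methods have to be compiled again.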
Until the package has been restructured to allow more of the compiled code to be reused, I don't think it is worth making every user spend this much time generating compiled code that is almost worthless to them.
Alternatively, it might be possible to precompile only one combination of algorithm and problem, which would probably compile the common parts that are reused by user code just as well as the 36 alg + prob combinations that are currently precompiled.