OrdinaryDiffEq.jl
Default solver selection very slow (~20x) compared to `Rodas5P` on a given problem
The system is a bit unusual (it has a large number of similar variables). I am also not sure how interesting cases like this are, or how feasible they are to deal with. However, since the performance difference is large, I figured it could be worth knowing about.
# Fetch packages.
using Catalyst, OrdinaryDiffEqDefault, OrdinaryDiffEqRosenbrock, FiniteStateProjection, BenchmarkTools
# Create the model.
rs = @reaction_network begin
(p,d), 0 <--> X
(kB,kD), 2X <--> X2
end
# Create the ODEProblem.
ps = [:p => 1.0, :d => 0.2, :kB => 2.0, :kD => 5.0]
u0 = zeros(25,25)
u0[1,1] = 1.0
fsp_sys = FSPSystem(rs, [:X, :X2])
oprob = ODEProblem(fsp_sys, u0, 50.0, ps)
# Simulate using default solver and Rodas5P.
@btime sol1 = solve($oprob, Rodas5P()) # 126.148 ms (3719 allocations: 6.96 MiB)
@btime sol2 = solve($oprob) # 2.689 s (309050 allocations: 138.95 MiB)
Here, the state is a 25x25 matrix (so 625 variables in total).
The number of allocations looks like a regression somewhere, probably some kind of inference issue.
What's happening here is that this is a big enough problem (length 625) that we're using FBDF with a Krylov linear solver. Also, since the mass matrix isn't singular, we're starting the integration with Tsit5, which seems to make the time-stepping worse. In short, we seem to be making basically every wrong choice possible for this problem (other than using FBDF; plain FBDF is doing great here).
julia> @btime solve(oprob, AutoTsit5(FBDF(linsolve=LinearSolve.KrylovJL(), autodiff=AutoFiniteDiff())));
2.365 s (304006 allocations: 145.28 MiB)
julia> @btime solve(oprob, FBDF(linsolve=LinearSolve.KrylovJL(), autodiff=AutoFiniteDiff()));
411.827 ms (49606 allocations: 19.18 MiB)
julia> @btime solve(oprob, FBDF(linsolve=LinearSolve.KrylovJL()));
244.829 ms (35446 allocations: 18.30 MiB)
julia> @btime solve(oprob, FBDF());
56.068 ms (1379 allocations: 7.25 MiB)
Here are the associated stats:
julia> solve(oprob, AutoTsit5(FBDF(linsolve=LinearSolve.KrylovJL(), autodiff=AutoFiniteDiff()))).stats
SciMLBase.DEStats
Number of function 1 evaluations: 78552
Number of function 2 evaluations: 0
Number of W matrix evaluations: 155
Number of linear solves: 336
Number of Jacobians created: 0
Number of nonlinear solver iterations: 336
Number of nonlinear solver convergence failures: 0
Number of fixed-point solver iterations: 0
Number of fixed-point solver convergence failures: 0
Number of rootfind condition calls: 0
Number of accepted steps: 129
Number of rejected steps: 48
julia> solve(oprob, FBDF(linsolve=LinearSolve.KrylovJL(), autodiff=AutoFiniteDiff())).stats
SciMLBase.DEStats
Number of function 1 evaluations: 13510
Number of function 2 evaluations: 0
Number of W matrix evaluations: 27
Number of linear solves: 64
Number of Jacobians created: 0
Number of nonlinear solver iterations: 64
Number of nonlinear solver convergence failures: 0
Number of fixed-point solver iterations: 0
Number of fixed-point solver convergence failures: 0
Number of rootfind condition calls: 0
Number of accepted steps: 59
Number of rejected steps: 0
julia> solve(oprob, FBDF(linsolve=LinearSolve.KrylovJL())).stats
SciMLBase.DEStats
Number of function 1 evaluations: 6633
Number of function 2 evaluations: 0
Number of W matrix evaluations: 28
Number of linear solves: 69
Number of Jacobians created: 0
Number of nonlinear solver iterations: 69
Number of nonlinear solver convergence failures: 0
Number of fixed-point solver iterations: 0
Number of fixed-point solver convergence failures: 0
Number of rootfind condition calls: 0
Number of accepted steps: 59
Number of rejected steps: 1
julia> solve(oprob, FBDF()).stats
SciMLBase.DEStats
Number of function 1 evaluations: 238
Number of function 2 evaluations: 0
Number of W matrix evaluations: 27
Number of linear solves: 120
Number of Jacobians created: 1
Number of nonlinear solver iterations: 120
Number of nonlinear solver convergence failures: 0
Number of fixed-point solver iterations: 0
Number of fixed-point solver convergence failures: 0
Number of rootfind condition calls: 0
Number of accepted steps: 61
Number of rejected steps: 1
We can bump the cutoff for switching to Krylov methods higher. Since the Krylov solver isn't preconditioned, that would help in some cases.
Also @TorkelE, could Catalyst start providing jacobian sparsity? It's a free 2x speedup.
julia> using SparseConnectivityTracer
julia> detector = TracerSparsityDetector()
TracerSparsityDetector()
julia> du0 = copy(u0);
julia> using ADTypes
julia> jac_sparsity = ADTypes.jacobian_sparsity(
           (du, u) -> oprob.f(du, u, oprob.p, 0.0), du0, u0, detector);
julia> ofun = ODEFunction(oprob.f.f; sparsity=float.(jac_sparsity));
julia> spoprob = ODEProblem(ofun, oprob.u0, oprob.tspan, oprob.p);
julia> @btime solve(spoprob, FBDF());
22.222 ms (2947 allocations: 16.89 MiB)
julia> @btime solve(spoprob, FBDF(linsolve=LinearSolve.KLUFactorization()));
7.483 ms (1780 allocations: 2.76 MiB)
It does seem like we're too small for the iterative solvers to help, even with preconditioning. The best I've gotten with them is 36ms (or ~250ms without preconditioners). Shall we bump the Krylov threshold to 2000?
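For reference, a preconditioned Krylov setup along the lines of the incomplete-LU pattern from the stiff ODE tutorials might look like the following. This is a sketch, not the exact configuration benchmarked above; the GMRES choice and the τ drop tolerance are untuned illustrative assumptions, and it reuses the spoprob with the sparsity pattern from the snippet above:

```julia
using OrdinaryDiffEqBDF, LinearSolve, IncompleteLU

# Rebuild the ILU preconditioner whenever a new W matrix is computed,
# otherwise reuse the previous factorization.
function incompletelu(W, du, u, p, t, newW, Plprev, Prprev, solverdata)
    if newW === nothing || newW
        Pl = ilu(convert(AbstractMatrix, W), τ = 50.0)  # τ is a guess, not tuned
    else
        Pl = Plprev
    end
    Pl, nothing  # left preconditioner only
end

# concrete_jac = true materializes W (sparse, via the sparsity pattern
# attached to spoprob) so the preconditioner has a matrix to factorize.
solve(spoprob, FBDF(linsolve = KrylovJL_GMRES(), precs = incompletelu,
    concrete_jac = true))
```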
yeah bump it up.
Also @TorkelE, could Catalyst start providing jacobian sparsity?
It should just always do that IMO
at least for anything sufficiently large.
Oh, part of what's happening here is really subtle and really sucks. FBDF is a pretty bad algorithm to use with auto-switching, because the cost of a failed switch to the nonstiff integrator is starting over from order 1. E.g. on the dense problem, we see that AutoTsit5(FBDF()) is ~2x slower than AutoTsit5(Rodas5P()) even though FBDF is ~4x faster than Rodas5P:
julia> @btime solve(oprob, AutoTsit5(Rodas5P()));
144.011 ms (3354 allocations: 7.24 MiB)
julia> @btime solve(oprob, AutoTsit5(FBDF()));
337.748 ms (2689 allocations: 8.48 MiB)
julia> @btime solve(oprob, Rodas5P());
172.873 ms (3717 allocations: 6.96 MiB)
julia> @btime solve(oprob, FBDF());
83.403 ms (1377 allocations: 7.25 MiB)
Oh interesting, we should make it so that if you switch but never actually step, you can keep the history alive.
Also @jClugstor, in the debug info we really should be logging when we try to switch algs and whether we're successful in doing so.
Unfortunately, it might not be quite this easy. By the time we fail the check a couple of times, we've trashed our dt value. To make this work, we would probably have to significantly rework the auto-switch logic to do a dry run of the nonstiff alg before switching.
I also wonder if we want to use Rodas5P for somewhat bigger sizes when a sparse jacobian is provided. Since nearly all of the cost of Rosenbrock methods is in the factorization, if we have a sparse matrix we can probably stay with Rosenbrock for longer.
Well, sparse matrices will be slow for something like a 100x100 LU factorization, so you don't want to do it if the system is too small. But yes, in that large-enough range, if you have a sparse matrix then direct methods will be relatively better than Krylov methods (which ignore sparsity), and that increases the cutoff point we should want.
What I'm saying is that since BDF methods trade lower order for fewer factorizations, if we have a sparse jacobian it might make sense to avoid going to FBDF until a larger size than we normally would. Also, as a temporary patch for the AutoSwitch/FBDF friction, how would you feel about bumping our threshold for FBDF methods from 50 to 100?
That's fine. I know some cases where that will be slower, so I think overall it's a wash.
Yeah. Part of the issue we're having here is that this problem is just on the edge of stiff: the eigenvalue is only ~1e3, so Tsit5 is actually able to solve this problem (although it takes 2s to do so).
Also @TorkelE, could Catalyst start providing jacobian sparsity?
I am all for a 2x speed-up. Isn't this something that MTK can/should determine automatically, though? I.e. Catalyst just generates a normal ODESystem and sends it to the ODEProblem, so unless there is something special about Catalyst systems, I'm not sure where we would use this option.
Or do you mean that it might make sense for these FSPSystems? I imagine they have a very specific sparse structure to use. That should be fairly straightforward to add to the ODEProblem(::FSPSystem, ...) call.
Oh right, MTK should do this if you do sparseprob = ODEProblem(sys, ...; jac = true, sparse = true).
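For a plain Catalyst-generated system (rather than the FSPSystem from the top of the thread), that would look roughly like the sketch below. The jac and sparse keywords are the MTK options mentioned above; the u0/tspan values here are placeholders, not from the original benchmark:

```julia
using Catalyst, ModelingToolkit, OrdinaryDiffEqBDF

rs = @reaction_network begin
    (p,d), 0 <--> X
    (kB,kD), 2X <--> X2
end
osys = convert(ODESystem, rs)  # Catalyst network -> MTK ODESystem

# jac = true generates an analytical jacobian; sparse = true makes it
# (and the jac_prototype) sparse so sparse factorizations get used.
u0 = [:X => 0.0, :X2 => 0.0]
ps = [:p => 1.0, :d => 0.2, :kB => 2.0, :kD => 5.0]
sparseprob = ODEProblem(osys, u0, (0.0, 50.0), ps; jac = true, sparse = true)
solve(sparseprob, FBDF())
```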
Only if the size is sufficiently large. Since a smaller sparse matrix would be slower.
And it needs to be "sparse enough"