OrdinaryDiffEq.jl icon indicating copy to clipboard operation
OrdinaryDiffEq.jl copied to clipboard

10x - 100x speed up when using own implementation of Implicit Euler or Implicit Midpoint on linear test problem

Open cwittens opened this issue 10 months ago • 13 comments

Describe the bug 🐞

When using my own implementation of Implicit Euler or Implicit Midpoint (definition below), I get a 10x - 100x speed up on the linear test problem (du, u, p, t) -> du .= p .* u when using FunctionOperator in the ODEProblem definition. (and also when solving Parabolic PDEs as discovered by @ranocha , but this is not shown in this MRE).

Minimal Reproducible Example 👇

using Pkg
Pkg.activate(".")
Pkg.add(["OrdinaryDiffEq", "SciMLOperators", "LinearSolve",  "BenchmarkTools", "LinearAlgebra"])
using OrdinaryDiffEq, SciMLOperators, LinearSolve, BenchmarkTools, LinearAlgebra

# function definition
function my_implicit_euler(ode, dt, solver=NaN, abstol=NaN, reltol=NaN)
	u = copy(ode.u0)
	prob = LinearProblem(I - dt * ode.f.f, vec(zero(u)), abstol = abstol, reltol = reltol)
	linsolve = init(prob, solver)
	t = ode.tspan[begin]
	while t < ode.tspan[end]
		t += dt
		copyto!(linsolve.b, vec(u))
		solve!(linsolve)
		copyto!(vec(u), linsolve.u)
	end

	return (; u = [copy(ode.u0), u], prob = ode)
end

# setup the ODE, once using FunctionOperator and once without
linear! = (du, u, p, t) -> du .= p .* u

function setup_ode_Operator(N)
    tspan = (0.0, 5.0)
    u0 = ones(N)
    parameters = 1.01
    linear_operator = FunctionOperator(linear!, u0; p = parameters) 

    ode = ODEProblem(linear_operator, u0, tspan, parameters)
end

function setup_ode(N)
    tspan = (0.0, 5.0)
    u0 = ones(N)
    parameters = 1.01

    ode = ODEProblem(linear!, u0, tspan, parameters)
end

dt = 0.001
prob_op = setup_ode_Operator(100);
prob = setup_ode(100);

# my_implicit_euler roughly 100x faster then ImplicitEuler()
@btime sol1 = my_implicit_euler(prob_op, dt); 
@btime sol2 = solve(prob_op, ImplicitEuler(), dt=dt, adaptive = false, save_everystep = false);
@btime sol3 = solve(prob, ImplicitEuler(), dt=dt, adaptive = false, save_everystep = false);
# the operator version and the normal version of the ODE give almost the same results using OrdinaryDiffEq
# note that my_implicit_euler(prob, dt) throws an error
# "MethodError: no method matching *(::Float64, ::var"#13#14")"

# sol1.u[2] ≈ sol2.u[2] gives true

# using not the default linear solvers my_implicit_euler still 10x faster
solver = KrylovJL_GMRES();
@btime sol4 = my_implicit_euler(prob_op, dt, solver);
@btime sol5 = solve(prob_op, ImplicitEuler(linsolve = solver), dt=dt, adaptive = false, save_everystep = false);
@btime sol6 = solve(prob, ImplicitEuler(linsolve = solver), dt=dt, adaptive = false, save_everystep = false);

Output:

  3.452 ms (5237 allocations: 364.78 KiB)

  242.541 ms (201 allocations: 289.88 KiB)

  179.656 ms (364 allocations: 278.62 KiB)


  3.467 ms (5149 allocations: 423.23 KiB)

  29.191 ms (385231 allocations: 123.56 MiB)

  29.418 ms (390224 allocations: 123.78 MiB)

And the results are pretty much the same for my_implicit_midpoint and ImplicitMidpoint.


function my_implicit_midpoint(ode, dt, solver=NaN, abstol=NaN, reltol=NaN)
	u = copy(ode.u0)
	utmp = zero(u)
	dt_2 = dt / 2
	prob = LinearProblem(I - dt_2 * ode.f.f, vec(zero(u)), abstol = abstol, reltol = reltol)
	linsolve = init(prob, solver)
	t = ode.tspan[begin]
	while t < ode.tspan[end]
		t += dt
		ode.f.f(utmp, u, ode.p, t)
		@. utmp = u + dt_2 * utmp
		copyto!(linsolve.b, vec(utmp))
		solve!(linsolve)
		copyto!(vec(u), linsolve.u)
	end

	return (; u = [copy(ode.u0), u], prob = ode)
end
# 100x speedup
@btime sol7 = my_implicit_midpoint(prob_op, dt);
@btime sol8 = solve(prob_op, ImplicitMidpoint(), dt=dt, adaptive = false, save_everystep = false);
@btime sol9 = solve(prob, ImplicitMidpoint(), dt=dt, adaptive = false, save_everystep = false);

# sol7.u[2] ≈ sol8.u[2] gives true

# using not default solvers still 10x speedup
solver = KrylovJL_GMRES();
@btime sol10 = my_implicit_midpoint(prob_op, dt, solver);
@btime sol11 = solve(prob_op, ImplicitMidpoint(linsolve = solver), dt=dt, adaptive = false, save_everystep = false);
@btime sol12 = solve(prob, ImplicitMidpoint(linsolve = solver), dt=dt, adaptive = false, save_everystep = false);

Output:

  3.877 ms (5239 allocations: 365.69 KiB)

  207.945 ms (188 allocations: 287.72 KiB)

  193.269 ms (351 allocations: 276.55 KiB)


  3.805 ms (5151 allocations: 424.14 KiB)

  33.241 ms (425186 allocations: 125.38 MiB)

  35.551 ms (430179 allocations: 125.61 MiB)

Obviously the speed up depends on the system size.

Environment (please complete the following information):

  • Output of using Pkg; Pkg.status()
julia> Pkg.status()
Status `..\ImplicitEuler_test\Project.toml`
  [6e4b80f9] BenchmarkTools v1.6.0
  [7ed4a6bd] LinearSolve v2.38.0
  [1dea7af3] OrdinaryDiffEq v6.90.1
  [c0aeaf25] SciMLOperators v0.3.12
  [37e2e46d] LinearAlgebra v1.11.0
  • Output of using Pkg; Pkg.status(; mode = PKGMODE_MANIFEST)
julia> Pkg.status(; mode = PKGMODE_MANIFEST)
Status ..\ImplicitEuler_test\Manifest.toml`
  [47edcb42] ADTypes v1.12.1
  [7d9f7c33] Accessors v0.1.41
  [79e6a3ab] Adapt v4.1.1
  [ec485272] ArnoldiMethod v0.4.0
  [4fba245c] ArrayInterface v7.18.0
  [4c555306] ArrayLayouts v1.11.0
  [6e4b80f9] BenchmarkTools v1.6.0
  [62783981] BitTwiddlingConvenienceFunctions v0.1.6
  [70df07ce] BracketingNonlinearSolve v1.1.0
  [2a0fbf3d] CPUSummary v0.2.6
  [d360d2e6] ChainRulesCore v1.25.1
  [fb6a15b2] CloseOpenIntervals v0.1.13
  [38540f10] CommonSolve v0.2.4
  [bbf7d656] CommonSubexpressions v0.3.1
  [f70d9fcc] CommonWorldInvalidations v1.0.0
  [34da2185] Compat v4.16.0
  [a33af91c] CompositionsBase v0.1.2
  [2569d6c7] ConcreteStructs v0.2.3
  [187b0558] ConstructionBase v1.5.8
  [adafc99b] CpuId v0.3.1
  [9a962f9c] DataAPI v1.16.0
  [864edb3b] DataStructures v0.18.20
  [e2d170a0] DataValueInterfaces v1.0.0
  [2b5f629d] DiffEqBase v6.161.0
  [163ba53b] DiffResults v1.1.0
  [b552c78f] DiffRules v1.15.1
  [a0c0ee7d] DifferentiationInterface v0.6.39
  [ffbed154] DocStringExtensions v0.9.3
  [4e289a0a] EnumX v1.0.4
  [f151be2c] EnzymeCore v0.8.8
  [d4d017d3] ExponentialUtilities v1.27.0
  [e2ba6199] ExprTools v0.1.10
⌅ [6b7a57c9] Expronicon v0.8.5
  [7034ab61] FastBroadcast v0.3.5
  [9aa1b823] FastClosures v0.3.2
  [442a2c76] FastGaussQuadrature v1.0.2
  [29a986be] FastLapackInterface v2.0.4
  [a4df4552] FastPower v1.1.1
  [1a297f60] FillArrays v1.13.0
  [6a86dc24] FiniteDiff v2.27.0
  [f6369f11] ForwardDiff v0.10.38
  [069b7b12] FunctionWrappers v1.1.3
  [77dc65aa] FunctionWrappersWrappers v0.1.3
  [46192b85] GPUArraysCore v0.2.0
  [c145ed77] GenericSchur v0.5.4
  [86223c79] Graphs v1.12.0
  [3e5b6fbb] HostCPUFeatures v0.1.17
  [615f187c] IfElse v0.1.1
  [d25df0c9] Inflate v0.1.5
  [3587e190] InverseFunctions v0.1.17
  [92d709cd] IrrationalConstants v0.2.4
  [82899510] IteratorInterfaceExtensions v1.0.0
  [692b3bcd] JLLWrappers v1.7.0
  [682c06a0] JSON v0.21.4
  [ef3ab10e] KLU v0.6.0
  [ba0b0d4f] Krylov v0.9.9
  [10f19ff3] LayoutPointers v0.1.17
  [5078a376] LazyArrays v2.4.0
  [87fe0de2] LineSearch v0.1.4
  [d3d80556] LineSearches v7.3.0
  [7ed4a6bd] LinearSolve v2.38.0
  [2ab3a3ac] LogExpFunctions v0.3.29
  [bdcacae8] LoopVectorization v0.12.171
  [d8e11817] MLStyle v0.4.17
  [1914dd2f] MacroTools v0.5.15
  [d125e4d3] ManualMemory v0.1.8
  [bb5d69b7] MaybeInplace v0.1.4
  [46d2c3a1] MuladdMacro v0.2.4
  [d41bc354] NLSolversBase v7.8.3
  [77ba4419] NaNMath v1.1.2
  [8913a72c] NonlinearSolve v4.3.0
  [be0214bd] NonlinearSolveBase v1.4.0
  [5959db7a] NonlinearSolveFirstOrder v1.2.0
  [9a2c21bd] NonlinearSolveQuasiNewton v1.1.0
  [26075421] NonlinearSolveSpectralMethods v1.1.0
  [6fe1bfb0] OffsetArrays v1.15.0
  [bac558e1] OrderedCollections v1.8.0
  [1dea7af3] OrdinaryDiffEq v6.90.1
  [89bda076] OrdinaryDiffEqAdamsBashforthMoulton v1.1.0
  [6ad6398a] OrdinaryDiffEqBDF v1.2.0
  [bbf590c4] OrdinaryDiffEqCore v1.15.1
  [50262376] OrdinaryDiffEqDefault v1.2.0
  [4302a76b] OrdinaryDiffEqDifferentiation v1.3.0
  [9286f039] OrdinaryDiffEqExplicitRK v1.1.0
  [e0540318] OrdinaryDiffEqExponentialRK v1.2.0
  [becaefa8] OrdinaryDiffEqExtrapolation v1.3.0
  [5960d6e9] OrdinaryDiffEqFIRK v1.6.0
  [101fe9f7] OrdinaryDiffEqFeagin v1.1.0
  [d3585ca7] OrdinaryDiffEqFunctionMap v1.1.1
  [d28bc4f8] OrdinaryDiffEqHighOrderRK v1.1.0
  [9f002381] OrdinaryDiffEqIMEXMultistep v1.2.0
  [521117fe] OrdinaryDiffEqLinear v1.1.0
  [1344f307] OrdinaryDiffEqLowOrderRK v1.2.0
  [b0944070] OrdinaryDiffEqLowStorageRK v1.2.1
  [127b3ac7] OrdinaryDiffEqNonlinearSolve v1.3.0
  [c9986a66] OrdinaryDiffEqNordsieck v1.1.0
  [5dd0a6cf] OrdinaryDiffEqPDIRK v1.2.0
  [5b33eab2] OrdinaryDiffEqPRK v1.1.0
  [04162be5] OrdinaryDiffEqQPRK v1.1.0
  [af6ede74] OrdinaryDiffEqRKN v1.1.0
  [43230ef6] OrdinaryDiffEqRosenbrock v1.4.0
  [2d112036] OrdinaryDiffEqSDIRK v1.2.0
  [669c94d9] OrdinaryDiffEqSSPRK v1.2.0
  [e3e12d00] OrdinaryDiffEqStabilizedIRK v1.2.0
  [358294b1] OrdinaryDiffEqStabilizedRK v1.1.0
  [fa646aed] OrdinaryDiffEqSymplecticRK v1.1.0
  [b1df2697] OrdinaryDiffEqTsit5 v1.1.0
  [79d7bb75] OrdinaryDiffEqVerner v1.1.1
  [65ce6f38] PackageExtensionCompat v1.0.2
  [d96e819e] Parameters v0.12.3
  [69de0a69] Parsers v2.8.1
  [f517fe37] Polyester v0.7.16
  [1d0040c9] PolyesterWeave v0.2.2
  [d236fae5] PreallocationTools v0.4.24
  [aea7be01] PrecompileTools v1.2.1
  [21216c6a] Preferences v1.4.3
  [3cdcf5f2] RecipesBase v1.3.4
  [731186ca] RecursiveArrayTools v3.28.0
  [f2c3362d] RecursiveFactorization v0.2.23
  [189a3867] Reexport v1.2.2
  [ae029012] Requires v1.3.0
  [7e49a35a] RuntimeGeneratedFunctions v0.5.13
  [94e857df] SIMDTypes v0.1.0
  [476501e8] SLEEFPirates v0.6.43
  [0bca4576] SciMLBase v2.72.2
  [19f34311] SciMLJacobianOperators v0.1.1
  [c0aeaf25] SciMLOperators v0.3.12
  [53ae85a6] SciMLStructures v1.6.1
  [efcf1570] Setfield v1.1.1
  [727e6d20] SimpleNonlinearSolve v2.1.0
  [699a6c99] SimpleTraits v0.9.4
  [ce78b400] SimpleUnPack v1.1.0
  [47a9eef4] SparseDiffTools v2.23.1
  [0a514795] SparseMatrixColorings v0.4.12
  [e56a9233] Sparspak v0.3.9
  [276daf66] SpecialFunctions v2.5.0
  [aedffcd0] Static v1.1.1
  [0d7ed370] StaticArrayInterface v1.8.0
  [90137ffa] StaticArrays v1.9.11
  [1e83bf80] StaticArraysCore v1.4.3
  [10745b16] Statistics v1.11.1
  [7792a7ef] StrideArraysCore v0.5.7
  [2efcf032] SymbolicIndexingInterface v0.3.37
  [3783bdb8] TableTraits v1.0.1
  [bd369af6] Tables v1.12.0
  [8290d209] ThreadingUtilities v0.5.2
  [a759f4b9] TimerOutputs v0.5.26
  [d5829a12] TriangularSolve v0.2.1
  [781d530d] TruncatedStacktraces v1.4.0
  [3a884ed6] UnPack v1.0.2
  [3d5dd08c] VectorizationBase v0.21.71
  [19fa3120] VertexSafeGraphs v0.2.0
  [1d5cc7b8] IntelOpenMP_jll v2025.0.4+0
  [856f044c] MKL_jll v2025.0.1+1
  [efe28fd5] OpenSpecFun_jll v0.5.6+0
  [1317d2d5] oneTBB_jll v2021.12.0+0
  [0dad84c5] ArgTools v1.1.2
  [56f22d72] Artifacts v1.11.0
  [2a0f44e3] Base64 v1.11.0
  [ade2ca70] Dates v1.11.0
  [8ba89e20] Distributed v1.11.0
  [f43a241f] Downloads v1.6.0
  [7b1f6079] FileWatching v1.11.0
  [9fa8497b] Future v1.11.0
  [b77e0a4c] InteractiveUtils v1.11.0
  [4af54fe1] LazyArtifacts v1.11.0
  [b27032c2] LibCURL v0.6.4
  [76f85450] LibGit2 v1.11.0
  [8f399da3] Libdl v1.11.0
  [37e2e46d] LinearAlgebra v1.11.0
  [56ddb016] Logging v1.11.0
  [d6f4376e] Markdown v1.11.0
  [a63ad114] Mmap v1.11.0
  [ca575930] NetworkOptions v1.2.0
  [44cfe95a] Pkg v1.11.0
  [de0858da] Printf v1.11.0
  [9abbd945] Profile v1.11.0
  [9a3f8284] Random v1.11.0
  [ea8e919c] SHA v0.7.0
  [9e88b42a] Serialization v1.11.0
  [1a1011a3] SharedArrays v1.11.0
  [6462fe0b] Sockets v1.11.0
  [2f01184e] SparseArrays v1.11.0
  [fa267f1f] TOML v1.0.3
  [a4e569a6] Tar v1.10.0
  [8dfed614] Test v1.11.0
  [cf7118a7] UUIDs v1.11.0
  [4ec0a83e] Unicode v1.11.0
  [e66e0078] CompilerSupportLibraries_jll v1.1.1+0
  [deac9b47] LibCURL_jll v8.6.0+0
  [e37daf67] LibGit2_jll v1.7.2+0
  [29816b5a] LibSSH2_jll v1.11.0+1
  [c8ffd9c3] MbedTLS_jll v2.28.6+0
  [14a3606d] MozillaCACerts_jll v2023.12.12
  [4536629a] OpenBLAS_jll v0.3.27+1
  [05823500] OpenLibm_jll v0.8.1+2
  [bea87d4a] SuiteSparse_jll v7.7.0+0
  [83775a58] Zlib_jll v1.2.13+1
  [8e850b90] libblastrampoline_jll v5.11.0+0
  [8e850ede] nghttp2_jll v1.59.0+0
  [3f19e933] p7zip_jll v17.4.0+2
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated -m`
  • Output of versioninfo()
julia> versioninfo()
Julia Version 1.11.3
Commit d63adeda50 (2025-01-21 19:42 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, tigerlake)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
  JULIA_NUM_THREADS = 8
  JULIA_EDITOR = code

Additional context

Add any other context about the problem here.

cwittens avatar Feb 04 '25 17:02 cwittens

I assume this is because my_implicit_euler() is able to reformulate $u^{n+1} = u^{n} + \Delta t f(u^{n+1})$ into $(I - \Delta t f) u^{n+1} = u^{n}$ using FunctionOperator to make $(I - \Delta t f)$ behave as one Matrix/Operator, where as this is not possible when f(u) is just a normal function.

cwittens avatar Feb 04 '25 18:02 cwittens

If you check sol.stats, is it refactorizing? My guess is that the "just let it be nonlinear" form is not recognizing that it can do the whole problem with 1 LU factorization and most of the time is just new LU facts.

ChrisRackauckas avatar Feb 06 '25 15:02 ChrisRackauckas

Note that there is also a significant difference with iterative solvers (code copied from the OP):

# using not the default linear solvers my_implicit_euler still 10x faster
solver = KrylovJL_GMRES();
@btime sol4 = my_implicit_euler(prob_op, dt, solver);
@btime sol5 = solve(prob_op, ImplicitEuler(linsolve = solver), dt=dt, adaptive = false, save_everystep = false);
@btime sol6 = solve(prob, ImplicitEuler(linsolve = solver), dt=dt, adaptive = false, save_everystep = false);

ranocha avatar Feb 06 '25 18:02 ranocha

If you check sol.stats, is it refactorizing? My guess is that the "just let it be nonlinear" form is not recognizing that it can do the whole problem with 1 LU factorization and most of the time is just new LU facts.

julia> sol2.stats
SciMLBase.DEStats
Number of function 1 evaluations:                  510005
Number of function 2 evaluations:                  0
Number of W matrix evaluations:                    5000
Number of linear solves:                           5004
Number of Jacobians created:                       5000
Number of nonlinear solver iterations:             5004
Number of nonlinear solver convergence failures:   0
Number of fixed-point solver iterations:                     0
Number of fixed-point solver convergence failures:           0
Number of rootfind condition calls:                0
Number of accepted steps:                          5000
Number of rejected steps:                          0

cwittens avatar Feb 08 '25 16:02 cwittens

Oh adaptive=false doesn't have a mechanism for rejections and storing factorizations.

ChrisRackauckas avatar Feb 08 '25 16:02 ChrisRackauckas

This is now slowly beyond me expertise, but does this also explain the 10x difference when using indirect solvers like KrylovJL_GMRES()? Because there you dont have any factorizations to store as far is I know.

cwittens avatar Feb 08 '25 17:02 cwittens

Maybe I am wrong, but isn't the W matrix only needed when solving nonlinear systems, and could it be that considerable time is spent evaluating it even when solving linear problems?

julia> sol6 = solve(prob, ImplicitEuler(linsolve = KrylovJL_GMRES()), dt=0.01, adaptive = false, save_everystep = false);

julia> sol6.stats
SciMLBase.DEStats
Number of function 1 evaluations:                  1507
Number of function 2 evaluations:                  0
Number of W matrix evaluations:                    500
Number of linear solves:                           506
Number of Jacobians created:                       0
Number of nonlinear solver iterations:             506
Number of nonlinear solver convergence failures:   0
Number of fixed-point solver iterations:                     0
Number of fixed-point solver convergence failures:           0
Number of rootfind condition calls:                0
Number of accepted steps:                          500
Number of rejected steps:                          0

Or maybe it is just different default tolerances in LinearProblem() and in OrdinaryDiffEq, but here I am not quit convinced.

cwittens avatar Feb 08 '25 17:02 cwittens

You also need it for linear systems, but for a linear ODE you have that W is always the same, and so lu(W) is always the same. What's going on here is that W is constructed 500 times, and lu(W) is done 500 times, when you only need to do it once. Again, this is a biproduct of safety measures on adaptive=false, and we can improve that.

ChrisRackauckas avatar Feb 11 '25 05:02 ChrisRackauckas

But there is also a big difference when no LU decomposition is used, just an iterative solver.

ranocha avatar Feb 11 '25 09:02 ranocha

Can you share a profile for that case?

ChrisRackauckas avatar Feb 11 '25 11:02 ChrisRackauckas

This is the result of running

# profiling
solver = KrylovJL_GMRES();

# my implicit euler
@profview for i in 1:500 my_implicit_euler(prob_op, dt, solver); end


# ImplicitEuler
@profview for i in 1:500 solve(prob_op, ImplicitEuler(linsolve = solver), dt=dt, adaptive = false, save_everystep = false); end

https://github.com/cwittens/two_html_profiles

I couldn't upload .html files to this comment and pictures or just text makes to profiles not so nice to look at I figured, so I uploaded them to a repo.

cwittens avatar Feb 11 '25 18:02 cwittens

it's a difference in which krylov solver we're using. We're spending all the time here https://github.com/JuliaSmoothOptimizers/Krylov.jl/blob/f03bafbf3d7124fb7397493741a7420f39b82042/src/krylov_solvers.jl#L2500C5-L2500C15 which seems very odd. We appear to be making thousands of tiny arrays instead of a matrix or something like that...

oscardssmith avatar Feb 11 '25 18:02 oscardssmith

Note sure if this is relevant/related, but the same performance different can be seen using KrylovJL_BICGSTAB() for example.

And KrylovJL_CG() gives an warning when using ImplicitEuler(linsolve = KrylovJL_CG())

┌ Warning: cg! doesn't support right preconditioning.
└ @ LinearSolve C:\Users\colli\.julia\packages\LinearSolve\WDeMC\src\iterative_wrappers.jl:286
┌ Warning: cg! doesn't support right preconditioning.
└ @ LinearSolve C:\Users\colli\.julia\packages\LinearSolve\WDeMC\src\iterative_wrappers.jl:286
┌ Warning: cg! doesn't support right preconditioning.
└ @ LinearSolve C:\Users\colli\.julia\packages\LinearSolve\WDeMC\src\iterative_wrappers.jl:286
....

even when explicitly telling the FunctionOperator that it is posdef and symmetric FunctionOperator(linear!, u0; p = parameters, issquare = true, isposdef = true, issymmetric = true).

cwittens avatar Feb 11 '25 20:02 cwittens