BoundaryValueDiffEq.jl
Add Lobatto IIIa-c and Radau IIa solvers
Features
- 17 new Fully Implicit Runge-Kutta (FIRK) solvers: Lobatto IIIa, IIIb, and IIIc in 2- to 5-stage versions, and Radau IIa in 1-, 2-, 3-, 5-, and 7-stage versions
- Choice between a cache with a nested nonlinear solve (better performance for large problems) and an expanded cache (seems to be more robust); a usage sketch follows this list
- Adaptivity and interpolations
- Sparse Jacobians
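Roughly how the new solvers are used (a sketch only: the solver names are the new exports, but the `nested_nlsolve` keyword is my reading of how the cache choice is exposed, so check the docstrings):

```julia
using BoundaryValueDiffEq

# Simple pendulum BVP, the same model as in the benchmarks below (g = 9.81, L = 1).
tspan = (0.0, pi / 2)

function pendulum!(du, u, p, t)
    du[1] = u[2]
    du[2] = -9.81 * sin(u[1])
end

function bc!(residual, u, p, t)
    residual[1] = u[1][1] + pi / 2      # angle at the left endpoint
    residual[2] = u[end][1] - pi / 2    # angle at the right endpoint
end

prob = BVProblem(pendulum!, bc!, [pi / 2, pi / 2], tspan)

# Expanded cache (the more robust choice so far); keyword name is an assumption:
sol = solve(prob, RadauIIa5(; nested_nlsolve = false); dt = 0.05)

# Nested nonlinear solve (better performance on large problems):
sol_nested = solve(prob, LobattoIIIa4(; nested_nlsolve = true); dt = 0.05)
```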
Issues with the code
Tests pass for the expanded cache, except in some cases for the convergence tests of the higher-order solvers. I think the reason is that the error is already very low in absolute terms. I've added a check that marks those tests as broken if the solution error is less than 1e-12.
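The check is roughly of this form (illustrative; `sim` is the `ConvergenceSimulation` returned by DiffEqDevTools' `test_convergence`, and `order`/`testTol` come from the surrounding test setup):

```julia
using Test

# When the absolute error already sits at the Float64 noise floor, the
# estimated convergence order is meaningless, so record the test as broken
# instead of failing.
if minimum(sim.errors[:final]) < 1e-12
    @test_broken sim.𝒪est[:final] ≈ order atol = testTol
else
    @test sim.𝒪est[:final] ≈ order atol = testTol
end
```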
There is also a runtime dispatch error reported by the JET tests. I have commented the lines in src/solve/firk.jl where the error occurs; I couldn't figure out how to resolve it.
For the nested cache, in addition to the tests that the expanded one doesn't pass, it also seems to perform poorly on the NLLS and vector-of-vectors initial-guess tests. I haven't found a nonlinear solver for which the nested version converges there, so for now those tests are commented out.
Also, the nested cache doesn't support ForwardDiff yet.
Checklist
- [x] Appropriate tests were added
- [x] Any code changes were done in a way that does not break public API
- [x] All documentation related to code changes were updated
- [ ] The new code follows the contributor guidelines, in particular the SciML Style Guide and COLPRAC. (I commented out tests for nested cache as a WIP thing)
- [ ] Any new documentation only uses public API (I don't understand what this means :( )
> Tests pass for the expanded cache, except in some cases for the convergence tests of the higher-order solvers. I think the reason is that the error is already very low in absolute terms. I've added a check that marks those tests as broken if the solution error is less than 1e-12.
Can you plot what those look like? For the higher-order solvers you likely need to increase the dts.
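For reference, the convergence estimate is computed with DiffEqDevTools along these lines (the dt range is illustrative, and `prob` stands for one of the convergence-test problems with a known solution):

```julia
using DiffEqDevTools, BoundaryValueDiffEq, Plots

# Coarser meshes for a high-order solver so the error stays above round-off.
dts = 1.0 ./ 2 .^ (5:-1:2)           # 1/32, 1/16, 1/8, 1/4
sim = test_convergence(dts, prob, LobattoIIIb5())

@show sim.𝒪est[:final]               # estimated convergence order
plot(sim)                            # the standard convergence plot recipe
```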
Also let's merge the tests for MIRK and the other methods, except Shooting. The only reason Shooting is separate, is that the other methods currently don't support interpolation in the solution.
> Also let's merge the tests for MIRK and the other methods, except Shooting. The only reason Shooting is separate, is that the other methods currently don't support interpolation in the solution.
@avik-pal didn't the MIRK solvers already incorporate interpolations in https://github.com/SciML/BoundaryValueDiffEq.jl/pull/112?
No I meant the interpolation in the residual function
> No I meant the interpolation in the residual function
Oh, indeed, there are no interpolations in the residual functions for MIRK methods right now.
@ChrisRackauckas This is what it looks like. I tried increasing/decreasing the dts with LobattoIIIb stage 5, but to no avail.
That looks like the opposite direction from what the convergence plot recipe normally gives. Is this not just `plot(sim)`?
@ChrisRackauckas Oops, I wasn't aware of that one. Here are the highest-order solvers plotted with `plot(sim)`.
It's just below Float64 tolerance. Can you change the state and time to BigFloat and calculate the convergence in arbitrary precision?
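Concretely, the suggestion is along these lines (a sketch reusing the hypothetical `pendulum!`/`bc!` problem from the Features section above; as the next comment notes, this currently hits an `UndefRefError`):

```julia
using BoundaryValueDiffEq, DiffEqDevTools

# Promote state and time to BigFloat so the convergence order estimate is not
# limited by Float64 round-off.
setprecision(BigFloat, 256)

u0_big    = BigFloat.([pi / 2, pi / 2])
tspan_big = (big"0.0", BigFloat(pi) / 2)
prob_big  = BVProblem(pendulum!, bc!, u0_big, tspan_big)

dts_big = big"1.0" ./ 2 .^ (6:-1:3)
sim_big = test_convergence(dts_big, prob_big, LobattoIIIb5())
```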
@ChrisRackauckas I get an `ERROR: UndefRefError: access to undefined reference` when using BigFloat for these solvers, and also for MIRK. It seems to come from the autodiff; the error occurs for both sparse and dense AD, and for both forward mode and finite differences.
BigFloat semantics are somewhat incompatible with how some of the solvers are written.
https://github.com/SciML/BoundaryValueDiffEq.jl/issues/43#issuecomment-2016859268
> BigFloat semantics are somewhat incompatible with how some of the solvers are written.
This was fixed upstream; forgot to mention it here.
Benchmark Results
| Benchmark | master | f8f5b3579e47df... | master/f8f5b3579e47df... |
|---|---|---|---|
| Simple Pendulum/IIP/BoundaryValueDiffEq.MIRK2() | 6.48 ± 0.29 ms | 6.46 ± 0.21 ms | 1 |
| Simple Pendulum/IIP/BoundaryValueDiffEq.MIRK3() | 2.34 ± 0.086 ms | 2.31 ± 0.084 ms | 1.01 |
| Simple Pendulum/IIP/BoundaryValueDiffEq.MIRK4() | 0.813 ± 0.032 ms | 0.808 ± 0.03 ms | 1.01 |
| Simple Pendulum/IIP/BoundaryValueDiffEq.MIRK5() | 0.861 ± 0.039 ms | 0.856 ± 0.037 ms | 1.01 |
| Simple Pendulum/IIP/BoundaryValueDiffEq.MIRK6() | 0.973 ± 0.034 ms | 0.973 ± 0.033 ms | 1 |
| Simple Pendulum/IIP/MultipleShooting(10, Tsit5; grid_coarsening = false) | 1.35 ± 0.11 ms | 1.32 ± 0.097 ms | 1.02 |
| Simple Pendulum/IIP/MultipleShooting(10, Tsit5; grid_coarsening = true) | 2.61 ± 0.2 ms | 2.58 ± 0.19 ms | 1.01 |
| Simple Pendulum/IIP/MultipleShooting(100, Tsit5; grid_coarsening = false) | 0.0353 ± 0.0058 s | 0.0353 ± 0.0054 s | 1 |
| Simple Pendulum/IIP/MultipleShooting(100, Tsit5; grid_coarsening = true) | 0.114 ± 0.0036 s | 0.115 ± 0.0038 s | 0.991 |
| Simple Pendulum/IIP/Shooting(Tsit5()) | 0.208 ± 0.013 ms | 0.206 ± 0.006 ms | 1.01 |
| Simple Pendulum/OOP/BoundaryValueDiffEq.MIRK2() | 0.0413 ± 0.0032 s | 0.04 ± 0.00095 s | 1.03 |
| Simple Pendulum/OOP/BoundaryValueDiffEq.MIRK3() | 11.9 ± 0.18 ms | 11.5 ± 0.16 ms | 1.04 |
| Simple Pendulum/OOP/BoundaryValueDiffEq.MIRK4() | 3.45 ± 0.074 ms | 3.33 ± 0.07 ms | 1.04 |
| Simple Pendulum/OOP/BoundaryValueDiffEq.MIRK5() | 3.55 ± 0.077 ms | 3.4 ± 0.073 ms | 1.04 |
| Simple Pendulum/OOP/BoundaryValueDiffEq.MIRK6() | 3.65 ± 0.089 ms | 3.56 ± 0.1 ms | 1.02 |
| Simple Pendulum/OOP/MultipleShooting(10, Tsit5; grid_coarsening = false) | 3.28 ± 0.87 ms | 3.31 ± 0.9 ms | 0.991 |
| Simple Pendulum/OOP/MultipleShooting(10, Tsit5; grid_coarsening = true) | 5.88 ± 1.3 ms | 5.91 ± 1.2 ms | 0.996 |
| Simple Pendulum/OOP/MultipleShooting(100, Tsit5; grid_coarsening = false) | 0.128 ± 0.012 s | 0.126 ± 0.01 s | 1.02 |
| Simple Pendulum/OOP/MultipleShooting(100, Tsit5; grid_coarsening = true) | 0.39 ± 0.0073 s | 0.384 ± 0.007 s | 1.01 |
| Simple Pendulum/OOP/Shooting(Tsit5()) | 0.753 ± 0.044 ms | 0.759 ± 0.039 ms | 0.992 |
| time_to_load | 6.01 ± 0.011 s | 6.01 ± 0.0097 s | 1 |
Benchmark Plots
A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR. Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).
The current implementation passes the basic tests for a BVP solver: discrete solving, adaptivity, interpolation, and plotting are all good now.
Current issues of this PR:
- BigFloat semantics (I tried to solve the BigFloat issue for both MIRK and FIRK by following the patch in NonlinearSolve.jl, but it seems the DiffCache currently used isn't compatible with BigFloat; this needs further investigation)
- Some large systems fail, such as the overconstrained and underconstrained problems in the nonlinear least squares tests
@ChrisRackauckas @avik-pal, could you review this PR and see what else is still missing to get it merged? I will try to address these issues.
> BigFloat semantics (I tried to solve the BigFloat issue for both MIRK and FIRK by following the patch in NonlinearSolve.jl, but it seems the DiffCache currently used isn't compatible with BigFloat; this needs further investigation)
What is the error?
How about switching to a lazy buffer cache to get this completed?
> How about switching to a lazy buffer cache to get this completed?
I have tried both LazyBufferCache and FixedSizeDiffCache, but neither of them seems to support ForwardDiff AD when we use BigFloat in BoundaryValueDiffEq.jl; for example, the LazyBufferCache version (https://github.com/SciML/PreallocationTools.jl/issues/72) still fails.
The current issue is that, without DiffCache, it is hard to build Dual numbers with a BigFloat element type to support ForwardDiff AD. I don't think switching to LazyBufferCache or GeneralLazyBufferCache will solve this.
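For context on the trade-off being discussed, the PreallocationTools caches differ roughly like this (a usage sketch, not the package internals):

```julia
using PreallocationTools, ForwardDiff

u_big = BigFloat.([1.0, 2.0, 3.0])

# DiffCache preallocates a flat buffer and reinterprets it to serve Dual
# inputs; that reinterpret path is what breaks down for non-isbits BigFloat.
dc = DiffCache(zeros(3))
get_tmp(dc, rand(3))                        # reuses the Float64 buffer

# LazyBufferCache / GeneralLazyBufferCache allocate (and memoize) one buffer
# per input type instead, so BigFloat and Dual inputs each get their own.
lbc = LazyBufferCache()
lbc[u_big]                                  # Vector{BigFloat}

glbc = GeneralLazyBufferCache(x -> similar(x))
glbc[ForwardDiff.Dual.(u_big, u_big)]       # vector of BigFloat-valued Duals
```

As noted above, though, the sticking point is constructing the BigFloat-valued Duals inside the solver in the first place, which none of these caches addresses by itself.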
The Lobatto and Radau solvers without nested nonlinear solving are working fine now, but the ones using nested nonlinear solving behave badly and don't converge on several test problems.
Maybe we can get the Lobatto and Radau solvers without nested nonlinear solving in first?
How come that doesn't show up as a test failure?
Those failed tests are commented out; the current CI only tests the FIRK solvers without nested nonlinear solving. From local testing, the nested nonlinear solving option only passes the basic test cases like the affineness and linear problem tests.
I believe @axla-io had said nested had issues earlier. Let's get this to a mergeable state where nested throws a warning that it's still experimental and its tests are properly set to @test_broken.
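A purely illustrative sketch of that suggestion (the helper and field names here are hypothetical, not the package's actual internals):

```julia
# Hypothetical helper: warn once when the experimental nested nonlinear solve
# is requested (the `nested_nlsolve` field name is an assumption).
function warn_if_nested(alg)
    if alg.nested_nlsolve
        @warn "Nested nonlinear solving for FIRK methods is experimental and may not converge on some problems." maxlog = 1
    end
    return nothing
end

# And in the test suite, known failures are recorded rather than commented out:
# @test_broken SciMLBase.successful_retcode(sol_nested)
```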
The ForwardDiff issue has been fixed and the nested nonlinear solving FIRK methods are working fine now. I also cleaned up the PR to be compatible with the current style. So this PR is finally ready!!!🎉🎉🎉🎉
Now the only problem is that the tests take a loooong time to complete😅. Are we going to move them to Buildkite or decrease the number of tested solvers (the NLLS test cases cover over 100 combinations of FIRK solvers with different NonlinearSolve solvers, which is the most time-consuming part)?
Parts of the code need some work for performance. But let's not worry about that for now.
Fix the deprecations and the @test_brokens and this is then good to merge.
Hit a time limit? We really need to split this into different test groups.
Yeah, I will split them into different test groups
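One common SciML-style way to do that in `test/runtests.jl` (the group and file names below are hypothetical):

```julia
# Select a test group via the GROUP environment variable; CI runs one group per job.
const GROUP = get(ENV, "GROUP", "All")

if GROUP == "All" || GROUP == "MIRK"
    include("mirk_tests.jl")            # hypothetical file names
end
if GROUP == "All" || GROUP == "FIRK_EXPANDED"
    include("firk_expanded_tests.jl")
end
if GROUP == "All" || GROUP == "FIRK_NESTED"
    include("firk_nested_tests.jl")
end
if GROUP == "All" || GROUP == "NLLS"
    include("nlls_tests.jl")
end
```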
Is the Retest parallelism actually helpful? BLAS will multithread, so we may be slamming the threads on any large enough problem. I don't think that part makes sense in the BVP library, or ML @avik-pal ?
> Is the Retest parallelism actually helpful? BLAS will multithread, so we may be slamming the threads on any large enough problem. I don't think that part makes sense in the BVP library, or ML @avik-pal ?
Are any of the tests here heavy BLAS? In the ML context, none of the Lux tests are heavy enough to cause issues with oversubscription, and it helps bring the build times down to about 25 mins (compared to > 1 hr) on decent machines (the CUDA ones on Buildkite). Not the most helpful for GitHub Actions.
That said, for testing heavy workloads (compute or memory), I do disable parallelism in testing. I still keep ReTestItems because it lets you isolate the test item that fails, and for testing Enzyme, which can segfault or assert-crash Julia, it will just crash the worker, so the test shows up as failed rather than the main process crashing.
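For reference, this is roughly the ReTestItems setup being described (tags, names, and worker counts are illustrative):

```julia
# In a test file picked up by ReTestItems.runtests (e.g. test/firk_tests.jl):
# each test lives in an isolated @testitem; Test and the package under test are
# loaded into it automatically, and a crash (e.g. an Enzyme segfault) only takes
# down the worker running that item, so it shows up as a failure.
@testitem "FIRK expanded cache: simple pendulum" tags = [:firk] begin
    @test 1 + 1 == 2    # placeholder for the real solver test
end

# Heavy or BLAS-bound groups can be run without worker parallelism:
# ReTestItems.runtests("test"; nworkers = 0)
```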
The LU/QR will all multithread above 200x200, which I presume we must be hitting?