BoundaryValueDiffEq.jl
Add Lobatto IIIa-c and Radau IIa solvers
Features
- 17 new Fully Implicit Runge-Kutta (FIRK) solvers: Lobatto IIIa, IIIb, and IIIc in 2- to 5-stage versions, and Radau IIa in 1-, 2-, 3-, 5-, and 7-stage versions
- Choice between a cache with a nested nonlinear solve (better performance for large problems) and an expanded cache (seems to be more robust); a usage sketch follows this list
- Adaptivity and interpolations
- Sparse Jacobians
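Roughly how the new solvers are used (a sketch only: the solver names are the new exports, but the `nested_nlsolve` keyword is my reading of how the cache choice is exposed, so check the docstrings):

```julia
using BoundaryValueDiffEq

# Simple pendulum BVP, the same model as in the benchmarks below (g = 9.81, L = 1).
tspan = (0.0, pi / 2)

function pendulum!(du, u, p, t)
    du[1] = u[2]
    du[2] = -9.81 * sin(u[1])
end

function bc!(residual, u, p, t)
    residual[1] = u[1][1] + pi / 2      # angle at the left endpoint
    residual[2] = u[end][1] - pi / 2    # angle at the right endpoint
end

prob = BVProblem(pendulum!, bc!, [pi / 2, pi / 2], tspan)

# Expanded cache (the more robust choice so far); keyword name is an assumption:
sol = solve(prob, RadauIIa5(; nested_nlsolve = false); dt = 0.05)

# Nested nonlinear solve (better performance on large problems):
sol_nested = solve(prob, LobattoIIIa4(; nested_nlsolve = true); dt = 0.05)
```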
Issues with the code
Tests pass for the expanded cache, except in some cases for the convergence tests of the higher-order solvers. I think the reason is that the error is already very low in absolute terms. I've added a check that marks those tests as broken if the solution error is less than 1e-12.
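The check is roughly of this form (illustrative; `sim` is the `ConvergenceSimulation` returned by DiffEqDevTools' `test_convergence`, and `order`/`testTol` come from the surrounding test setup):

```julia
using Test

# When the absolute error already sits at the Float64 noise floor, the
# estimated convergence order is meaningless, so record the test as broken
# instead of failing.
if minimum(sim.errors[:final]) < 1e-12
    @test_broken sim.𝒪est[:final] ≈ order atol = testTol
else
    @test sim.𝒪est[:final] ≈ order atol = testTol
end
```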
There is also a runtime dispatch error reported by the JET tests. I have commented the lines in src/solve/firk.jl where the error occurs; I couldn't figure out how to resolve it.
For the nested cache, in addition to the tests that the expanded one doesn't pass, it also seems to perform poorly on the NLLS and vector-of-vectors initial-guess tests. I haven't found a nonlinear solver for which the nested version converges there, so for now those tests are commented out.
Also, the nested cache doesn't support ForwardDiff yet.
Checklist
- [x] Appropriate tests were added
- [x] Any code changes were done in a way that does not break public API
- [x] All documentation related to code changes were updated
- [ ] The new code follows the contributor guidelines, in particular the SciML Style Guide and COLPRAC. (I commented out tests for nested cache as a WIP thing)
- [ ] Any new documentation only uses public API (I don't understand what this means :( )
> Tests pass for the expanded cache, except in some cases for the convergence tests of the higher-order solvers. I think the reason is that the error is already very low in absolute terms. I've added a check that marks those tests as broken if the solution error is less than 1e-12.
Can you plot what those look like? For the higher-order solvers you likely need to increase the dts.
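For reference, the convergence estimate is computed with DiffEqDevTools along these lines (the dt range is illustrative, and `prob` stands for one of the convergence-test problems with a known solution):

```julia
using DiffEqDevTools, BoundaryValueDiffEq, Plots

# Coarser meshes for a high-order solver so the error stays above round-off.
dts = 1.0 ./ 2 .^ (5:-1:2)           # 1/32, 1/16, 1/8, 1/4
sim = test_convergence(dts, prob, LobattoIIIb5())

@show sim.𝒪est[:final]               # estimated convergence order
plot(sim)                            # the standard convergence plot recipe
```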
Also let's merge the tests for MIRK and the other methods, except Shooting. The only reason Shooting is separate, is that the other methods currently don't support interpolation in the solution.
> Also let's merge the tests for MIRK and the other methods, except Shooting. The only reason Shooting is separate, is that the other methods currently don't support interpolation in the solution.
@avik-pal didn't the MIRK solvers already incorporate interpolations in https://github.com/SciML/BoundaryValueDiffEq.jl/pull/112?
No I meant the interpolation in the residual function
> No I meant the interpolation in the residual function
Oh, indeed, there are no interpolations in the residual functions for MIRK methods right now.
@ChrisRackauckas This is what it looks like. I tried increasing/decreasing the dts with LobattoIIIb stage 5, but to no avail.
That looks like the opposite direction from what the convergence plot recipe normally gives. Is this not just `plot(sim)`?
@ChrisRackauckas Oops, I wasn't aware of that one. Here are the highest-order solvers plotted with `plot(sim)`.
It's just below Float64 tolerance. Can you change the state and time to BigFloat and calculate the convergence in arbitrary precision?
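Concretely, the suggestion is along these lines (a sketch reusing the hypothetical `pendulum!`/`bc!` problem from the Features section above; as the next comment notes, this currently hits an `UndefRefError`):

```julia
using BoundaryValueDiffEq, DiffEqDevTools

# Promote state and time to BigFloat so the convergence order estimate is not
# limited by Float64 round-off.
setprecision(BigFloat, 256)

u0_big    = BigFloat.([pi / 2, pi / 2])
tspan_big = (big"0.0", BigFloat(pi) / 2)
prob_big  = BVProblem(pendulum!, bc!, u0_big, tspan_big)

dts_big = big"1.0" ./ 2 .^ (6:-1:3)
sim_big = test_convergence(dts_big, prob_big, LobattoIIIb5())
```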
@ChrisRackauckas I get an `ERROR: UndefRefError: access to undefined reference` when using BigFloat for these solvers, and also for MIRK. It seems to come from the autodiff; the error occurs for both sparse and dense AD, and for both forward mode and finite differences.
BigFloat semantics are somewhat incompatible with how some of the solvers are written.
https://github.com/SciML/BoundaryValueDiffEq.jl/issues/43#issuecomment-2016859268
> BigFloat semantics are somewhat incompatible with how some of the solvers are written.
This was fixed upstream; forgot to mention it here.
Benchmark Results
| Benchmark | master | f8f5b3579e47df... | master/f8f5b3579e47df... |
|---|---|---|---|
| Simple Pendulum/IIP/BoundaryValueDiffEq.MIRK2() | 6.48 ± 0.29 ms | 6.46 ± 0.21 ms | 1 |
| Simple Pendulum/IIP/BoundaryValueDiffEq.MIRK3() | 2.34 ± 0.086 ms | 2.31 ± 0.084 ms | 1.01 |
| Simple Pendulum/IIP/BoundaryValueDiffEq.MIRK4() | 0.813 ± 0.032 ms | 0.808 ± 0.03 ms | 1.01 |
| Simple Pendulum/IIP/BoundaryValueDiffEq.MIRK5() | 0.861 ± 0.039 ms | 0.856 ± 0.037 ms | 1.01 |
| Simple Pendulum/IIP/BoundaryValueDiffEq.MIRK6() | 0.973 ± 0.034 ms | 0.973 ± 0.033 ms | 1 |
| Simple Pendulum/IIP/MultipleShooting(10, Tsit5; grid_coarsening = false) | 1.35 ± 0.11 ms | 1.32 ± 0.097 ms | 1.02 |
| Simple Pendulum/IIP/MultipleShooting(10, Tsit5; grid_coarsening = true) | 2.61 ± 0.2 ms | 2.58 ± 0.19 ms | 1.01 |
| Simple Pendulum/IIP/MultipleShooting(100, Tsit5; grid_coarsening = false) | 0.0353 ± 0.0058 s | 0.0353 ± 0.0054 s | 1 |
| Simple Pendulum/IIP/MultipleShooting(100, Tsit5; grid_coarsening = true) | 0.114 ± 0.0036 s | 0.115 ± 0.0038 s | 0.991 |
| Simple Pendulum/IIP/Shooting(Tsit5()) | 0.208 ± 0.013 ms | 0.206 ± 0.006 ms | 1.01 |
| Simple Pendulum/OOP/BoundaryValueDiffEq.MIRK2() | 0.0413 ± 0.0032 s | 0.04 ± 0.00095 s | 1.03 |
| Simple Pendulum/OOP/BoundaryValueDiffEq.MIRK3() | 11.9 ± 0.18 ms | 11.5 ± 0.16 ms | 1.04 |
| Simple Pendulum/OOP/BoundaryValueDiffEq.MIRK4() | 3.45 ± 0.074 ms | 3.33 ± 0.07 ms | 1.04 |
| Simple Pendulum/OOP/BoundaryValueDiffEq.MIRK5() | 3.55 ± 0.077 ms | 3.4 ± 0.073 ms | 1.04 |
| Simple Pendulum/OOP/BoundaryValueDiffEq.MIRK6() | 3.65 ± 0.089 ms | 3.56 ± 0.1 ms | 1.02 |
| Simple Pendulum/OOP/MultipleShooting(10, Tsit5; grid_coarsening = false) | 3.28 ± 0.87 ms | 3.31 ± 0.9 ms | 0.991 |
| Simple Pendulum/OOP/MultipleShooting(10, Tsit5; grid_coarsening = true) | 5.88 ± 1.3 ms | 5.91 ± 1.2 ms | 0.996 |
| Simple Pendulum/OOP/MultipleShooting(100, Tsit5; grid_coarsening = false) | 0.128 ± 0.012 s | 0.126 ± 0.01 s | 1.02 |
| Simple Pendulum/OOP/MultipleShooting(100, Tsit5; grid_coarsening = true) | 0.39 ± 0.0073 s | 0.384 ± 0.007 s | 1.01 |
| Simple Pendulum/OOP/Shooting(Tsit5()) | 0.753 ± 0.044 ms | 0.759 ± 0.039 ms | 0.992 |
| time_to_load | 6.01 ± 0.011 s | 6.01 ± 0.0097 s | 1 |
Benchmark Plots
A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR. Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).
The current implementation passes the basic tests for a BVP solver: discrete solving, adaptivity, interpolation, and plotting are all good now.
Current issues of this PR:
- BigFloat semantics (I tried to solve the BigFloat issue for both MIRK and FIRK by following the patch in NonlinearSolve.jl, but it seems the DiffCache currently used isn't compatible with BigFloat; this needs further investigation)
- Some large systems fail, such as the overconstrained and underconstrained problems in the nonlinear least squares tests
@ChrisRackauckas @avik-pal, could you review this PR and see what else is still missing to get it merged? I will try to address these issues.
> BigFloat semantics (I tried to solve the BigFloat issue for both MIRK and FIRK by following the patch in NonlinearSolve.jl, but it seems the DiffCache currently used isn't compatible with BigFloat; this needs further investigation)
What is the error?
How about switching to a lazy buffer cache to get this completed?
> How about switching to a lazy buffer cache to get this completed?
I have tried both LazyBufferCache and FixedSizeDiffCache, but neither of them seems to support ForwardDiff AD when we use BigFloat in BoundaryValueDiffEq.jl; for example, the LazyBufferCache version (https://github.com/SciML/PreallocationTools.jl/issues/72) still fails.
The current issue is that, without DiffCache, it is hard to build Dual numbers with a BigFloat element type to support ForwardDiff AD. I don't think switching to LazyBufferCache or GeneralLazyBufferCache will solve this.
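For context on the trade-off being discussed, the PreallocationTools caches differ roughly like this (a usage sketch, not the package internals):

```julia
using PreallocationTools, ForwardDiff

u_big = BigFloat.([1.0, 2.0, 3.0])

# DiffCache preallocates a flat buffer and reinterprets it to serve Dual
# inputs; that reinterpret path is what breaks down for non-isbits BigFloat.
dc = DiffCache(zeros(3))
get_tmp(dc, rand(3))                        # reuses the Float64 buffer

# LazyBufferCache / GeneralLazyBufferCache allocate (and memoize) one buffer
# per input type instead, so BigFloat and Dual inputs each get their own.
lbc = LazyBufferCache()
lbc[u_big]                                  # Vector{BigFloat}

glbc = GeneralLazyBufferCache(x -> similar(x))
glbc[ForwardDiff.Dual.(u_big, u_big)]       # vector of BigFloat-valued Duals
```

As noted above, though, the sticking point is constructing the BigFloat-valued Duals inside the solver in the first place, which none of these caches addresses by itself.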
The Lobatto and Radau solvers without nested nonlinear solving are working fine now, but the ones using nested nonlinear solving behave badly and don't converge on several test problems.
Maybe we can get the Lobatto and Radau solvers without nested nonlinear solving in first?
How come that doesn't show up as a test failure?
Those failed tests are commented out; the current CI only tests the FIRK solvers without nested nonlinear solving. From local testing, the nested nonlinear solving option only passes the basic test cases like the affineness and linear problem tests.
I believe @axla-io had said nested had issues earlier. Let's get this to a mergeable state where nested throws a warning that it's still experimental and its tests are properly set to @test_broken.
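A purely illustrative sketch of that suggestion (the helper and field names here are hypothetical, not the package's actual internals):

```julia
# Hypothetical helper: warn once when the experimental nested nonlinear solve
# is requested (the `nested_nlsolve` field name is an assumption).
function warn_if_nested(alg)
    if alg.nested_nlsolve
        @warn "Nested nonlinear solving for FIRK methods is experimental and may not converge on some problems." maxlog = 1
    end
    return nothing
end

# And in the test suite, known failures are recorded rather than commented out:
# @test_broken SciMLBase.successful_retcode(sol_nested)
```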
The ForwardDiff issue has been fixed and the nested nonlinear solving FIRK methods are working fine now. I also cleaned up the PR to be compatible with the current style. So this PR is finally ready!!!🎉🎉🎉🎉
Now the only problem is that the tests take a loooong time to complete😅. Are we going to move them to Buildkite or decrease the number of tested solvers (the NLLS test cases cover over 100 combinations of FIRK solvers with different NonlinearSolve solvers, which is the most time-consuming part)?
Parts of the code need some work for performance. But let's not worry about that for now.
Fix the deprecations and the @test_brokens and this is then good to merge.
Hit a time limit? We really need to split this into different test groups.
Yeah, I will split them into different test groups
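One common SciML-style way to do that in `test/runtests.jl` (the group and file names below are hypothetical):

```julia
# Select a test group via the GROUP environment variable; CI runs one group per job.
const GROUP = get(ENV, "GROUP", "All")

if GROUP == "All" || GROUP == "MIRK"
    include("mirk_tests.jl")            # hypothetical file names
end
if GROUP == "All" || GROUP == "FIRK_EXPANDED"
    include("firk_expanded_tests.jl")
end
if GROUP == "All" || GROUP == "FIRK_NESTED"
    include("firk_nested_tests.jl")
end
if GROUP == "All" || GROUP == "NLLS"
    include("nlls_tests.jl")
end
```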
Is the Retest parallelism actually helpful? BLAS will multithread, so we may be slamming the threads on any large enough problem. I don't think that part makes sense in the BVP library, or ML @avik-pal ?
> Is the Retest parallelism actually helpful? BLAS will multithread, so we may be slamming the threads on any large enough problem. I don't think that part makes sense in the BVP library, or ML @avik-pal ?
Are any of the tests here heavy BLAS? In the ML context, none of the Lux tests are heavy enough to cause issues with oversubscription, and it helps bring the build times down to about 25 mins (compared to > 1 hr) on decent machines (the CUDA ones on Buildkite). Not the most helpful for GitHub Actions.
That said, for testing heavy workloads (compute or memory), I do disable parallelism in testing. I still keep ReTestItems because it lets you isolate the test item that fails, and for testing Enzyme, which can segfault or assert-crash Julia, it will just crash the worker, so the test shows up as failed rather than the main process crashing.
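For reference, this is roughly the ReTestItems setup being described (tags, names, and worker counts are illustrative):

```julia
# In a test file picked up by ReTestItems.runtests (e.g. test/firk_tests.jl):
# each test lives in an isolated @testitem; Test and the package under test are
# loaded into it automatically, and a crash (e.g. an Enzyme segfault) only takes
# down the worker running that item, so it shows up as a failure.
@testitem "FIRK expanded cache: simple pendulum" tags = [:firk] begin
    @test 1 + 1 == 2    # placeholder for the real solver test
end

# Heavy or BLAS-bound groups can be run without worker parallelism:
# ReTestItems.runtests("test"; nworkers = 0)
```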
The LU/QR will all multithread above 200x200, which I presume we must be hitting?