Ribasim
Use PrecompileTools to compile basic model on build.
Should help with latency (#1942) on our executables. The basic testmodel runs in half a second, and models related to basic see improvements too (but much less) with this PR.
This sets up a precompile workload using PrecompileTools, which also compiles the statements of other packages with our types, instead of just Ribasim. This also sets the specialization level of SciML to full, which is recommended for simulations.
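For readers unfamiliar with PrecompileTools, here is a minimal sketch of what such a workload looks like. `ToyModel`, `Basin` and `simulate` are hypothetical stand-ins, not Ribasim's actual code; only `@setup_workload` and `@compile_workload` are the real PrecompileTools macros.

```julia
module ToyModel  # hypothetical stand-in for the real package module

using PrecompileTools: @setup_workload, @compile_workload

# Hypothetical type; the real package has its own parameter types.
struct Basin
    storage::Float64
end

# Toy "simulation": drain a basin with explicit Euler steps.
function simulate(basin::Basin; nsteps::Int = 10, rate::Float64 = 0.1)
    s = basin.storage
    for _ in 1:nsteps
        s -= rate * s
    end
    return s
end

@setup_workload begin
    # Setup code runs while the package is being precompiled,
    # but is not itself the target of caching.
    basin = Basin(100.0)
    @compile_workload begin
        # Everything called here gets compiled and cached, including
        # methods owned by other packages that specialize on our types.
        simulate(basin)
    end
end

end # module
```

The workload only takes effect when the module is part of a package that Julia precompiles; pasted into a script or the REPL, the workload block is skipped and only the definitions are evaluated.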
However, as both significantly increase compile time (and the model prints its progress while the workload runs), I've disabled it by default for developers.
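One way to switch a workload off locally is the Preferences.jl mechanism that PrecompileTools documents. The sketch below assumes that mechanism; the PR may disable the workload differently.

```julia
# Run once in the developer's environment, then restart Julia.
# Assumes Ribasim and Preferences are both available in the active project.
using Ribasim, Preferences

# "precompile_workload" is the per-package preference key PrecompileTools checks.
set_preferences!(Ribasim, "precompile_workload" => false; force = true)
```

This writes the setting to a `LocalPreferences.toml` next to the active project, so it only affects that environment.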
I believe that if we fix the type explosion of our Parameters we will have a fast executable for all models, although we ideally should have a workload that includes all our functionality (arrow, alloc, etc.).
Wow, this crashes hard on TC:
Windows
Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x0 -- unknown function (ip: 0000000000000000)
in expression starting at C:\Users\svc-teamcity-ansible\.julia\packages\SimpleNonlinearSolve\1h0BO\src\SimpleNonlinearSolve.jl:143
unknown function (ip: 0000000000000000)
Allocations: 38904371 (Pool: 38903382; Big: 989); GC: 30
ERROR: failed process: Process(`'C:\Users\svc-teamcity-ansible\.julia\juliaup\julia-1.11.3+0.x64.w64.mingw32\bin\julia.exe' --color=auto --startup-file=no --pkgimages=no '--cpu-target=generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)' '--sysimage=D:\buildAgent\temp\buildTmp\jl_nXpMnc\tmp_sys.dll' '--project=D:\buildAgent\work\ecd2b8f9b25b1609\ribasim\core' '--output-o=D:\buildAgent\temp\buildTmp\jl_7EHglq5Wd4-o.a' 'D:\buildAgent\temp\buildTmp\jl_HOzqJyl6d4'`, ProcessExited(1)) [1]
Linux
[1083399] signal 11 (1): Segmentation fault
in expression starting at /u/svc-teamcity-ansible/.julia/packages/SimpleNonlinearSolve/1h0BO/src/SimpleNonlinearSolve.jl:143
unknown function (ip: (nil))
Allocations: 38164904 (Pool: 38163932; Big: 972); GC: 29
ERROR: failed process: Process(`/u/svc-teamcity-ansible/.julia/juliaup/julia-1.11.3+0.x64.linux.gnu/bin/julia --color=auto --startup-file=no --pkgimages=no '--cpu-target=generic;sandybridge,-xsaveopt,clone_all;haswell,-rdrnd,base(1)' --sysimage=/opt/teamcityagent/temp/buildTmp/jl_J234qA/tmp_sys.so --project=/opt/teamcityagent/work/ecd2b8f9b25b1609/ribasim/core --output-o=/opt/teamcityagent/temp/buildTmp/jl_qFglC5m7en-o.a /opt/teamcityagent/temp/buildTmp/jl_QPqpwR412H`, ProcessSignaled(11)) [0]
I think that's the FullSpecialize? Also note that the second run of the testmodel is not instant; if anything it's slower. @visr, could you build locally and see what happens?
I'll first finish the state Vector, which may help here. And if that doesn't help, let's try disabling FullSpecialize.
Also discussed with @evetion that if all goes well we may be able to only use PrecompileTools and not PackageCompiler precompilation on top.
And if PrecompileTools speeds up other models as well we should always enable it.
Still failing at PackageCompiler creating a sysimage:
15:07:01 Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
15:07:01 Exception: EXCEPTION_ACCESS_VIOLATION at 0x0 -- unknown function (ip: 0000000000000000)
15:07:01 in expression starting at C:\Users\svc-teamcity-ansible\.julia\packages\SimpleNonlinearSolve\1h0BO\src\SimpleNonlinearSolve.jl:143
15:07:01 unknown function (ip: 0000000000000000)
15:07:01 Allocations: 37311783 (Pool: 37310847; Big: 936); GC: 29
This is possibly because PrecompileTools makes it compile more code, leading to a sysimage that is too large?
The line where it fails is also using PrecompileTools: https://github.com/SciML/NonlinearSolve.jl/blob/v4.4.0/lib/SimpleNonlinearSolve/src/SimpleNonlinearSolve.jl#L143
Sounds like we need to put this on hold for a little longer.
On Linux this seems to work now (even though the build log doesn't look like it), as tested on our cluster:
(base) pronk_mn@v-slurmsub001 ~ $ time ./ribasim/ribasim basic/ribasim.toml
┌ Info: Starting a Ribasim simulation.
│ toml_path = "basic/ribasim.toml"
│ cli.ribasim_version = "2025.1.0"
│ starttime = 2020-01-01T00:00:00
└ endtime = 2021-01-01T00:00:00
┌ Warning: The following experimental features are enabled: concentration
└ @ Ribasim /opt/teamcityagent/work/ecd2b8f9b25b1609/ribasim/core/src/main.jl:42
Simulating 100%|████████████████████████████████████████████████████████████| Time: 0:00:00
┌ Info: Convergence bottlenecks in descending order of severity:
│ LinearResistance #12 = 8.149730746987561e-6
│ Basin #6 = 2.7041589660110494e-6
│ TabulatedRatingCurve #4 = 1.0239692169298505e-6
│ Basin #3 = 1.0238832862484456e-6
└ Basin #1 = 9.825234235602696e-7
[ Info: The model finished successfully.
real 0m1.987s
user 0m1.833s
sys 0m0.211s
@visr If you can confirm this for the Windows build, I'd say this is good to merge.
Comparing the CLI from this branch against main, I don't see any speedup from this at all. The basic model takes 1s for both, and the trivial model takes 16s for both. That is despite trivial being a subset of basic feature-wise, which hints that perhaps #2127 won't help much at this point either.
That doesn't mean we shouldn't merge it though, hopefully this will start making a difference with further latency work.
@SouthEndMusic I've set the AD chunk_size to 1 here, which causes the subnetwork test models to become unstable. Could you take a look at why that is?
@visr How do you want to move on with this PR? If we want to tackle the CLI latency, we need to build two executables.
Perhaps we should focus this PR only on latency for CI and developers and tackle the CLI latency separately? Before introducing two CLIs it would be nice to have more data on the tradeoff, to see if it is worth it at all.
Ok, this is now good to go. Please review :)
Julia testmodels now runs in 10-14 minutes, down from 20-30 minutes.