PyBaMM icon indicating copy to clipboard operation
PyBaMM copied to clipboard

[Roadmap area] Solver improvements

Open brosaplanella opened this issue 1 year ago • 14 comments

I'd like to continue making pybamm easier to work with for the parameter inference libraries like PyBOP (https://github.com/pybop-team/PyBOP). At the moment I think this includes:

  • GPU as well as multithreaded support for running many simulations in parallel (https://github.com/pybamm-team/PyBaMM/issues/3826)
  • Integrate sensitivities more into Simulation (https://github.com/pybamm-team/PyBaMM/issues/3834)
  • Run multiple simulations from different starting points (https://github.com/pybamm-team/PyBaMM/issues/3713). Should be possible to reset the starting points during the middle of a solve
  • This is generally "making it faster", so profiling would fit in here (see https://github.com/pybamm-team/PyBaMM/pull/3862)
  • others?....

Originally posted by @martinjrobins in https://github.com/pybamm-team/PyBaMM/issues/3839#issuecomment-1999464816

brosaplanella avatar Mar 20 '24 13:03 brosaplanella

Not solver, but this is related to speed-up: https://github.com/pybamm-team/PyBaMM/issues/4058

rtimms avatar Jun 07 '24 15:06 rtimms

This is my attempt at a roadmap for PyBaMM's solvers, comments / suggestions welcome

Sensitivities

  • [ ] Sensitivities not currently usable in experiments, add this functionality
    • Requires propagating sensitivities through events
    • Casadi will have this functionality in Autumn ‘24, integrate this into pybamm
  • [ ] Automatic differentiation (forwards + backwards) through solvers using Jax. This builds on existing work wrapping idaklu solver as a Jax operation.
  • [ ] Higher order gradients (e.g. hessians)

Multithreaded/GPU support:

  • [ ] merge in multithreaded Idaklu solver code from https://github.com/pybamm-team/PyBaMM/pull/4260
  • [ ] once casadi solver has events support: use map to parallise over inputs
  • [ ] gpu support in casadi or idaklu
    • most likely we can do this for idaklu mlir solver once iree supports 64 bit floats
    • or casadi if they are planning this, but no sign of this.

Post Processing optimisation and refactoring

  • [ ] Solution objects (and associated ProcessedVariables objects): clean up the code, refactor so casadi dependency is reduced, and optimise the postprocessing computation for speed of execution

Solver refactoring

  • [ ] Remove unnecessary code, e.g. python idaklu solver #4223
  • [ ] Separate out idaklu solver to separate library, Will reduce complexity of PyBaMM releases and infrastructure development
  • [ ] Reduce duplicated functionality across solvers

Solver documentation

  • [ ] Create a “high performance pybamm” user guide to explain how to use the solvers.

martinjrobins avatar Jul 15 '24 18:07 martinjrobins

Here are my thoughts on improving the solver. Please let me know if you have comments or questions @martinjrobins and others.

Solver options

  • [x] https://github.com/pybamm-team/PyBaMM/pull/4282

Initialization

  • [x] https://github.com/pybamm-team/PyBaMM/pull/4301

Time stepping

In PyBaMM, the solver currently stops at all values in the t_eval vector. It seems that we use t_eval and set some dt for a few distinct purposes:

  1. To enforce time-dependent discontinuities within a single step, like with a drive cycle
  2. To set the data collection frequency (as in the period experiment kwarg)
  3. Setting a small dt to force better solver convergence (take smaller time steps)

These three reasons for setting a dt are all valid, but stopping the simulation at all time steps can have a drastic impact on the adaptive time stepping and performance. For example, consider a full C/10 discharge with

  • a. t_eval = [0, 36000] (i.e., no stops)
  • b. t_eval = np.arange(0, 36060, 60) (pybamm default)

If we compare the solver stats,

  • Number of steps: a. 165 vs b. 715
  • Number of residual calls: a. 288 vs b. 823
  • Number of linear solver setups: a. 28 vs b. 91
  • Number of nonlinear iterations: a. 286 vs b. 821
  • DAE integration time: a. 25.5 ms vs b. 97.1 ms

Even though we solve the same system, the dense t_eval b. is nearly 4x slower! To address these issues, I propose the following changes that align the time-stepping options with Julia's differential equation solvers (see Output Control):

  • [ ] (Breaking) By default, save every t and y determined by IDA's adaptive time stepping algorithm. This eliminates issues like the one above with the C/10 discharge. We can accurately post-interpolate the solution onto specific times with IDA's Hermite interpolator. This is a huge benefit for parameter estimation because we will always obtain the full solution accuracy regardless of the data collection frequency.
  • [ ] (Non-breaking) Change the description of t_eval to match Julia's tstops: "Denotes extra times that the time-stepping algorithm must step to. This should be used to help the solver deal with discontinuities and singularities, since stepping exactly at the time of the discontinuity will improve accuracy." With this option, drive cycles in 1. still work
  • [ ] (Non-breaking) Add a solver option that matches Julia's saveat: "Denotes specific times to save the solution at, during the solving phase. The solver will save at each of the timepoints in this array in the most efficient manner available to the solver." This addresses the data collection frequency in 2. without negatively affecting performance since it interpolates the solution with IDAGetDky().
  • [ ] (Non-breaking) Discourage modifying t_eval for performance issues and encourage modifying appropriate solver options (rtol, atol, dt_max, etc.), which addresses 3.

MarcBerliner avatar Jul 31 '24 15:07 MarcBerliner

yea, agree that it would be great to get rid of IDASetStopTime! or at least reduce the number of times we use it. Would this play nicely with the casadi solver?

martinjrobins avatar Jul 31 '24 18:07 martinjrobins

Would this play nicely with the casadi solver?

If we can access IDA's internal time steps via Casadi, I think we can do most of this stuff

MarcBerliner avatar Jul 31 '24 21:07 MarcBerliner

Corollary from the PyBaMM developer meeting on 05/08/2024 as a part of PyBaMM running in WASM:

  • CasADi can be compiled with the Emscripten toolchain starting with the upcoming v3.6.6 and will be available in Pyodide with the next patch or minor version (either v0.26.3 or v0.27.0).
    • uses NumPy for computations, so no floating point imprecisions were noticed (so far)
  • Compiling IDAS doesn't take too much effort; linking with KLU could be difficult. I'm unsure if IREE is available/possible. Reference BLAS/LAPACK implementations should be available and configurable.
    • the IDAKLU solver should be possible to compile in theory, but threading/OpenMP will need to be disabled via #4260 or similar
    • the current sdist is broken, can be fixed via further changes in #4242
  • Best way to proceed with this is to compile PyBaMM without the IDAKLU solver for now (i.e., as a pure Python package) and provide pybamm.CasadiSolver() through a Pyodide instance, which can then be used in the docs for the example notebooks or other in-browser uses (more or less, JupyterLite)
    • ship a pure Python wheel (with BUILD_IDAKLU set to OFF) in addition to platform-specific wheels, since pip always chooses the most specific wheel available

agriyakhetarpal avatar Aug 06 '24 14:08 agriyakhetarpal

@agriyakhetarpal: FYI, I've compiled IDA with KLU using enscripten in the past, see this repo: https://github.com/martinjrobins/diffeq-runtime. However I agree that focusing on the casadi solver for now is the best approach since you have it already compiled and in Pyodide

martinjrobins avatar Aug 06 '24 14:08 martinjrobins

Thanks for the resource, @martinjrobins! I see you've built a static lib for KLU and also used NO_LAPACK – should work well. I'll tinker with the settings and maybe I can reduce the binary size a bit :)

agriyakhetarpal avatar Aug 06 '24 15:08 agriyakhetarpal

To improve the speed of our ODE models, I'd like to add CVODE from SUNDIALS to our suite of C solvers in addition to IDA. I have a few questions about the implementation, and I'd like to hear your thoughts @jsbrittain, @martinjrobins, @pipliggins, and others.

  • In C++, we can make a base solver class and derived classes for IDA and CVODE. The solver code structure for both will be very similar, so this will help with code reuse. However, this will cause a headache for some of the existing PRs.
  • The CasadiSolver automatically determines if the system is an ODE or a DAE and passes it to the appropriate solver. I think this approach also makes sense for our C solvers. If we do take this approach, then...
  • One minor issue is that the IDAKLUSolver name will no longer be accurate. Since we plan on streamlining the IDAKLU installation process and changing the default away from CasadiSolver, maybe we can also rename IDAKLUSolver to SundialsSolver or something. This might make the "new and improved default solver" announcement splashier (cc @valentinsulzer @rtimms).

And just FYI, I don't plan on starting this work for at least a couple of weeks.

MarcBerliner avatar Aug 27 '24 14:08 MarcBerliner