
ENH: Parallel mode for monte-carlo simulations

Open brunosorban opened this issue 1 year ago • 8 comments

This pull request adds an option to the MonteCarlo class to run simulations in parallel. The feature uses a context manager named MonteCarloManager to centralize all workers and shared objects, ensuring proper termination of the sub-processes.
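As a rough illustration of the idea (not the MonteCarloManager code from this PR), a context manager can own the worker processes and guarantee they are shut down however the block exits. The names `worker_pool` and `simulate_one` are hypothetical stand-ins, and the "simulation" is a placeholder function.

```python
import multiprocessing as mp
from contextlib import contextmanager


def simulate_one(index):
    """Hypothetical stand-in for one simulation; returns a small result record."""
    return {"index": index, "apogee": 1000.0 + index}


@contextmanager
def worker_pool(n_workers):
    """Yield a process pool and make sure its workers are gone on exit."""
    pool = mp.Pool(processes=n_workers)
    try:
        yield pool
        pool.close()      # normal exit: accept no more work, let workers finish
    except BaseException:
        pool.terminate()  # error or Ctrl-C: stop the workers immediately
        raise
    finally:
        pool.join()       # wait until every sub-process has exited


if __name__ == "__main__":
    with worker_pool(n_workers=2) as pool:
        results = pool.map(simulate_one, range(4))
    print(len(results))
```

Keeping creation and teardown in one place means callers cannot forget to join or terminate the sub-processes, which is the property the description above is after.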

A second feature is the possibility to export (close to) all simulation inputs and outputs to an .h5 file. The file can be visualized with HDFView (or similar) software. Since it is a less conventional file format, a method to read it and a structure to post-process multiple simulations were also added under rocketpy/stochastic/post_processing. A cache handles the data manipulation and returns a 3D numpy array with all simulations; its shape is (simulation_index, time_index, column). The column axis is reserved for vector data, where x, y and z, for example, may be available under the same dataset. For example, cache.read_inputs('motors/thrust_source') returns both time and thrust.
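To make the (simulation_index, time_index, column) layout concrete, here is a small sketch that builds a synthetic array of that shape with plain numpy. It does not call the cache from this PR; the sizes and thrust values are made up, and column 0 = time, column 1 = thrust is an assumption mirroring the thrust_source example.

```python
import numpy as np

n_sims, n_times = 3, 5
time = np.linspace(0.0, 2.0, n_times)

# One (time, thrust) table per simulation, stacked along the first axis.
data = np.empty((n_sims, n_times, 2))
for i in range(n_sims):
    data[i, :, 0] = time               # column 0: time
    data[i, :, 1] = 100.0 * (i + 1)    # column 1: constant thrust per simulation

print(data.shape)  # (3, 5, 2)

thrust_sim1 = data[1, :, 1]               # thrust curve of simulation 1
mean_thrust = data[:, :, 1].mean(axis=0)  # mean thrust over simulations, per time step
```

With this layout, statistics across simulations are reductions over axis 0, and a single simulation is a plain 2D slice.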

Pull request type

  • [x] Code changes (bugfix, features)

Checklist

  • [ ] Tests for the changes have been added (if needed)
  • [x] Docs have been reviewed and added / updated
  • [ ] Lint (black rocketpy/ tests/) has passed locally
  • [ ] All tests (pytest tests -m slow --runslow) have passed locally
  • [ ] CHANGELOG.md has been updated (if relevant)

Current behavior

Currently, Monte Carlo simulations must run serially, and all outputs are written to a txt file.

New behavior

Monte Carlo simulations may now be executed in parallel, and outputs may be exported to a txt or an h5 file, saving either some key data or everything.

Breaking change

  • [ ] Yes
  • [x] No

Additional information

None

brunosorban avatar Jun 09 '24 13:06 brunosorban

Benchmark of the results. A machine with 6 cores (12 threads) was used.

workers_performance

brunosorban avatar Jun 09 '24 19:06 brunosorban

Amazing feature, as the results show the MonteCarlo class has great potential for parallelization.

The only blocking issue I see with this PR is the serialization code. It still does not support all of rocketpy's features and requires a lot of maintenance and updates on our end.

Do you see any other option for performing the serialization of inputs?

@phmbressan we should make all the classes json serializable, it's an open issue at #522 . In the meantime, maybe we could still use the _encoders module to serialize inputs.

I agree with you that implementing flight class serialization within this PR may create maintenance issues for us. The simplest solution would be to delete the flightv1_serializer (and similar) functions.

Gui-FernandesBR avatar Jun 18 '24 10:06 Gui-FernandesBR

Codecov Report

Attention: Patch coverage is 35.49784% with 149 lines in your changes missing coverage. Please review.

Project coverage is 79.34%. Comparing base (83aa20e) to head (4e0ef92). Report is 15 commits behind head on develop.

Files with missing lines                    Patch %    Lines
rocketpy/simulation/monte_carlo.py          26.17%     141 Missing :warning:
rocketpy/rocket/components.py               33.33%     4 Missing :warning:
rocketpy/stochastic/stochastic_rocket.py    81.81%     4 Missing :warning:
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #619      +/-   ##
===========================================
+ Coverage    76.42%   79.34%   +2.92%     
===========================================
  Files           95       95              
  Lines        11090    11496     +406     
===========================================
+ Hits          8475     9121     +646     
+ Misses        2615     2375     -240     


codecov[bot] avatar Jul 29 '24 14:07 codecov[bot]

The monte_carlo_class_usage notebook currently does not work in parallel mode. I did not have time to look into it, so I did not review the parallel part of the code.

I know your review was just temporary, but could you be a bit more specific on the parallel side not working? It might be an OS related issue that we should fix of course, but here things were working fine.

phmbressan avatar Aug 16 '24 21:08 phmbressan

I know your review was just temporary, but could you be a bit more specific on the parallel side not working? It might be an OS related issue that we should fix of course, but here things were working fine.

Open the monte_carlo_class_usage.ipynb and run all cells.

The parameter parallel is set to True, so the simulation runs in parallel.

After the simulation is done, nothing is saved to the .inputs.txt or .outputs.txt files.

If you set parallel to False instead, the results are saved correctly.

MateusStano avatar Aug 16 '24 23:08 MateusStano

I have pushed a fix for the file-writing issue when running on Windows (more precisely, when processes use the spawn start method). I tested it on a Windows machine and it ran correctly, but I invite reviewers to also test on different OS configurations.

Issues solved by this PR:

  • [X] MonteCarlo simulations have a parallel mode;
  • [X] Both the simulation execution and data saving are executed in parallel (producer - consumer);
  • [X] There are performance gains on large simulations;
  • [X] The serial simulations can be executed in the same fashion, and the outputs of both modes are compatible.
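The producer-consumer arrangement mentioned in the list above can be sketched as follows. This is a simplified illustration, not the code in this PR: `producer`, `consumer`, the sentinel convention, and the one-JSON-object-per-line output are all assumptions made for the example.

```python
import json
import multiprocessing as mp
import os
import tempfile

SENTINEL = None  # each producer sends this once to say "I am done"


def producer(indices, queue):
    """Run placeholder simulations and push each result onto the queue."""
    for i in indices:
        queue.put({"index": i, "apogee": 1000.0 + i})
    queue.put(SENTINEL)


def consumer(queue, path, n_producers):
    """Write results to disk as they arrive; return how many were written."""
    finished = 0
    written = 0
    with open(path, "w", encoding="utf-8") as f:
        while finished < n_producers:
            item = queue.get()
            if item is SENTINEL:
                finished += 1
                continue
            f.write(json.dumps(item) + "\n")
            written += 1
    return written


if __name__ == "__main__":
    queue = mp.Queue()
    chunks = [range(0, 5), range(5, 10)]
    workers = [mp.Process(target=producer, args=(c, queue)) for c in chunks]
    for w in workers:
        w.start()
    # The parent is the only consumer, so a single process touches the file.
    out_path = os.path.join(tempfile.mkdtemp(), "outputs_sketch.txt")
    n_written = consumer(queue, out_path, n_producers=len(workers))
    for w in workers:
        w.join()
    print(n_written)
```

Having exactly one writer avoids interleaved or lost lines, while the simulations themselves still overlap with the disk I/O.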

Points of Improvement:

  • [ ] Soft interrupts of parallel simulations (e.g. an exception or Ctrl-C) are only effective on Linux. Spawned processes (Windows) currently stop abruptly.
  • [ ] On Windows, the Jupyter notebook will not show the status update prints (running the simulations in a terminal is fine). This seems to be an OS-level difference in how standard output is handled, and it is not easily solved.

Some of these points could become issues of the repository. Stating them here for proper PR documentation.

Future Considerations:

  • Python 3.14 and later change the default start method on Linux away from fork (to forkserver; spawn is already the default on Windows and macOS). We could keep RocketPy's start method as fork on Linux if this hurts performance too much;
  • The Python GIL is expected to be removed some years from now (PEP 703). This could bring performance benefits, since threads are generally faster to start than processes.
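Regarding the start method, one option is to pin it explicitly instead of relying on the platform default. The sketch below uses only the standard `multiprocessing` API; `pick_start_method` and `square` are hypothetical helpers, and preferring fork is just one possible policy.

```python
import multiprocessing as mp


def pick_start_method(preferred="fork"):
    """Return `preferred` if this platform supports it, otherwise 'spawn'."""
    available = mp.get_all_start_methods()
    return preferred if preferred in available else "spawn"


def square(x):
    """Placeholder for the work done by each process."""
    return x * x


if __name__ == "__main__":
    # A context keeps the choice local instead of changing the global default.
    ctx = mp.get_context(pick_start_method("fork"))
    with ctx.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```

Using `get_context` rather than `set_start_method` leaves the interpreter-wide default untouched, so other libraries in the same session are not affected.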

phmbressan avatar Aug 23 '24 15:08 phmbressan

@phmbressan I like the way this PR was refactored. Many thanks for your effort.

Please fix the pylint errors and resolve all the open conversations in this PR so we can approve and merge it into develop!

Optionally, try to rebase the PR to get the latest commits from develop.

Gui-FernandesBR avatar Aug 28 '24 04:08 Gui-FernandesBR

Converted to draft until you solve the remaining issues, especially the random number generation problem, @phmbressan

Gui-FernandesBR avatar Sep 08 '24 23:09 Gui-FernandesBR

I believe this PR is ready again for another round of review. These are the changes since the previous review:

  1. @phmbressan has done some great work simplifying and optimizing even further the parallel structure, and a sim_consumer process is no longer needed;
  2. @phmbressan and I fixed the random number generator bug. The solution consisted of resetting all stochastic structures inside the StochasticRocket and their position. The simplest solution we found, without changing things that go directly to either Rocket or Flight, is implemented in the methods _set_stochastic and __reset_components of StochasticRocket, so please take a closer look at both;
  3. a very minor fix in some of the methods of Components; just make sure that they make sense.

Overall, it seems that the time per iteration is even faster now, at least by my local measurements. @phmbressan might want to complement the information provided here, he knows this PR much better than I do!

Please, make sure to take a careful look at the Monte Carlo .input file to check that there is indeed no dependency on the generated random variables.
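One possible way to run such a check is sketched below, under loud assumptions: the input file is taken to hold one JSON object per line, the field names `mass` and `inclination` are invented for the example, and the data here is synthetic rather than read from a real run. It looks for two symptoms: duplicated draws (as could happen if workers shared an RNG state) and correlation between inputs that should be independent.

```python
import json
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def column(lines, key):
    """Extract one numeric field from JSON-lines text (assumed file layout)."""
    return [json.loads(line)[key] for line in lines if line.strip()]


def has_duplicates(values):
    """True if any sampled value appears more than once."""
    return len(set(values)) != len(values)


# Synthetic stand-in for the lines of an inputs file.
rng = random.Random(0)
lines = [
    json.dumps({"mass": rng.gauss(15.0, 0.5), "inclination": rng.gauss(85.0, 1.0)})
    for _ in range(2000)
]

mass = column(lines, "mass")
inclination = column(lines, "inclination")
r = pearson(mass, inclination)
print(has_duplicates(mass), abs(r) < 0.1)
```

A low correlation does not prove independence, so this is only a quick screen; repeated values across iterations are the more telling sign of a shared or re-seeded generator.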

Lucas-Prates avatar Dec 18 '24 13:12 Lucas-Prates

Another important issue: I currently cannot interrupt the MonteCarlo.simulate method smoothly when it is run in parallel; all attempts lead to killing the notebook :fearful: ! It would be great to check whether the same happens on your own machines.

Lucas-Prates avatar Dec 18 '24 15:12 Lucas-Prates

Please follow this one: https://github.com/RocketPy-Team/RocketPy/pull/768

Gui-FernandesBR avatar Feb 10 '25 08:02 Gui-FernandesBR