memilio icon indicating copy to clipboard operation
memilio copied to clipboard

652 Add serialization to ABM

Open reneSchm opened this issue 1 year ago • 4 comments

Changes and Information

Please briefly list the changes (main added features, changed items, or corrected bugs) made:

  • Add an default serialization feature to make adding (de)serialization more convenient.
  • Add a TimeSeriesFunctor to replace std::functions in ABM parameters.
  • Make several objects serializable.

If need be, add additional information and what the reviewer should look out for in particular:

Merge Request - Guideline Checklist

Please check our git workflow. Use the draft feature if the Pull Request is not yet ready to review.

Checks by code author

  • [x] Every addressed issue is linked (use the "Closes #ISSUE" keyword below)
  • [x] New code adheres to coding guidelines
  • [x] No large data files have been added (files should in sum not exceed 100 KB, avoid PDFs, Word docs, etc.)
  • [x] Tests are added for new functionality and a local test run was successful (with and without OpenMP)
  • [x] Appropriate documentation for new functionality has been added (Doxygen in the code and Markdown files if necessary)
  • [x] Proper attention to licenses, especially no new third-party software with conflicting license has been added
  • [x] (For ABM development) Checked benchmark results and ran and posted a local test above from before and after development to ensure performance is monitored.
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
abm_benchmark/abm_benchmark_50k        4066 ms         4066 ms            1
abm_benchmark/abm_benchmark_100k       7722 ms         7722 ms            1
abm_benchmark/abm_benchmark_200k      17136 ms        17134 ms            1

Checks by code reviewer(s)

  • [ ] Corresponding issue(s) is/are linked and addressed
  • [ ] Code is clean of development artifacts (no deactivated or commented code lines, no debugging printouts, etc.)
  • [ ] Appropriate unit tests have been added, CI passes, code coverage and performance is acceptable (did not decrease)
  • [ ] No large data files added in the whole history of commits(files should in sum not exceed 100 KB, avoid PDFs, Word docs, etc.)
  • [ ] On merge, add 2-5 lines with the changes (main added features, changed items, or corrected bugs) to the merge-commit-message. This can be taken from the briefly-list-the-changes above (best case) or the separate commit messages (worst case).

Closes #652

reneSchm avatar Jul 19 '24 17:07 reneSchm

Codecov Report

Attention: Patch coverage is 98.29932% with 5 lines in your changes missing coverage. Please review.

Project coverage is 96.84%. Comparing base (0134012) to head (91b6213).

Files with missing lines Patch % Lines
cpp/memilio/math/time_series_functor.h 84.61% 4 Missing :warning:
cpp/memilio/io/io.h 95.83% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1072      +/-   ##
==========================================
+ Coverage   96.59%   96.84%   +0.25%     
==========================================
  Files         137      140       +3     
  Lines       11057    11363     +306     
==========================================
+ Hits        10680    11005     +325     
+ Misses        377      358      -19     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Jul 19 '24 18:07 codecov[bot]

In the last commits I added

  • the option for auto-serialization to directly use lists. this ends up making no difference in the json output of tested objects (i.e. most objects used in the ABM), so I will remove it again since it only makes the code more complicated. Note that the serialize_internal for Containers is never used, even when exclusively using add_object, hence e.g. vectors are added as Json lists and not as a Json object named "List".
  • a mio::apply taking a tuple, and used it in auto_deserialize_impl. I don't particularly like the result, as I had to add a assignment operator to set NVP::value through std::tuple::operator=. This is needed as I never unpack the NVP-tuple in the mio::apply function.

I will undo these changes in the next commit.

reneSchm avatar Aug 07 '24 09:08 reneSchm

On this branch:

---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
abm_benchmark/abm_benchmark_50k        4066 ms         4066 ms            1
abm_benchmark/abm_benchmark_100k       7722 ms         7722 ms            1
abm_benchmark/abm_benchmark_200k      17136 ms        17134 ms            1

On main:

---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
abm_benchmark/abm_benchmark_50k        4219 ms         4219 ms            1
abm_benchmark/abm_benchmark_100k       8656 ms         8655 ms            1
abm_benchmark/abm_benchmark_200k      18886 ms        18884 ms            1

Benchmarks are made on the Team Server. I would say this looks like no significant change.

reneSchm avatar Sep 17 '24 13:09 reneSchm

I also wrote a makeshift benchmark to compare the TimeSeriesFunctor with the std::function set to

[](ScalarType t) -> ScalarType {
    return mio::linear_interpolation_of_data_set<ScalarType, ScalarType>({... data ...}, t);
};

that was used as a ABM parameter prior to this branch, since abm_benchmark does not set it to anything other than zero.

The results are

---------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations
---------------------------------------------------------------------
TimeSeriesFunctor Random         59.0 ns         59.0 ns      9929083
Lambda Random                     107 ns          107 ns      6539624
TimeSeriesFunctor Circular       8.22 ns         8.22 ns     79767155
Lambda Circular                  33.0 ns         33.0 ns     21066329
TimeSeriesFunctor Zero           4.80 ns         4.80 ns    149843086
Lambda Zero                      1.56 ns         1.56 ns    437602705

with

  • Random: Uses data taken from simulations/abm.cpp:474, evaluated with seeded uniform random numbers within the given time points
  • Circular: Same data, evaluated with t going from the smalest to largest time point with 0.1 increments, restarting at the smallest
  • Zero: Uses default constructed TimeSeriesFunctor and [](ScalarType t) {return 0.;};, evaluated at 0

Looks like the functor is actually more performant than the std::function, which is a nice plus. Only Zero case could be improved for the functor, but it is probably fast enough.

Update

The current non-lazy version performs as shown in the table below. Not impressive, but still fast enough.

---------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations
---------------------------------------------------------------------
TimeSeriesFunctor Random         92.5 ns         92.5 ns      7035594
Lambda Random                     104 ns          104 ns      6721721
TimeSeriesFunctor Circular       25.1 ns         25.1 ns     23194957
Lambda Circular                  33.7 ns         33.7 ns     20853859
TimeSeriesFunctor Zero           19.7 ns         19.7 ns     34891272
Lambda Zero                      1.33 ns         1.33 ns    525952368

reneSchm avatar Sep 17 '24 13:09 reneSchm