memilio icon indicating copy to clipboard operation
memilio copied to clipboard

757 dont store pointers in the abm

Open reneSchm opened this issue 1 year ago • 3 comments

Changes and Information

Please briefly list the changes made, additional Information and what the Reviewer should look out for:

  • remove pointers from World, Location, Person
  • this may change quite a bit more

Merge Request - Guideline Checklist

Please check our git workflow. Use the draft feature if the Pull Request is not yet ready to review.

Checks by code author

  • [x] Every addressed issue is linked (use the "Closes #ISSUE" keyword below)
  • [x] New code adheres to coding guidelines
  • [x] No large data files have been added (files should in sum not exceed 100 KB, avoid PDFs, Word docs, etc.)
  • [x] Tests are added for new functionality and a local test run was successful
  • [x] Appropriate documentation for new functionality has been added (Doxygen in the code and Markdown files if necessary)
  • [x] Proper attention to licenses, especially no new third-party software with conflicting license has been added
  • [x] (For ABM development) Checked benchmark results
abm_benchmark/abm_benchmark_50k        4290 ms         4288 ms            1
abm_benchmark/abm_benchmark_100k       9741 ms         9737 ms            1
abm_benchmark/abm_benchmark_200k      20698 ms        20691 ms            1

Checks by code reviewer(s)

  • [ ] Corresponding issue(s) is/are linked and addressed
  • [ ] Code is clean of development artifacts (no deactivated or commented code lines, no debugging printouts, etc.)
  • [ ] Appropriate unit tests have been added, CI passes and code coverage is acceptable (did not decrease)
  • [ ] No large data files added in the whole history of commits(files should in sum not exceed 100 KB, avoid PDFs, Word docs, etc.)

reneSchm avatar Dec 07 '23 16:12 reneSchm

I have a few open items that may need a bit of discussion:

  • Naming: I moved several methods around, without changing their names. None of them are particularly bad, but I think there should be better names out there - I would really appreciate some suggestions. Some examples:
    • get_number_persons returns #persons at a location, which in general is not the same as the size of get_persons
    • get_subpopulation is okay, but does not fit well with get_number_persons
    • The names migrate and interact both in the World and as free functions are not really descriptive
    • The header functions.h still needs a more meaningful name. It contains ABM defining free functions (i.e. non class member functions) like migrate and interact.
  • Caching: Currently, I only cache the population size at each location. As far as I can tell that is the only metric used during simulations, but we are missing e.g. cell populations or persons of a given infection state. The latter would be used by get_subpopulations. Should I stratify the cache or add more caches? If so, which?
  • Should we add functions like Person::migrate(_to) back? I don't think so, since we only replace one argument by the calling class, and it adds more code to maintain without adding any functionality.
  • For now I kept most tests running, but I think they need a bigger rework. I suggest doing that in another issue, since I barely changed any effective behavior here (but a lot of internals/interfaces). This way we could merge this PR faster.
  • I marked the move (assignment) constructor for World as default, so they will just make a copy using the custom copy ctor. I don't think we gain much here by actually moving something, but we could define a custom move ctor anyways.

reneSchm avatar Mar 14 '24 14:03 reneSchm

Codecov Report

Attention: Patch coverage is 97.43590% with 10 lines in your changes missing coverage. Please review.

Project coverage is 96.09%. Comparing base (9cf30ee) to head (e380124). Report is 3 commits behind head on main.

Files Patch % Lines
cpp/models/abm/model_functions.cpp 92.42% 5 Missing :warning:
cpp/models/abm/world.h 94.64% 3 Missing :warning:
cpp/models/abm/person.h 88.88% 1 Missing :warning:
cpp/models/abm/world.cpp 99.04% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #863      +/-   ##
==========================================
- Coverage   96.24%   96.09%   -0.15%     
==========================================
  Files         127      131       +4     
  Lines       10887    11046     +159     
==========================================
+ Hits        10478    10615     +137     
- Misses        409      431      +22     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Mar 19 '24 07:03 codecov[bot]

On performance: There are some small improvements, but unsurprisingly nothing big. Here is single threaded (OMP_NUM_THREADS=1) performance of the current main branch as reference:

Running ../build/bin/abm_benchmark
Run on (56 X 3700 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x56)
  L1 Instruction 32 KiB (x56)
  L2 Unified 1024 KiB (x56)
  L3 Unified 19712 KiB (x4)
Load Average: 1.03, 2.78, 1.36
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
abm_benchmark/abm_benchmark_50k        4617 ms         4615 ms            1
abm_benchmark/abm_benchmark_100k      10602 ms        10598 ms            1
abm_benchmark/abm_benchmark_200k      23102 ms        23094 ms            1

All benchmarks are made on the HPDA Cluster using one node (with --exclusive). I will leave out the preamble now, since it does not change (apart from the load average, which is usually less then 2.00).

Here are the current results for this PR:

abm_benchmark/abm_benchmark_50k        4290 ms         4288 ms            1
abm_benchmark/abm_benchmark_100k       9741 ms         9737 ms            1
abm_benchmark/abm_benchmark_200k      20698 ms        20691 ms            1

So we got very roughly a 5-10% performance gain.

Now to the parallel performance. This is more difficult to measure, as the timing results seem somewhat random. Perhaps this is due to the CPU scaling, but at least it is consistent in both branches. Here is an example for 20 threads, launched with (OMP_NUM_THREADS=20 ./build/bin/abm_benchmark) && (OMP_NUM_THREADS=20 ./build/bin/abm_benchmark), i.e. the second benchmark is run immediately after the first:

abm_benchmark/abm_benchmark_50k        2078 ms         2077 ms            1
abm_benchmark/abm_benchmark_100k       4364 ms         4363 ms            1
abm_benchmark/abm_benchmark_200k       9036 ms         9033 ms            1
---------------------------------------------------------------------------
abm_benchmark/abm_benchmark_50k        1115 ms         1115 ms            1
abm_benchmark/abm_benchmark_100k       2354 ms         2353 ms            1
abm_benchmark/abm_benchmark_200k       5316 ms         5312 ms            1

Note that the scaling seems to be consistent with the first run's time. But that time changes randomly every time. Also, the second run may be the slower of the two, or equally as fast. I see no pattern here.

Anyways, to get just a quick idea how well this branch scales, I ran 10 benchmarks and picked only the fastest result:

--1-thread-----------------------------------------------------------------
abm_benchmark/abm_benchmark_50k        4267 ms         4264 ms            1
abm_benchmark/abm_benchmark_100k       9820 ms         9816 ms            1
abm_benchmark/abm_benchmark_200k      20881 ms        20874 ms            1
--2-threads----------------------------------------------------------------
abm_benchmark/abm_benchmark_50k        2368 ms         2366 ms            1
abm_benchmark/abm_benchmark_100k       5415 ms         5412 ms            1
abm_benchmark/abm_benchmark_200k      11600 ms        11596 ms            1
--4-threads----------------------------------------------------------------
abm_benchmark/abm_benchmark_50k        1676 ms         1675 ms            1
abm_benchmark/abm_benchmark_100k       3839 ms         3837 ms            1
abm_benchmark/abm_benchmark_200k       8238 ms         8232 ms            1
--8-threads----------------------------------------------------------------
abm_benchmark/abm_benchmark_50k        1364 ms         1363 ms            1
abm_benchmark/abm_benchmark_100k       2914 ms         2913 ms            1
abm_benchmark/abm_benchmark_200k       6412 ms         6391 ms            1
-16-threads----------------------------------------------------------------
abm_benchmark/abm_benchmark_50k         931 ms          930 ms            1
abm_benchmark/abm_benchmark_100k       1918 ms         1917 ms            1
abm_benchmark/abm_benchmark_200k       4798 ms         4791 ms            1
-32-threads----------------------------------------------------------------
abm_benchmark/abm_benchmark_50k        1068 ms         1068 ms            1
abm_benchmark/abm_benchmark_100k       2264 ms         2263 ms            1
abm_benchmark/abm_benchmark_200k       5218 ms         5214 ms            1

reneSchm avatar Mar 26 '24 09:03 reneSchm

We could omit the parameter LocationType in a lot of places where it is only necessary to identify the location, as the Id is unique for a location.

My IDE usually complaints if there are unused variables, so I don't think that happens too often in our code. But we probably could rework e.g. the testing strategy, so that it stores two vectors, one for location id and one for type. This way we could also remove all the find-ifs and use constant time lookup instead. But again, I would prefer opening a new issue instead of doing this in here as well.

reneSchm avatar Jul 17 '24 12:07 reneSchm

Quick remark:

In the benchmark get_subpopulation_combined is very slow, this maybe needs to be investigated.

xsaschako avatar Aug 22 '24 12:08 xsaschako

In the benchmark get_subpopulation_combined is very slow, this maybe needs to be investigated.

This is partially intentional, as get_subpopulation needs to iterate over all persons instead of all persons at the given location, which we currently do not cache. Although get_subpopulation_combined is in fact a lot slower than it has to be right now, instead of calling get_subpopulation it should iterate over all persons exactly once. Something like:

size_t World::get_subpopulation_combined(TimePoint t, InfectionState s) const
{
    return std::count_if(m_persons.begin(), m_persons.end(), [&](const auto& p) {
        return p.get_infection_state(t) == s;
    });
}

reneSchm avatar Aug 27 '24 06:08 reneSchm

Thank you! I mean i don't think its high priority, but as it is in the benchmark debuggging, it becomes unusable. Maybe we just think about a better way to debug the benchmark or implement this. but again, very low priority just something i noticed

xsaschako avatar Aug 28 '24 13:08 xsaschako