memilio
memilio copied to clipboard
757 dont store pointers in the abm
Changes and Information
Please briefly list the changes made, additional Information and what the Reviewer should look out for:
- remove pointers from World, Location, Person
- this may change quite a bit more
Merge Request - Guideline Checklist
Please check our git workflow. Use the draft feature if the Pull Request is not yet ready to review.
Checks by code author
- [x] Every addressed issue is linked (use the "Closes #ISSUE" keyword below)
- [x] New code adheres to coding guidelines
- [x] No large data files have been added (files should in sum not exceed 100 KB, avoid PDFs, Word docs, etc.)
- [x] Tests are added for new functionality and a local test run was successful
- [x] Appropriate documentation for new functionality has been added (Doxygen in the code and Markdown files if necessary)
- [x] Proper attention to licenses, especially no new third-party software with conflicting license has been added
- [x] (For ABM development) Checked benchmark results
abm_benchmark/abm_benchmark_50k 4290 ms 4288 ms 1
abm_benchmark/abm_benchmark_100k 9741 ms 9737 ms 1
abm_benchmark/abm_benchmark_200k 20698 ms 20691 ms 1
Checks by code reviewer(s)
- [ ] Corresponding issue(s) is/are linked and addressed
- [ ] Code is clean of development artifacts (no deactivated or commented code lines, no debugging printouts, etc.)
- [ ] Appropriate unit tests have been added, CI passes and code coverage is acceptable (did not decrease)
- [ ] No large data files added in the whole history of commits(files should in sum not exceed 100 KB, avoid PDFs, Word docs, etc.)
I have a few open items that may need a bit of discussion:
- Naming: I moved several methods around, without changing their names. None of them are particularly bad, but I think there should be better names out there - I would really appreciate some suggestions. Some examples:
-
get_number_persons
returns #persons at a location, which in general is not the same as the size ofget_persons
-
get_subpopulation
is okay, but does not fit well withget_number_persons
- The names
migrate
andinteract
both in the World and as free functions are not really descriptive - The header
functions.h
still needs a more meaningful name. It contains ABM defining free functions (i.e. non class member functions) like migrate and interact.
-
- Caching: Currently, I only cache the population size at each location. As far as I can tell that is the only metric used during simulations, but we are missing e.g. cell populations or persons of a given infection state. The latter would be used by get_subpopulations. Should I stratify the cache or add more caches? If so, which?
- Should we add functions like Person::migrate(_to) back? I don't think so, since we only replace one argument by the calling class, and it adds more code to maintain without adding any functionality.
- For now I kept most tests running, but I think they need a bigger rework. I suggest doing that in another issue, since I barely changed any effective behavior here (but a lot of internals/interfaces). This way we could merge this PR faster.
- I marked the move (assignment) constructor for World as default, so they will just make a copy using the custom copy ctor. I don't think we gain much here by actually moving something, but we could define a custom move ctor anyways.
Codecov Report
Attention: Patch coverage is 97.43590%
with 10 lines
in your changes missing coverage. Please review.
Project coverage is 96.09%. Comparing base (
9cf30ee
) to head (e380124
). Report is 3 commits behind head on main.
Files | Patch % | Lines |
---|---|---|
cpp/models/abm/model_functions.cpp | 92.42% | 5 Missing :warning: |
cpp/models/abm/world.h | 94.64% | 3 Missing :warning: |
cpp/models/abm/person.h | 88.88% | 1 Missing :warning: |
cpp/models/abm/world.cpp | 99.04% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #863 +/- ##
==========================================
- Coverage 96.24% 96.09% -0.15%
==========================================
Files 127 131 +4
Lines 10887 11046 +159
==========================================
+ Hits 10478 10615 +137
- Misses 409 431 +22
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
On performance: There are some small improvements, but unsurprisingly nothing big. Here is single threaded (OMP_NUM_THREADS=1) performance of the current main branch as reference:
Running ../build/bin/abm_benchmark
Run on (56 X 3700 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x56)
L1 Instruction 32 KiB (x56)
L2 Unified 1024 KiB (x56)
L3 Unified 19712 KiB (x4)
Load Average: 1.03, 2.78, 1.36
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
---------------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------------
abm_benchmark/abm_benchmark_50k 4617 ms 4615 ms 1
abm_benchmark/abm_benchmark_100k 10602 ms 10598 ms 1
abm_benchmark/abm_benchmark_200k 23102 ms 23094 ms 1
All benchmarks are made on the HPDA Cluster using one node (with --exclusive). I will leave out the preamble now, since it does not change (apart from the load average, which is usually less then 2.00).
Here are the current results for this PR:
abm_benchmark/abm_benchmark_50k 4290 ms 4288 ms 1
abm_benchmark/abm_benchmark_100k 9741 ms 9737 ms 1
abm_benchmark/abm_benchmark_200k 20698 ms 20691 ms 1
So we got very roughly a 5-10% performance gain.
Now to the parallel performance. This is more difficult to measure, as the timing results seem somewhat random. Perhaps this is due to the CPU scaling, but at least it is consistent in both branches. Here is an example for 20 threads, launched with
(OMP_NUM_THREADS=20 ./build/bin/abm_benchmark) && (OMP_NUM_THREADS=20 ./build/bin/abm_benchmark)
, i.e. the second benchmark is run immediately after the first:
abm_benchmark/abm_benchmark_50k 2078 ms 2077 ms 1
abm_benchmark/abm_benchmark_100k 4364 ms 4363 ms 1
abm_benchmark/abm_benchmark_200k 9036 ms 9033 ms 1
---------------------------------------------------------------------------
abm_benchmark/abm_benchmark_50k 1115 ms 1115 ms 1
abm_benchmark/abm_benchmark_100k 2354 ms 2353 ms 1
abm_benchmark/abm_benchmark_200k 5316 ms 5312 ms 1
Note that the scaling seems to be consistent with the first run's time. But that time changes randomly every time. Also, the second run may be the slower of the two, or equally as fast. I see no pattern here.
Anyways, to get just a quick idea how well this branch scales, I ran 10 benchmarks and picked only the fastest result:
--1-thread-----------------------------------------------------------------
abm_benchmark/abm_benchmark_50k 4267 ms 4264 ms 1
abm_benchmark/abm_benchmark_100k 9820 ms 9816 ms 1
abm_benchmark/abm_benchmark_200k 20881 ms 20874 ms 1
--2-threads----------------------------------------------------------------
abm_benchmark/abm_benchmark_50k 2368 ms 2366 ms 1
abm_benchmark/abm_benchmark_100k 5415 ms 5412 ms 1
abm_benchmark/abm_benchmark_200k 11600 ms 11596 ms 1
--4-threads----------------------------------------------------------------
abm_benchmark/abm_benchmark_50k 1676 ms 1675 ms 1
abm_benchmark/abm_benchmark_100k 3839 ms 3837 ms 1
abm_benchmark/abm_benchmark_200k 8238 ms 8232 ms 1
--8-threads----------------------------------------------------------------
abm_benchmark/abm_benchmark_50k 1364 ms 1363 ms 1
abm_benchmark/abm_benchmark_100k 2914 ms 2913 ms 1
abm_benchmark/abm_benchmark_200k 6412 ms 6391 ms 1
-16-threads----------------------------------------------------------------
abm_benchmark/abm_benchmark_50k 931 ms 930 ms 1
abm_benchmark/abm_benchmark_100k 1918 ms 1917 ms 1
abm_benchmark/abm_benchmark_200k 4798 ms 4791 ms 1
-32-threads----------------------------------------------------------------
abm_benchmark/abm_benchmark_50k 1068 ms 1068 ms 1
abm_benchmark/abm_benchmark_100k 2264 ms 2263 ms 1
abm_benchmark/abm_benchmark_200k 5218 ms 5214 ms 1
We could omit the parameter LocationType in a lot of places where it is only necessary to identify the location, as the Id is unique for a location.
My IDE usually complaints if there are unused variables, so I don't think that happens too often in our code. But we probably could rework e.g. the testing strategy, so that it stores two vectors, one for location id and one for type. This way we could also remove all the find-ifs and use constant time lookup instead. But again, I would prefer opening a new issue instead of doing this in here as well.
Quick remark:
In the benchmark get_subpopulation_combined is very slow, this maybe needs to be investigated.
In the benchmark get_subpopulation_combined is very slow, this maybe needs to be investigated.
This is partially intentional, as get_subpopulation needs to iterate over all persons instead of all persons at the given location, which we currently do not cache. Although get_subpopulation_combined is in fact a lot slower than it has to be right now, instead of calling get_subpopulation it should iterate over all persons exactly once. Something like:
size_t World::get_subpopulation_combined(TimePoint t, InfectionState s) const
{
return std::count_if(m_persons.begin(), m_persons.end(), [&](const auto& p) {
return p.get_infection_state(t) == s;
});
}
Thank you! I mean i don't think its high priority, but as it is in the benchmark debuggging, it becomes unusable. Maybe we just think about a better way to debug the benchmark or implement this. but again, very low priority just something i noticed