fiction
:zap: Collection of optimizations and a bug fix for `charge_distribution_surface`
Description
This PR introduces a collection of optimizations, mainly with regard to `charge_distribution_surface`.
The commit messages describe the changes that are made.
Most prominently, we have:
- Fixed a bug where, if the charge index changes from i to i' and i' has more leading zeroes than i, cells that should be assigned the negative charge state would remain unchanged.
- The system energy is now by default only recomputed for physically valid charge distributions.
- The `validity_check` function now returns as soon as a verdict can be made.
- Functions that modify the private storage object can apparently be made `const`. This may be applicable in other files as well; currently, I applied the change in `charge_distribution_surface.hpp` and `cell_level_layout.hpp`.
- Removed an unnecessary mutex lock in `quicksim.hpp`. Also, the `update_after_charge_change()` call before spawning the threads can be omitted, since each thread first makes such a call before checking physical validity. Hence, the local potentials and the system energy are properly initialized before calling `adjacent_search`.
Checklist:
- [x] The pull request only contains commits that are related to it.
- [x] I have added appropriate tests and documentation.
- [ ] I have made sure that all CI jobs on GitHub pass.
- [x] The pull request introduces no new warnings and follows the project's style guidelines.
Looks like I missed some tests still. Probably just some silliness
@wlambooy Thank you for your efforts! As soon as it passes the tests, I will take a closer look at it. Please note that `const` indicates to the compiler that the function is a read-only operation.
I missed some consequences of making the empty layout physically valid, hopefully everything passes now.
My intuition is to use keywords like `const` as much as possible to help the compiler, since the type system makes sure you can't overdo it. Yet I would think that a function that modifies a private object cannot be `const`; still, it is accepted. Do you understand why?
Codecov Report
Merging #325 (22367d9) into main (14d3d71) will decrease coverage by 0.01%. The diff coverage is 100.00%.
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #325      +/-   ##
==========================================
- Coverage   96.04%   96.04%    -0.01%
==========================================
  Files         102      102
  Lines       10075    10059      -16
==========================================
- Hits         9677     9661      -16
  Misses        398      398
```
| Files | Coverage Δ | |
|---|---|---|
| ...lation/sidb/exhaustive_ground_state_simulation.hpp | 95.23% <100.00%> (+0.50%) | :arrow_up: |
| .../fiction/algorithms/simulation/sidb/quickexact.hpp | 98.89% <100.00%> (+0.01%) | :arrow_up: |
| ...de/fiction/algorithms/simulation/sidb/quicksim.hpp | 98.68% <100.00%> (-0.02%) | :arrow_down: |
| ...on/algorithms/simulation/sidb/time_to_solution.hpp | 97.43% <100.00%> (ø) | |
| include/fiction/layouts/bounding_box.hpp | 100.00% <100.00%> (ø) | |
| ...fiction/technology/charge_distribution_surface.hpp | 99.37% <100.00%> (-0.02%) | :arrow_down: |
Continue to review full report in Codecov by Sentry.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 14d3d71...22367d9. Read the comment docs.
I see what you mean, but this is not the best style. Using the keyword `const` when the member function obviously changes the logical state of the object violates semantic clarity and logical constness. I would ask you to only use/add `const` where it is really appropriate. Thank you!
My assumption that the C++ compiler catches this turns out to be incorrect, and our example in which a dereferenced pointer to an object is changed shows this. I made sure to verify logical constness in a new commit.
While trying to improve the code coverage, I stumbled on a layout that generates no charge distributions for ExGS, but two for QuickExact. The test looks as follows:
```cpp
TEMPLATE_TEST_CASE(
    "ExGS simulation of two SiDBs placed directly next to each other with non-realistic relative permittivity",
    "[exhaustive-ground-state-simulation]",
    (cell_level_layout<sidb_technology, clocked_layout<cartesian_layout<siqad::coord_t>>>),
    (charge_distribution_surface<cell_level_layout<sidb_technology, clocked_layout<cartesian_layout<siqad::coord_t>>>>))
{
    TestType lyt{};

    lyt.assign_cell_type({1, 3, 0}, TestType::cell_type::NORMAL);
    lyt.assign_cell_type({2, 3, 0}, TestType::cell_type::NORMAL);

    const quickexact_params<TestType> params{sidb_simulation_parameters{2, -0.32, 1.0e-3}};

    const auto simulation_results = quickexact<TestType>(lyt, params);

    CHECK(simulation_results.charge_distributions.empty());
}
```
A degenerate charge distribution is returned by QuickExact with a system energy of -3472.6959461051524. Perhaps QuickExact is not defined on such non-realistic relative permittivity values?
Thank you for your efforts! 🙏 That's a good example, even though it's not realistic. You are right. I found the problem and will fix it as soon as possible. Just for your information: Be aware of this additional parameter, which can also lead to deviations between both simulators, but not here.
```cpp
const quickexact_params<TestType> params{sidb_simulation_parameters{2, -0.32, 1.0e-3},
                                         quickexact_params<TestType>::automatic_base_number_detection::OFF};
```
@wlambooy Thank you very much for your effort! Just click on `re-request review` when you are ready.
@wlambooy many thanks also from my side for the additions! :pray:
@Drewniok thanks to you as well for taking your time reviewing the changes! I'll have a quick scan as well and then it should be ready for merge :muscle:
@wlambooy Thanks again for your efforts! @marcelwa just asked me to verify that your changes do not increase the simulation runtimes. However, when running our recently added benchmark script (fiction -> test -> benchmark -> simulation.cpp), a careful analysis of the results shows that the runtimes increase significantly (~10%), which is unacceptable. As far as I can see, this is not caused by the bugfix, but rather by the other changes.
"Unacceptable" is maybe too harsh a statement. We aim for the most performant code possible for any given scenario, and we indeed take pride in our achievements. @wlambooy if the benchmark code @Drewniok mentioned helps you figure out where exactly the runtime increase was caused, we'd appreciate a fix. Many thanks!
Thank you for doing the analysis @Drewniok! I will try to uncover which change causes this performance decrease.
Also, @Drewniok, can I ask what your benchmark analysis process looks like so that I can replicate it? I see the simulation benchmark file contains a comment with `mean`, `low mean` and `high mean` columns; are these the results from 3 separate benchmark runs, or are they the mean/low/high from a larger set of benchmark runs?
@wlambooy See https://github.com/catchorg/Catch2/blob/devel/docs/benchmarks.md for a definition of samples, iterations, etc. In our case, the crossing gate will be simulated 100 times due to 100 samples and only 1 iteration each.
I think @wlambooy's question was where the output comes from. It will be automatically generated by the benchmark script. Simply pass `-DFICTION_BENCHMARK=ON` to your CMake call and run the resulting binary after compilation in `Release` mode.
I see, I was running the benchmarks from within CLion, so I was getting some XML output that didn't look like the output in the comment. I have to say that on my system at least, I'm always getting quite a spread in the benchmark results, and usually the more often I run it without rebuilding, the better the times get. You can see this in the benchmark results below, which were performed directly in sequence (look at the `mean` values):
QuickExact

```xml
<!-- All values in nano seconds -->
<mean value="2.42783e+09" lowerBound="2.35123e+09" upperBound="2.54392e+09" ci="0.95"/>
<standardDeviation value="4.74379e+08" lowerBound="3.47914e+08" upperBound="6.37069e+08" ci="0.95"/>
<outliers variance="0.936384" lowMild="0" lowSevere="0" highMild="2" highSevere="7"/>
```

QuickSim

```xml
<!-- All values in nano seconds -->
<mean value="1.33811e+07" lowerBound="1.31979e+07" upperBound="1.35853e+07" ci="0.95"/>
<standardDeviation value="987767" lowerBound="835378" upperBound="1.19855e+06" ci="0.95"/>
<outliers variance="0.676348" lowMild="2" lowSevere="0" highMild="6" highSevere="1"/>
```

QuickExact

```xml
<!-- All values in nano seconds -->
<mean value="2.11722e+09" lowerBound="2.08929e+09" upperBound="2.15057e+09" ci="0.95"/>
<standardDeviation value="1.55694e+08" lowerBound="1.30843e+08" upperBound="1.96559e+08" ci="0.95"/>
<outliers variance="0.666639" lowMild="0" lowSevere="0" highMild="3" highSevere="0"/>
```

QuickSim

```xml
<!-- All values in nano seconds -->
<mean value="1.06471e+07" lowerBound="1.04801e+07" upperBound="1.08103e+07" ci="0.95"/>
<standardDeviation value="845190" lowerBound="739383" upperBound="995966" ci="0.95"/>
<outliers variance="0.707146" lowMild="2" lowSevere="0" highMild="1" highSevere="0"/>
```

QuickExact

```xml
<!-- All values in nano seconds -->
<mean value="2.01781e+09" lowerBound="2.00066e+09" upperBound="2.04291e+09" ci="0.95"/>
<standardDeviation value="1.04298e+08" lowerBound="7.61848e+07" upperBound="1.52602e+08" ci="0.95"/>
<outliers variance="0.494789" lowMild="0" lowSevere="0" highMild="3" highSevere="2"/>
```

QuickSim

```xml
<!-- All values in nano seconds -->
<mean value="9.83447e+06" lowerBound="9.68097e+06" upperBound="9.98761e+06" ci="0.95"/>
<standardDeviation value="783977" lowerBound="703335" upperBound="901620" ci="0.95"/>
<outliers variance="0.707254" lowMild="0" lowSevere="0" highMild="0" highSevere="0"/>
```
It would be best to run the benchmarks from the command line with all other tasks/programs closed. In particular, your web browser and CLion run expensive background tasks that can impact the measurements.
After doing a couple of benchmark runs with pre-built binaries in a more sterile environment in terms of processor usage, I am able to confirm a performance difference with the following findings:
The QuickExact time is slower (+5.4% runtime on average) while QuickSim is faster (-7.1% runtime on average). All benchmark runs were consistent in the differences: each QuickExact benchmark run of the main branch was faster than each QuickExact benchmark run of this branch, and each QuickSim benchmark run of the main branch was slower than each QuickSim benchmark run of this branch.
I'll try to figure out why this is.
@wlambooy Thanks again for your efforts! I know, sometimes it is not so easy to find out why the code is slower. Take your time and don't hesitate to contact us.
Since it would be great to have this bugfix in the `main` branch as soon as possible, can I ask you to maybe split this PR into two - one for the bugfix and one for the other changes? What do you think about that? What do you think @marcelwa?
I agree that it would be very helpful to have the bugfix available as quickly as possible.
I was working on more pressing things the last week so I haven't gotten into a code speed investigation. I put the bugfixes in a new PR: #347