
:zap: Collection of optimizations and a bug fix for `charge_distribution_surface`

Open wlambooy opened this pull request 8 months ago • 21 comments

Description

This PR introduces a collection of optimizations, mainly regarding `charge_distribution_surface`. The commit messages describe the changes made.

Most prominently, we have:

  • Fixed a bug where, if the charge index changes from `i` to `i'` and `i'` has more leading zeros than `i`, cells that should be assigned a negative charge state were left unchanged (see the sketch after this list).
  • The system energy is now, by default, only recomputed for physically valid charge distributions.
  • The `validity_check` function now returns as soon as a verdict can be made.
  • Functions that modify the private storage object can apparently be made `const`. This may be applicable in other files as well; currently, I applied the change in `charge_distribution_surface.hpp` and `cell_level_layout.hpp`.
  • Removed an unnecessary mutex lock in `quicksim.hpp`. The `update_after_charge_change()` call before spawning the threads can also be omitted, since each thread makes such a call before checking physical validity; hence the local potentials and system energy are properly initialised before `adjacent_search` is called.
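
To make the first point concrete, here is a minimal sketch of the leading-zero issue, using a simplified base-2 index-to-charge conversion with hypothetical names (the actual implementation in `charge_distribution_surface.hpp` differs):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class sidb_charge_state { NEGATIVE, NEUTRAL, POSITIVE };

// simplified base-2 conversion: digit 0 = NEGATIVE, digit 1 = NEUTRAL
void index_to_charge_distribution(std::uint64_t index, std::vector<sidb_charge_state>& cells)
{
    std::size_t c = 0;
    for (; index != 0; ++c, index /= 2)
    {
        cells[c] = (index % 2 == 0) ? sidb_charge_state::NEGATIVE : sidb_charge_state::NEUTRAL;
    }
    // the fix: once the loop runs out of non-zero digits, the remaining cells
    // correspond to leading zeros and must be assigned NEGATIVE explicitly;
    // previously they silently kept their old charge states
    for (; c < cells.size(); ++c)
    {
        cells[c] = sidb_charge_state::NEGATIVE;
    }
}
```

For example, going from index 6 (`0b110`, i.e. {NEGATIVE, NEUTRAL, NEUTRAL}) to index 1 (`0b001`), the first loop only touches the first cell; without the second loop, the other two cells would keep their NEUTRAL states instead of becoming NEGATIVE.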

Checklist:

  • [x] The pull request only contains commits that are related to it.
  • [x] I have added appropriate tests and documentation.
  • [ ] I have made sure that all CI jobs on GitHub pass.
  • [x] The pull request introduces no new warnings and follows the project's style guidelines.

wlambooy avatar Nov 05 '23 15:11 wlambooy

Looks like I missed some tests still. Probably just some silliness

wlambooy avatar Nov 05 '23 15:11 wlambooy

@wlambooy Thank you for your efforts! As soon as it passes the tests, I will take a closer look at it. Please note that `const` indicates to the compiler that the function is a read-only operation.

Drewniok avatar Nov 05 '23 15:11 Drewniok

> @wlambooy Thank you for your efforts! As soon as it passes the tests, I will take a closer look at it. Please note that `const` indicates to the compiler that the function is a read-only operation.

I missed some consequences of making the empty layout physically valid; hopefully everything passes now. My intuition is to use keywords like `const` as much as possible to help the compiler, since the type system makes sure you can't overdo it. Yet I would have thought that a function that modifies a private object cannot be `const`, and still it is accepted. Do you understand why?
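
For illustration, a minimal sketch of what surprised me (hypothetical names; the layout classes keep their state behind a `std::shared_ptr` like this):

```cpp
#include <memory>

struct storage
{
    int charge_index = 0;
};

class surface
{
  public:
    surface() : strg{std::make_shared<storage>()} {}

    // compiles fine: const only makes the member pointer const, not the pointee,
    // so the object's logical state can still be modified through it
    void assign_charge_index(const int i) const
    {
        strg->charge_index = i;
    }

  private:
    std::shared_ptr<storage> strg;
};
```

`const` here only promises not to modify the members of `surface` itself; `strg` stays the same pointer, while the `storage` it points to remains freely mutable (physical vs. logical constness).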

wlambooy avatar Nov 05 '23 18:11 wlambooy

Codecov Report

Merging #325 (22367d9) into main (14d3d71) will decrease coverage by 0.01%. The diff coverage is 100.00%.

Additional details and impacted files


```diff
@@            Coverage Diff             @@
##             main     #325      +/-   ##
==========================================
- Coverage   96.04%   96.04%   -0.01%
==========================================
  Files         102      102
  Lines       10075    10059      -16
==========================================
- Hits         9677     9661      -16
  Misses        398      398
```
| Files | Coverage Δ |
|---|---|
| ...lation/sidb/exhaustive_ground_state_simulation.hpp | 95.23% <100.00%> (+0.50%) :arrow_up: |
| .../fiction/algorithms/simulation/sidb/quickexact.hpp | 98.89% <100.00%> (+0.01%) :arrow_up: |
| ...de/fiction/algorithms/simulation/sidb/quicksim.hpp | 98.68% <100.00%> (-0.02%) :arrow_down: |
| ...on/algorithms/simulation/sidb/time_to_solution.hpp | 97.43% <100.00%> (ø) |
| include/fiction/layouts/bounding_box.hpp | 100.00% <100.00%> (ø) |
| ...fiction/technology/charge_distribution_surface.hpp | 99.37% <100.00%> (-0.02%) :arrow_down: |

Continue to review full report in Codecov by Sentry.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 14d3d71...22367d9.

codecov[bot] avatar Nov 05 '23 18:11 codecov[bot]

I see what you mean, but this is not the best style. Using the keyword `const` when the member function obviously changes the logical state of the object violates semantic clarity and logical constness. I would ask you to only use/add `const` where it is really appropriate. Thank you!

Drewniok avatar Nov 06 '23 06:11 Drewniok

> I see what you mean, but this is not the best style. Using the keyword `const` when the member function obviously changes the logical state of the object violates semantic clarity and logical constness. I would ask you to only use/add `const` where it is really appropriate. Thank you!

My assumption that the C++ compiler catches this turns out to be incorrect, and our example, in which an object behind a dereferenced pointer is changed, shows this. I made sure to verify logical constness in a new commit.

While trying to improve the code coverage, I stumbled on a layout that generates no charge distributions for ExGS, but two for QuickExact. The test looks as follows:

```cpp
TEMPLATE_TEST_CASE(
    "ExGS simulation of two SiDBs placed directly next to each other with non-realistic relative permittivity", "[exhaustive-ground-state-simulation]",
    (cell_level_layout<sidb_technology, clocked_layout<cartesian_layout<siqad::coord_t>>>),
    (charge_distribution_surface<cell_level_layout<sidb_technology, clocked_layout<cartesian_layout<siqad::coord_t>>>>))
{
    TestType lyt{};
    lyt.assign_cell_type({1, 3, 0}, TestType::cell_type::NORMAL);
    lyt.assign_cell_type({2, 3, 0}, TestType::cell_type::NORMAL);

    const quickexact_params<TestType> params{sidb_simulation_parameters{2, -0.32, 1.0e-3}};

    const auto simulation_results = quickexact<TestType>(lyt, params);

    CHECK(simulation_results.charge_distributions.empty());
}
```

A degenerate charge distribution is returned by QuickExact with a system energy of -3472.6959461051524. Perhaps QuickExact is not defined on such non-realistic relative permittivity values?
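
For reference, the kind of cross-check I am doing, as a sketch continuing inside the test case above (signatures from memory, parameter plumbing simplified):

```cpp
// run both simulators on the same layout with the same physical parameters and
// compare the number of physically valid charge distributions they report
const sidb_simulation_parameters phys_params{2, -0.32, 1.0e-3};

const auto exgs_results = exhaustive_ground_state_simulation<TestType>(lyt, phys_params);
const auto qe_results   = quickexact<TestType>(lyt, quickexact_params<TestType>{phys_params});

// fails here: ExGS reports none, QuickExact reports two degenerate ones
CHECK(exgs_results.charge_distributions.size() == qe_results.charge_distributions.size());
```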

wlambooy avatar Nov 06 '23 14:11 wlambooy

> > I see what you mean, but this is not the best style. Using the keyword `const` when the member function obviously changes the logical state of the object violates semantic clarity and logical constness. I would ask you to only use/add `const` where it is really appropriate. Thank you!
>
> My assumption that the C++ compiler catches this turns out to be incorrect, and our example, in which an object behind a dereferenced pointer is changed, shows this. I made sure to verify logical constness in a new commit.
>
> While trying to improve the code coverage, I stumbled on a layout that generates no charge distributions for ExGS, but two for QuickExact. The test looks as follows:
>
> ```cpp
> TEMPLATE_TEST_CASE(
>     "ExGS simulation of two SiDBs placed directly next to each other with non-realistic relative permittivity", "[exhaustive-ground-state-simulation]",
>     (cell_level_layout<sidb_technology, clocked_layout<cartesian_layout<siqad::coord_t>>>),
>     (charge_distribution_surface<cell_level_layout<sidb_technology, clocked_layout<cartesian_layout<siqad::coord_t>>>>))
> {
>     TestType lyt{};
>     lyt.assign_cell_type({1, 3, 0}, TestType::cell_type::NORMAL);
>     lyt.assign_cell_type({2, 3, 0}, TestType::cell_type::NORMAL);
>
>     const quickexact_params<TestType> params{sidb_simulation_parameters{2, -0.32, 1.0e-3}};
>
>     const auto simulation_results = quickexact<TestType>(lyt, params);
>
>     CHECK(simulation_results.charge_distributions.empty());
> }
> ```
>
> A degenerate charge distribution is returned by QuickExact with a system energy of -3472.6959461051524. Perhaps QuickExact is not defined on such non-realistic relative permittivity values?

Thank you for your efforts! 🙏 That's a good example, even though it's not realistic. You are right. I found the problem and will fix it as soon as possible. Just for your information: be aware of this additional parameter, which can also lead to deviations between the two simulators, but not here:

```cpp
const quickexact_params<TestType> params{sidb_simulation_parameters{2, -0.32, 1.0e-3},
                                         quickexact_params<TestType>::automatic_base_number_detection::OFF};
```

Drewniok avatar Nov 06 '23 14:11 Drewniok

@wlambooy Thank you very much for your effort! Just click on re-request review when you are ready.

Drewniok avatar Nov 21 '23 06:11 Drewniok

@wlambooy many thanks also from my side for the additions! :pray:

@Drewniok thanks to you as well for taking the time to review the changes! I'll have a quick scan as well and then it should be ready for merge :muscle:

marcelwa avatar Nov 27 '23 10:11 marcelwa

@wlambooy Thanks again for your efforts! @marcelwa just asked me to check whether your changes increase the simulation runtimes. When running our recently added benchmark script (`test/benchmark/simulation.cpp`), a careful analysis of the results shows that the runtimes increase significantly (~10%), which is unacceptable. As far as I can see, this is not caused by the bugfix but rather by the other changes.

Drewniok avatar Nov 27 '23 12:11 Drewniok

> @wlambooy Thanks again for your efforts! @marcelwa just asked me to check whether your changes increase the simulation runtimes. When running our recently added benchmark script (`test/benchmark/simulation.cpp`), a careful analysis of the results shows that the runtimes increase significantly (~10%), which is unacceptable. As far as I can see, this is not caused by the bugfix but rather by the other changes.

"Unacceptable" is maybe a bit harsh a statement. We aim for the most performant code possible for any given scenario and we indeed take pride in our achievements. @wlambooy if the benchmark code @Drewniok mentioned helps you figure out where exactly the runtime increase was caused, we'd appreciate a fix. Many thanks!

marcelwa avatar Nov 27 '23 12:11 marcelwa

Thank you for doing the analysis @Drewniok! I will try to uncover which changes cause this performance decrease.

wlambooy avatar Nov 27 '23 13:11 wlambooy

Also, @Drewniok, can I ask what your benchmark analysis process looks like so that I can replicate it? I see the simulation benchmark file contains a comment with mean, low mean, and high mean columns. Are these the results from 3 separate benchmark runs, or the mean/low/high of a larger set of runs?

wlambooy avatar Nov 27 '23 13:11 wlambooy

> Also, @Drewniok, can I ask what your benchmark analysis process looks like so that I can replicate it? I see the simulation benchmark file contains a comment with mean, low mean, and high mean columns. Are these the results from 3 separate benchmark runs, or the mean/low/high of a larger set of runs?

@wlambooy See https://github.com/catchorg/Catch2/blob/devel/docs/benchmarks.md for a definition of samples, iterations, etc. In our case, the crossing gate is simulated 100 times: 100 samples with 1 iteration each.
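
As a sketch of how such measurements come about (not the actual benchmark file; the helper below is a stand-in, and the build flag is explained in the next comment):

```cpp
#include <catch2/benchmark/catch_benchmark.hpp>
#include <catch2/catch_test_macros.hpp>

#include <cstdint>

// stand-in for a simulator invocation such as quicksim(lyt, params)
static std::uint64_t expensive_computation()
{
    std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < 1'000'000; ++i)
    {
        acc += i * i;
    }
    return acc;
}

TEST_CASE("simulation benchmark sketch")
{
    // Catch2 collects 100 samples by default (tunable via --benchmark-samples);
    // the number of iterations per sample is estimated automatically and ends up
    // being 1 for long-running bodies such as a full gate simulation
    BENCHMARK("stand-in for QuickSim")
    {
        return expensive_computation();
    };
}
```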

Drewniok avatar Nov 27 '23 13:11 Drewniok

> > Also, @Drewniok, can I ask what your benchmark analysis process looks like so that I can replicate it? I see the simulation benchmark file contains a comment with mean, low mean, and high mean columns. Are these the results from 3 separate benchmark runs, or the mean/low/high of a larger set of runs?
>
> @wlambooy See https://github.com/catchorg/Catch2/blob/devel/docs/benchmarks.md for a definition of samples, iterations, etc. In our case, the crossing gate is simulated 100 times: 100 samples with 1 iteration each.

I think @wlambooy's question was where the output comes from. It is automatically generated by the benchmark script. Simply pass `-DFICTION_BENCHMARK=ON` to your CMake call and run the resulting binary after compiling in Release mode.

marcelwa avatar Nov 27 '23 13:11 marcelwa

> > Also, @Drewniok, can I ask what your benchmark analysis process looks like so that I can replicate it? I see the simulation benchmark file contains a comment with mean, low mean, and high mean columns. Are these the results from 3 separate benchmark runs, or the mean/low/high of a larger set of runs?
>
> @wlambooy See https://github.com/catchorg/Catch2/blob/devel/docs/benchmarks.md for a definition of samples, iterations, etc. In our case, the crossing gate is simulated 100 times: 100 samples with 1 iteration each.

I see; I was running the benchmarks from within CLion, so I was getting some XML output that didn't look like the output in the comment. I have to say that, on my system at least, I get quite a spread in the benchmark results, and usually, the more often I run them without rebuilding, the better the times get. You can see this in the benchmark results below, which were performed directly in sequence (look at the mean values):

QuickExact

```xml
<!-- All values in nano seconds -->
<mean value="2.42783e+09" lowerBound="2.35123e+09" upperBound="2.54392e+09" ci="0.95"/>
<standardDeviation value="4.74379e+08" lowerBound="3.47914e+08" upperBound="6.37069e+08" ci="0.95"/>
<outliers variance="0.936384" lowMild="0" lowSevere="0" highMild="2" highSevere="7"/>
```

QuickSim

```xml
<!-- All values in nano seconds -->
<mean value="1.33811e+07" lowerBound="1.31979e+07" upperBound="1.35853e+07" ci="0.95"/>
<standardDeviation value="987767" lowerBound="835378" upperBound="1.19855e+06" ci="0.95"/>
<outliers variance="0.676348" lowMild="2" lowSevere="0" highMild="6" highSevere="1"/>
```

QuickExact

```xml
<!-- All values in nano seconds -->
<mean value="2.11722e+09" lowerBound="2.08929e+09" upperBound="2.15057e+09" ci="0.95"/>
<standardDeviation value="1.55694e+08" lowerBound="1.30843e+08" upperBound="1.96559e+08" ci="0.95"/>
<outliers variance="0.666639" lowMild="0" lowSevere="0" highMild="3" highSevere="0"/>
```

QuickSim

```xml
<!-- All values in nano seconds -->
<mean value="1.06471e+07" lowerBound="1.04801e+07" upperBound="1.08103e+07" ci="0.95"/>
<standardDeviation value="845190" lowerBound="739383" upperBound="995966" ci="0.95"/>
<outliers variance="0.707146" lowMild="2" lowSevere="0" highMild="1" highSevere="0"/>
```

QuickExact

```xml
<!-- All values in nano seconds -->
<mean value="2.01781e+09" lowerBound="2.00066e+09" upperBound="2.04291e+09" ci="0.95"/>
<standardDeviation value="1.04298e+08" lowerBound="7.61848e+07" upperBound="1.52602e+08" ci="0.95"/>
<outliers variance="0.494789" lowMild="0" lowSevere="0" highMild="3" highSevere="2"/>
```

QuickSim

```xml
<!-- All values in nano seconds -->
<mean value="9.83447e+06" lowerBound="9.68097e+06" upperBound="9.98761e+06" ci="0.95"/>
<standardDeviation value="783977" lowerBound="703335" upperBound="901620" ci="0.95"/>
<outliers variance="0.707254" lowMild="0" lowSevere="0" highMild="0" highSevere="0"/>
```

wlambooy avatar Nov 27 '23 13:11 wlambooy

> > > Also, @Drewniok, can I ask what your benchmark analysis process looks like so that I can replicate it? I see the simulation benchmark file contains a comment with mean, low mean, and high mean columns. Are these the results from 3 separate benchmark runs, or the mean/low/high of a larger set of runs?
> >
> > @wlambooy See https://github.com/catchorg/Catch2/blob/devel/docs/benchmarks.md for a definition of samples, iterations, etc. In our case, the crossing gate is simulated 100 times: 100 samples with 1 iteration each.
>
> I see; I was running the benchmarks from within CLion, so I was getting some XML output that didn't look like the output in the comment. I have to say that, on my system at least, I get quite a spread in the benchmark results, and usually, the more often I run them without rebuilding, the better the times get. You can see this in the benchmark results below, which were performed directly in sequence (look at the mean values):
>
> QuickExact
>
> ```xml
> <!-- All values in nano seconds -->
> <mean value="2.42783e+09" lowerBound="2.35123e+09" upperBound="2.54392e+09" ci="0.95"/>
> <standardDeviation value="4.74379e+08" lowerBound="3.47914e+08" upperBound="6.37069e+08" ci="0.95"/>
> <outliers variance="0.936384" lowMild="0" lowSevere="0" highMild="2" highSevere="7"/>
> ```
>
> QuickSim
>
> ```xml
> <!-- All values in nano seconds -->
> <mean value="1.33811e+07" lowerBound="1.31979e+07" upperBound="1.35853e+07" ci="0.95"/>
> <standardDeviation value="987767" lowerBound="835378" upperBound="1.19855e+06" ci="0.95"/>
> <outliers variance="0.676348" lowMild="2" lowSevere="0" highMild="6" highSevere="1"/>
> ```
>
> QuickExact
>
> ```xml
> <!-- All values in nano seconds -->
> <mean value="2.11722e+09" lowerBound="2.08929e+09" upperBound="2.15057e+09" ci="0.95"/>
> <standardDeviation value="1.55694e+08" lowerBound="1.30843e+08" upperBound="1.96559e+08" ci="0.95"/>
> <outliers variance="0.666639" lowMild="0" lowSevere="0" highMild="3" highSevere="0"/>
> ```
>
> QuickSim
>
> ```xml
> <!-- All values in nano seconds -->
> <mean value="1.06471e+07" lowerBound="1.04801e+07" upperBound="1.08103e+07" ci="0.95"/>
> <standardDeviation value="845190" lowerBound="739383" upperBound="995966" ci="0.95"/>
> <outliers variance="0.707146" lowMild="2" lowSevere="0" highMild="1" highSevere="0"/>
> ```
>
> QuickExact
>
> ```xml
> <!-- All values in nano seconds -->
> <mean value="2.01781e+09" lowerBound="2.00066e+09" upperBound="2.04291e+09" ci="0.95"/>
> <standardDeviation value="1.04298e+08" lowerBound="7.61848e+07" upperBound="1.52602e+08" ci="0.95"/>
> <outliers variance="0.494789" lowMild="0" lowSevere="0" highMild="3" highSevere="2"/>
> ```
>
> QuickSim
>
> ```xml
> <!-- All values in nano seconds -->
> <mean value="9.83447e+06" lowerBound="9.68097e+06" upperBound="9.98761e+06" ci="0.95"/>
> <standardDeviation value="783977" lowerBound="703335" upperBound="901620" ci="0.95"/>
> <outliers variance="0.707254" lowMild="0" lowSevere="0" highMild="0" highSevere="0"/>
> ```

It would be best to run the benchmarks from the command line with all other tasks/programs closed. In particular, your web browser and CLion run expensive background tasks that can impact the measurements.

marcelwa avatar Nov 27 '23 13:11 marcelwa

After doing a couple of benchmark runs with pre-built binaries in an environment with minimal background processor usage, I can confirm a performance difference, with the following findings:

QuickExact is slower (+5.4% runtime on average) while QuickSim is faster (-7.1% runtime on average). All benchmark runs were consistent in these differences: every QuickExact run on the main branch was faster than every QuickExact run on this branch, and every QuickSim run on the main branch was slower than every QuickSim run on this branch.

I'll try to figure out why this is.

wlambooy avatar Nov 27 '23 14:11 wlambooy

@wlambooy Thanks again for your efforts! I know, sometimes it is not so easy to find out why the code is slower. Take your time and don't hesitate to contact us. Since it would be great to have this bugfix in the main branch as soon as possible, could I ask you to split this PR into two: one for the bugfix and one for the other changes? What do you think about that? What do you think @marcelwa?

Drewniok avatar Dec 04 '23 12:12 Drewniok

I agree that it would be very helpful to have the bugfix available as quickly as possible.

marcelwa avatar Dec 04 '23 12:12 marcelwa

I was working on more pressing things this past week, so I haven't gotten to a code speed investigation yet. I put the bugfixes in a new PR: #347

wlambooy avatar Dec 05 '23 20:12 wlambooy