qmcpack icon indicating copy to clipboard operation
qmcpack copied to clipboard

Energy density estimator tests near 100% fail rate

Open prckent opened this issue 8 months ago • 14 comments

Describe the bug

Looking at the month long statistics for tests named energydensity, the fail rate is near 100% https://cdash.qmcpack.org/testOverview.php?project=QMCPACK&begin=2025-04-01&end=2025-04-27&filtercount=1&showfilters=1&field1=testname&compare1=63&value1=energydensity

These tests are mostly for legacy where the code has not been intentionally changed in months. Possible explanations include unintentional breakage of the tests, of the underlying code, or perhaps the tests &/or code were simply not robust after the last time they were "fixed". We could also have h5py issues similar to https://github.com/QMCPACK/qmcpack/pull/3367 Most of the reported numerics are close to passing, so larger error bars could also be part of the solution.

To Reproduce

ctest -R energydensity , any recent version

Expected behavior

Test should pass; we should have confidence in energy density.

System:

Any

Additional context

Did not yet try to bisect failure or check historical PR discussions.

Note that sometimes the tests do pass e.g. https://cdash.qmcpack.org/testSummary.php?project=1&name=short-diamondC_1x1x1_pp-vmc-estimator-energydensity-cell-4-4-check&date=2025-04-25 has 3 of 18 runs passing in VMC.

The earliest cdash results we have are in the last week of February and the tests were failing then https://cdash.qmcpack.org/testOverview.php?project=QMCPACK&begin=2025-02-23&end=2025-02-28&filtercount=1&showfilters=1&field1=testname&compare1=63&value1=energydensity

prckent avatar Apr 27 '25 19:04 prckent

Possibly we didn't follow-up and fix the underlying problems after fixing check_stats.py (bisecting from v3.17.1 to today):

The following tests FAILED:
	431 - short-LiH_ae-vmc_hf_noj_estimator_energydensity_voronoi-4-4-check (Failed)
	547 - short-LiH_pp-vmc_hf_sdj_estimator_energydensity_voronoi-4-4-check (Failed)
	1240 - short-diamondC_1x1x1_pp-vmc-estimator-energydensity-cell-4-4-check (Failed)
	1242 - short-diamondC_1x1x1_pp-vmc-estimator-energydensity-voronoi-4-4-check (Failed)
	1244 - short-diamondC_1x1x1_pp-dmc-estimator-energydensity-cell-4-4-check (Failed)
Errors while running CTest
313cf7ece465007b72d42164a7f3c7ad7e10e767 is the first bad commit
commit 313cf7ece465007b72d42164a7f3c7ad7e10e767
Author: Ye Luo <[email protected]>
Date:   Wed May 15 13:17:27 2024 -0500

    check_stats.py properly exits when h5py read fails.

 tests/scripts/check_stats.py | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
bisect found first bad commit

prckent avatar Apr 27 '25 21:04 prckent

This estimator definitely worked at the end of 2022. I started looking into what was going on once Gani flagged the issue (#5224), but didn't dig very far. The tests have a deterministic component that should always work (checksums vs the total energy).

What I remember is that zeros were being written into the stat.h5 file. This might suggest zeros are being passed through TraceManager.

A good first check would be to enable the in-situ energy sum checks for the energy density (see TRACE_CHECK environment variable). If zero values are being passed through TraceManager, then this will catch it. If this check fails, then the per-particle quantities are not being produced by the individual energy components (coulomb potential, etc), indicating the request from the user via input is not being propagated down through TraceManager to the potentials, etc.

If the above check passes, then a further in-situ test can be enabled in the energy density estimator itself (see environment variable ENERGYDENSITY_CHECK). If this check fails, the problem resides in the estimator itself. If it passes, then the issue resides between the estimator and the stat.h5 output.

jtkrogel avatar Apr 28 '25 13:04 jtkrogel

I will note a number of changes to how HDF was handled in the code were made during this time period (see https://github.com/QMCPACK/qmcpack/commits/develop/src/QMCHamiltonians/EnergyDensityEstimator.cpp), including some that affected TraceManager (see https://github.com/QMCPACK/qmcpack/pull/4995).

jtkrogel avatar Apr 28 '25 14:04 jtkrogel

Very few changes were made to TraceManager itself (see https://github.com/QMCPACK/qmcpack/commits/develop/src/Estimators/TraceManager.h and https://github.com/QMCPACK/qmcpack/commits/develop/src/Estimators/TraceManager.cpp )

jtkrogel avatar Apr 28 '25 14:04 jtkrogel

I believe this is the last confirmation noted that the energy density estimator was working on CDash: https://github.com/QMCPACK/qmcpack/pull/3294

It surely worked long after this, but passing tests were observed at that time. For some reason, the cdash links no longer work: https://cdash.qmcpack.org/CDash/testDetails.php?test=14660932&build=229279 https://cdash.qmcpack.org/CDash/testDetails.php?test=14379785&build=229242

The log output of the python test script would confirm this. Perhaps we could find the log output from July 2021, and then bisect the cdash log output (visually) from there?

jtkrogel avatar Apr 28 '25 14:04 jtkrogel

At least one of the energy density tests (diamond cell grid for DMC) appears to be very close to passing currently:

https://cdash.qmcpack.org/tests/22015901?graph=status

All the statistical tests pass, as well as a deterministic block sum check. Only the deterministic comparison with the scalar.dat file fails:

  Energy density sums vs. scalar file per block tests:
    E 0 -10.533540356798579!=-10.53344467223
    some per block sums do not match the scalar file
    status of this test: fail

This disagreement could result from a difference in how weights are handled between scalar.dat and stat.h5, if it exists. I would in general expect quantities from scalar.dat and stat.h5 to deterministically agree with each other to high precision.

jtkrogel avatar Apr 28 '25 14:04 jtkrogel

The voronoi tests on LiH (ae and pp) have most tests passing also, but show a different pattern. All the deterministic checks pass, as well as the statistical tests on the kinetic energy and weight. The statistical test on the total potential energy passes, but the partial sum checks (the potential energy on each atom) fail with a statistical significance between 8 and 199 sigma:

https://cdash.qmcpack.org/tests/22015433 https://cdash.qmcpack.org/tests/22015552

E.g.:

  Testing quantity: V full sum
    reference mean value     :  -1.45471078
    reference error bar      :   0.00224560
    computed  mean value     :  -1.45494147
    computed  error bar      :   0.00255507
    pass tolerance           :   0.00673680  (  3.00000000 sigma)
    deviation from reference :  -0.00023069  ( -0.10273063 sigma)
    error bar of deviation   :   0.00340163
    significance probability :   0.08182334  (gaussian statistics)
    status of this test      :   pass

  Testing quantity: V partial sum 0
    reference mean value     :  -0.06180385
    reference error bar      :   0.00082984
    computed  mean value     :  -0.22734097
    computed  error bar      :   0.00070774
    pass tolerance           :   0.00265958  (  3.20494568 sigma)
    deviation from reference :  -0.16553712  (-199.48169192 sigma)
    error bar of deviation   :   0.00109065
    significance probability :   1.00000000  (gaussian statistics)
    status of this test      :   fail

  Testing quantity: V partial sum 1
    reference mean value     :  -1.39290693
    reference error bar      :   0.00222893
    computed  mean value     :  -1.22760050
    computed  error bar      :   0.00252165
    pass tolerance           :   0.00714360  (  3.20494568 sigma)
    deviation from reference :   0.16530643  ( 74.16407161 sigma)
    error bar of deviation   :   0.00336554
    significance probability :   1.00000000  (gaussian statistics)
    status of this test      :   fail

jtkrogel avatar Apr 28 '25 14:04 jtkrogel

The VMC diamond cell grid test shows intermittent passing, unlike the others:

https://cdash.qmcpack.org/tests/22015955?graph=status

One reason this statistical test is less stable than others is that it involves 24 statistical checks instead of one. This makes it much more likely to fail statistically. A larger nsigma criterion would be an appropriate fix for this one (and should be applied to the others above once a fix is identified).

jtkrogel avatar Apr 28 '25 14:04 jtkrogel

The above tests are for GCC14 on nitrogen. Other environments also given intermittent passing results for the diamond VMC case:

sulfur GCC14 and Clang20 nitrogen2 GCC14 and Clang20 bora Intel21

jtkrogel avatar Apr 28 '25 14:04 jtkrogel

In short, the following issues need investigating (in probable priority order):

  • [ ] Why do scalar.dat and stat.h5 give deterministically different quantities for DMC? (likely an issue broader than energy density)
  • [ ] Why does voronoi energy density give incorrect values for the potential energy partitioning among atoms? (suspicious since energy density code has not changed for a long time, indicating code changes elsewhere are responsible, and hence possibly relating to a broader issue)
  • [ ] Does the energy density function in broader/realistic cases? I.e. why did Gani see zeros, and is this behavior still happening?

jtkrogel avatar Apr 28 '25 15:04 jtkrogel

OK, lets discuss when @PDoakORNL is back since he may have seen some additional problems while working on the batched version.

Note that unfortunately we don't have cdash data older than ~3 months. The database was getting too large.

prckent avatar Apr 28 '25 16:04 prckent

I'm back now and will look into this again. I had fixed these tests in legacy but I'm not sure if it was PR'd in incompletely or if this is a regression. Hopefully less burnt out now post vacation I will be able to get these straightened out. I can certainly attest that in the last couple months I have had the legacy energy density estimator passing its tests #5258 was my attempt to get this fix into develop. My batched code in merged into several versions of develop was passing those same tests. However pre #5258 legacy had been broken simply because the hdf5 file organization had been inadvertently changed for a long time.

The current develop batched EDE is not "deterministically" correct at all in develop I don't know how many sub PR's that's going to take since I'm trying to get batched structure factor entirely in first. So for now I would entirely ignore what's going on with batched.

PDoakORNL avatar May 05 '25 15:05 PDoakORNL

Thanks Peter. I suggest that after structure factor we make a list of problems with energy density and its tests, what we know about when they were last working etc. and then meet to work out how to go about fixing them. e.g. Can we setup some extra deterministic integration or (better) unit tests that pass in a known good version of legacy?

prckent avatar May 05 '25 17:05 prckent

It is worth adding a deterministic test that should pass to high precision (regardless of run length): in post-processing, sum the energy density components and verify that they match the total energy reported in the dat files.

The current integration tests do this, but they also bundle in statistical tests which obscures the cause when just looking at the red/green indicator level.

jtkrogel avatar May 16 '25 13:05 jtkrogel