
NaN Values in Gradients Cause Calculation to Abort When Using Mixed Precision with GPU Offload

Open · romanfanta4 opened this issue · 2 comments

Describe the bug

When running a periodic calculation with 9 twists, 724 electrons, and 44 atoms using the mixed-precision version of QMCPACK with GPU offload, the calculation aborted with the following error:

NaNguard::checkOneParticleGradients error message: TWF::calcRatioGrad at particle 687
  grads[0] = (-nan,0.0418255)
  grads[1] = (-nan,-0.0806002)
  grads[2] = (-nan,0.0412396)
Unexpected exception thrown in threaded section
Fatal Error. Aborting at Unhandled Exception
The error indicates NaN values in the wave function gradients for a specific particle (687): the real part of every gradient component is NaN, while the imaginary parts are finite.

The same calculation in full precision ran without any problems.

To Reproduce

Input and output files are attached: dmc_2x2_single_prec-test.zip

Expected behavior

The calculation should complete successfully without producing NaN values in the wave function gradients, and should yield accurate, stable output data.

System:
System name: Perlmutter
Modules loaded:
  module use /global/common/software/nersc/n9/llvm/modules
  module load craype cray-mpich
  module load llvm/17.0.6-gpu
Other systems where this is reproducible: Not tested on other systems.

Additional context

The calculation was performed using the complex build of QMCPACK on NVIDIA GPUs with OpenMP offload. The output files contain no other error messages or relevant context.

romanfanta4 · May 31 '24 00:05

Thanks for the report, Roman. ~Is this the first run you have tried, or are other runs either working or failing for you?~ Any issues with other runs? I see the full-precision run of this system was fine.

prckent · May 31 '24 13:05

I first tried a larger system and got the same error as with this smaller one. I did not investigate any further. As you noted, I did not run into any issues in full precision.

romanfanta4 · Jun 02 '24 02:06