E3SM
SHR_REPROSUM_CALC errors on Crusher
A case with F2010 and ne30pg2_r05_oECv3 on Crusher failed with the following error message:
SHR_REPROSUM_CALC: Input contains 0.30000E+01 NaNs and 0.00000E+00 INFs on process 3
This error occurs on E3SM builds with each of the Cray, AMD, and GNU compilers available on Crusher.
Interestingly, the same F2010/ne30pg2_r05_oECv3 case runs successfully on Summit, which shares its input data with Crusher.
The error message appears to come from E3SM/share/util/shr_reprosum_mod.F90, but it was hard to locate the source line from which this routine is called.
Has anyone seen this error?
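For context on what the message means: a reproducible sum is one whose result does not depend on the order in which values (or MPI ranks) are combined, and the error appears to be a guard that counts NaNs and INFs in a process's local input before summing. The following is a hypothetical Python illustration, not the code in shr_reprosum_mod.F90. The function name `reprosum_check` is made up, the message format is only loosely modeled on the Fortran one, and `math.fsum` (a correctly rounded, hence order-independent, sum) stands in for whatever algorithm the Fortran module actually uses.

```python
import math
from typing import Sequence


def reprosum_check(values: Sequence[float], rank: int) -> float:
    """Hypothetical stand-in for the NaN/INF guard of a reproducible sum.

    Counts NaNs and INFs in this process's local input and raises if any
    are present. Otherwise returns math.fsum(values), which is correctly
    rounded and therefore independent of summation order. This is an
    illustration only, not the algorithm in shr_reprosum_mod.F90.
    """
    nan_count = sum(1 for v in values if math.isnan(v))
    # math.isinf is False for NaN, so the two counts never overlap.
    inf_count = sum(1 for v in values if math.isinf(v))
    if nan_count or inf_count:
        raise FloatingPointError(
            f"Input contains {nan_count} NaNs and {inf_count} INFs "
            f"on process {rank}"
        )
    return math.fsum(values)


if __name__ == "__main__":
    # Both orderings give 1.0; a naive left-to-right accumulation loop
    # would lose the 1.0 in the first ordering.
    print(reprosum_check([1e16, 1.0, -1e16], rank=0))
    print(reprosum_check([1e16, -1e16, 1.0], rank=0))
    try:
        reprosum_check([1.0, float("nan"), float("nan"), float("nan")], rank=3)
    except FloatingPointError as err:
        print(err)
```

Passing three NaNs on rank 3 reproduces the same counts as the report above, which suggests the NaNs are already present in the field being summed rather than being created by the sum itself.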
The branch that I am working on is ykim/crusher/craydebug, a debug branch branched off from a recent master branch.
Just noting that we should try this case with GNU on an x86 machine to see whether we encounter this issue.
This error no longer shows up with PrgEnv-cray/8.3.3, PrgEnv-amd/8.3.3, and PrgEnv-gnu/8.3.3. It is not clear whether these modules fixed the error. It is possible that an E3SM compiler configuration causes it. Because the issue no longer occurs, I will close this and re-open it if needed.
Just to document:
- for runs with GNU, the reprosum NaNs went away after adding -fno-inline-arg-packing to eam/src/dynamics/se/inidat.F90 in Depends.[gnu,gnugpu].cmake in PR E3SM-Project/E3SM#5132
- for runs with Cray, the issue was fixed by adding -hipa0 -hzero to FFLAGS in crayclang_crusher.cmake in E3SM-Project/E3SM#5208