segfault with various DEBUG land cases

Open ndkeen opened this issue 2 years ago • 4 comments

When I run a test that was recently added to the nightly set (where it is ERS) in DEBUG mode, I see segfaults on cori-haswell and cori-knl using either Intel or GNU.

SMS_D.hcru_hcru.I20TRGSWCNPRDCTCBC.cori-haswell_intel.elm-erosion
SMS_D.hcru_hcru.I20TRGSWCNPRDCTCBC.cori-haswell_gnu.elm-erosion

SMS_D.hcru_hcru.I20TRGSWCNPRDCTCBC.cori-knl_intel.elm-erosion
SMS_D.hcru_hcru.I20TRGSWCNPRDCTCBC.cori-knl_gnu.elm-erosion
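
For readers less familiar with the test names above, here is a minimal sketch that splits one into its dot-separated fields, assuming the usual CIME layout `TESTTYPE[_MODIFIERS].GRID.COMPSET[.MACHINE_COMPILER][.TESTMODS]`. The helper `parse_test_name` and the `ParsedTest` type are hypothetical names used only for illustration; they are not part of CIME.

```python
from typing import List, NamedTuple, Optional


class ParsedTest(NamedTuple):
    """Fields of a CIME-style test name (illustrative container only)."""
    test_type: str
    modifiers: List[str]
    grid: str
    compset: str
    machine: Optional[str]
    compiler: Optional[str]
    testmods: Optional[str]


def parse_test_name(name: str) -> ParsedTest:
    """Split a test name on '.' and pull apart the leading type/modifier field."""
    parts = name.split(".")
    if len(parts) < 3:
        raise ValueError(f"expected at least TESTTYPE.GRID.COMPSET, got {name!r}")

    # First field: test type followed by optional '_'-separated modifiers.
    type_and_mods = parts[0].split("_")
    test_type, modifiers = type_and_mods[0], type_and_mods[1:]

    machine = compiler = testmods = None
    if len(parts) > 3:
        # Machine names may contain '-', so split on the last '_' only.
        machine, _, compiler = parts[3].rpartition("_")
        machine = machine or None
    if len(parts) > 4:
        testmods = parts[4]

    return ParsedTest(test_type, modifiers, parts[1], parts[2],
                      machine, compiler, testmods)


t = parse_test_name(
    "SMS_D.hcru_hcru.I20TRGSWCNPRDCTCBC.cori-haswell_intel.elm-erosion"
)
# The 'D' modifier is what requests a DEBUG build.
print(t.test_type, t.modifiers, t.machine, t.compiler, t.testmods)
```

The `_D` modifier is the part that matters for this issue: all four failing cases are the same grid and compset, differing only in machine and compiler.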

I only get a stack trace with one of those: cori-haswell with Intel. The GNU cases do not give a stack trace, so I am assuming they fail on the same line. Same for cori-knl/Intel: it fails without a stack trace.

165: forrtl: error (65): floating invalid
165: Image              PC                Routine            Line        Source
165: e3sm.exe           00000000070C9944  Unknown               Unknown  Unknown
165: e3sm.exe           00000000069F4620  Unknown               Unknown  Unknown
165: e3sm.exe           0000000000ECF35A  subgridavemod_mp_        1075  subgridAveMod.F90
165: e3sm.exe           0000000002B0EC27  cnpbudgetmod_mp_c        1148  CNPBudgetMod.F90
165: e3sm.exe           0000000002B0E20C  cnpbudgetmod_mp_c        1085  CNPBudgetMod.F90
165: e3sm.exe           000000000096E160  elm_driver_mp_elm         578  elm_driver.F90
165: e3sm.exe           00000000009097CB  lnd_comp_mct_mp_l         508  lnd_comp_mct.F90
165: e3sm.exe           000000000047C632  component_mod_mp_         751  component_mod.F90
165: e3sm.exe           0000000000439C2C  cime_comp_mod_mp_        2891  cime_comp_mod.F90
165: e3sm.exe           0000000000463FAC  MAIN__                    153  cime_driver.F90
165: e3sm.exe           0000000000402102  Unknown               Unknown  Unknown
165: e3sm.exe           000000000723AA8F  Unknown               Unknown  Unknown
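
The first frame with source information points to subgridAveMod.F90, line 1075, which is presumably this line:

if (carr(c) /= spval .and. scale_c2l(c) /= spval .and. scale_l2g(l) /= spval) then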
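
To make the reading of the traceback explicit, here is a small sketch that scans an Intel-style traceback and returns the innermost frame that carries a line number and source file. The regular expression is an assumption based only on the column layout shown above (including the leading `165:` prefix, presumably a task label added by the launcher); `first_known_frame` is a hypothetical helper, not an existing tool.

```python
import re
from typing import Optional, Tuple

# Columns as they appear in the log: Image, PC, Routine, Line, Source.
# An optional "NNN: " prefix is allowed in front of each frame.
FRAME_RE = re.compile(
    r"^(?:\d+:\s+)?(?P<image>\S+)\s+(?P<pc>[0-9A-Fa-f]{8,16})\s+"
    r"(?P<routine>\S+)\s+(?P<line>\d+|Unknown)\s+(?P<source>\S+)\s*$"
)


def first_known_frame(trace: str) -> Optional[Tuple[str, int, str]]:
    """Return (routine, line, source) for the first frame with a known line."""
    for raw in trace.splitlines():
        match = FRAME_RE.match(raw.strip())
        if match and match.group("line") != "Unknown":
            return (match.group("routine"),
                    int(match.group("line")),
                    match.group("source"))
    return None


SAMPLE = """\
165: forrtl: error (65): floating invalid
165: Image              PC                Routine            Line        Source
165: e3sm.exe           00000000070C9944  Unknown               Unknown  Unknown
165: e3sm.exe           0000000000ECF35A  subgridavemod_mp_        1075  subgridAveMod.F90
165: e3sm.exe           0000000002B0EC27  cnpbudgetmod_mp_c        1148  CNPBudgetMod.F90
"""

print(first_known_frame(SAMPLE))
# -> ('subgridavemod_mp_', 1075, 'subgridAveMod.F90')
```

Frames reported as `Unknown` are skipped, so the result is the deepest frame that maps back to source, which is where to start looking.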

ndkeen, Aug 16 '22 15:08

I knew this issue looked familiar. It is the same as https://github.com/E3SM-Project/E3SM/issues/3786, so I will close this one.

ndkeen, Aug 16 '22 15:08

Reopening this issue; I will close the others, which are older and perhaps more confusing.

ndkeen, Aug 16 '22 18:08

Other tests known to hit the same error:

SMS_D_PMx1.f19_g16.I1850CNECACNTBC.cori-knl_intel
SMS_D.f19_g16.I1850CNECACNTBC.cori-knl_intel

ndkeen, Aug 16 '22 18:08

This should be the same as issue #4820. Pretty much none of the land tests can be run in DEBUG mode.

peterdschwartz, Sep 12 '22 14:09