segfault with various DEBUG land cases
When I run a test that was recently added to the nightly set (where it is an ERS test) in DEBUG mode, I see segfaults on cori-haswell and cori-knl with both the Intel and GNU compilers.
SMS_D.hcru_hcru.I20TRGSWCNPRDCTCBC.cori-haswell_intel.elm-erosion
SMS_D.hcru_hcru.I20TRGSWCNPRDCTCBC.cori-haswell_gnu.elm-erosion
SMS_D.hcru_hcru.I20TRGSWCNPRDCTCBC.cori-knl_intel.elm-erosion
SMS_D.hcru_hcru.I20TRGSWCNPRDCTCBC.cori-knl_gnu.elm-erosion
I only get a stack trace from one of those: cori-haswell with Intel. The GNU cases do not produce a stack trace, so I am assuming they fail on the same line. The same goes for cori-knl with Intel, which fails without a stack trace.
165: forrtl: error (65): floating invalid
165: Image PC Routine Line Source
165: e3sm.exe 00000000070C9944 Unknown Unknown Unknown
165: e3sm.exe 00000000069F4620 Unknown Unknown Unknown
165: e3sm.exe 0000000000ECF35A subgridavemod_mp_ 1075 subgridAveMod.F90
165: e3sm.exe 0000000002B0EC27 cnpbudgetmod_mp_c 1148 CNPBudgetMod.F90
165: e3sm.exe 0000000002B0E20C cnpbudgetmod_mp_c 1085 CNPBudgetMod.F90
165: e3sm.exe 000000000096E160 elm_driver_mp_elm 578 elm_driver.F90
165: e3sm.exe 00000000009097CB lnd_comp_mct_mp_l 508 lnd_comp_mct.F90
165: e3sm.exe 000000000047C632 component_mod_mp_ 751 component_mod.F90
165: e3sm.exe 0000000000439C2C cime_comp_mod_mp_ 2891 cime_comp_mod.F90
165: e3sm.exe 0000000000463FAC MAIN__ 153 cime_driver.F90
165: e3sm.exe 0000000000402102 Unknown Unknown Unknown
165: e3sm.exe 000000000723AA8F Unknown Unknown Unknown
if (carr(c) /= spval .and. scale_c2l(c) /= spval .and. scale_l2g(l) /= spval) then
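As a rough illustration of why a guard like the one quoted above can still be reached with bad data, here is a minimal Python sketch. It assumes, and nothing in this thread confirms, that one of the operands holds NaN (for example from uninitialized memory in a DEBUG build). The value of `SPVAL` and the helper names `passes_guard` and `has_nan` are hypothetical. Python does not trap on NaN, so this only shows the value-level behavior, not the floating invalid trap itself.

```python
import math

SPVAL = 1.0e36  # assumed sentinel value; not stated in this thread


def passes_guard(carr, scale_c2l, scale_l2g, spval=SPVAL):
    """Mirror the quoted condition: true when no operand equals spval."""
    return carr != spval and scale_c2l != spval and scale_l2g != spval


def has_nan(*values):
    """Report whether any operand is NaN (hypothetical helper)."""
    return any(math.isnan(v) for v in values)


normal = passes_guard(2.5, 1.0, 1.0)        # ordinary values pass the guard
masked = passes_guard(SPVAL, 1.0, 1.0)      # the sentinel is rejected
nan_in = passes_guard(math.nan, 1.0, 1.0)   # NaN != spval is true, so NaN passes
```

Under that assumption, the `/= spval` checks screen out the sentinel but not NaN, so any NaN would have to be caught or initialized away before this line is evaluated.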
I knew this issue looked familiar. It is the same as https://github.com/E3SM-Project/E3SM/issues/3786, so I will close this one.
I am reopening this issue and will close the others, which are older and perhaps more confusing.
Other tests known to hit the same error:
SMS_D_PMx1.f19_g16.I1850CNECACNTBC.cori-knl_intel
SMS_D.f19_g16.I1850CNECACNTBC.cori-knl_intel
This should be the same as Issue #4820. Pretty much none of the land tests can be run in DEBUG mode.