ufs-weather-model
Segmentation fault - invalid memory with FMS-2023.02-01: MOM_io_infra.F90
Description
- The cpld_control_p8_gnu regression tests fail on Hercules with the gnu compiler when built against FMS-2023.02-01. The failure was first reported with the spack-stack 1.5.1 update. The develop branch runs ok with FMS-2023.01, and FMS-2023.04 (spack-stack 1.6.0) also runs ok. Only FMS-2023.02-01, which is installed in both spack-stack 1.5.1 and 1.6.0, fails.
- error messages:
  151: #4 0x3521462 in __mom_io_infra_MOD_read_field_2d
  151: at /work/noaa/epic/jongkim/rt-2013/MOM6-interface/MOM6/config_src/infra/FMS2/MOM_io_infra.F90:905
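For triage, a frame like the one above can be pulled out of a long rank-prefixed log with a few lines of Python. This is only a sketch: the regular expression assumes the gfortran-style `#N 0xADDR in SYMBOL ... at PATH:LINE` layout seen in this one message, and the names `FRAME_RE` and `parse_frames` are hypothetical helpers, not part of ufs-weather-model or MOM6.

```python
import re

# Assumed layout (taken from the single frame quoted in this issue):
#   "<rank>: #<depth> 0x<addr> in <symbol> <rank>: at <path>:<line>"
# The second rank prefix is optional so wrapped and flattened logs both match.
FRAME_RE = re.compile(
    r"#(?P<depth>\d+)\s+0x[0-9a-fA-F]+\s+in\s+(?P<symbol>\S+)"
    r"(?:\s+\d+:)?\s+at\s+(?P<path>\S+):(?P<line>\d+)"
)


def parse_frames(log_text):
    """Return (depth, symbol, source path, line number) for each frame found."""
    return [
        (int(m["depth"]), m["symbol"], m["path"], int(m["line"]))
        for m in FRAME_RE.finditer(log_text)
    ]


# The frame reported in this issue, as it appears in the log.
sample = (
    "151: #4 0x3521462 in __mom_io_infra_MOD_read_field_2d "
    "151: at /work/noaa/epic/jongkim/rt-2013/MOM6-interface/MOM6/"
    "config_src/infra/FMS2/MOM_io_infra.F90:905"
)
frames = parse_frames(sample)
for depth, symbol, path, line in frames:
    print(f"frame {depth}: {symbol} ({path}, line {line})")
```

Grouping the parsed frames by symbol across ranks is one way to check whether every failing rank stops in the same read_field_2d call.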
To Reproduce:
- run https://github.com/ufs-community/ufs-weather-model/pull/2013 on Hercules/Gnu
- for the case cpld_control_p8_gnu
Additional context
Output
@laurenchilutti @bensonr @junwang-noaa @jiandewang @DeniseWorthen I am wondering whether the MOM6 side needs any update for FMS-2023.02-01 in the lines that use fms_io?
@ulmononian @RatkoVasic-NOAA @natalie-perlin @zach1221 @FernandoAndrade-NOAA FYI
@jkbk2004 My understanding is that this gnu compiler issue is known in fms 2023.02.01/2023.03 and is fixed in fms 2023.04. Can we comment out the failing gnu tests and turn them back on when we update the model to use spack-stack 1.6.0? Thanks
@junwang-noaa That makes sense, since fms-2023.04 runs ok on hercules/gnu. We can turn off the failed gnu cases and move on.
See https://github.com/jiandewang/MOM6/blob/dev/emc/config_src/infra/FMS2/MOM_io_infra.F90#L902-L906. I think this version of the fms code had a bug when reading fixed files that don't have a time level and dimension.
https://github.com/NOAA-GFDL/FMS/issues/1254
@climbfuji I want to confirm with you that the failed gnu tests listed in this issue were able to run when a new gnu compiler (v12) is used, so it is not an fms 2023.02.01 issue. Is that right?
My recollection is poor, but from the comments above your assumption sounds about right.
@junwang-noaa @BrianCurtis-NOAA @DeniseWorthen @zach1221 @FernandoAndrade-NOAA I take back what I said. There is still a seg fault issue with the Hercules/gnu cpld cases. It's related to the mom6 fms_io call. I agree the issue will be resolved with the new fms version.
@RatkoVasic-NOAA Is spack-stack 1.6.0 ready on hercules and hera? If yes, @jkbk2004 would you please confirm this is resolved? Thanks
@junwang-noaa
Hera: /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/unified-env-rocky8/install/modulefiles/Core
Hercules: /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/unified-env/install/modulefiles/Core
I think we will be able to close this issue when we update to spack-stack 1.6. Tests run ok on hera and the other RDHPCS platforms with the fms 2023.04 update in #2093.
@jkbk2004 have you turned those tests back on and run them successfully? Just want to confirm.
Yes, I checked again. It ran ok on hercules at /work2/noaa/stmp/jongkim/stmp/jongkim/FV3_RT/rt_1331586/cpld_control_p8_gnu. On hera, we still have the OSC pt2pt issue with gnu that we found during the Rocky8 migration.
cpld_control_p8 was turned on in PR#2093. The issue is closed.