Segmentation fault - invalid memory with FMS-2023.02-01: MOM_io_infra.F90

Open jkbk2004 opened this issue 1 year ago • 15 comments

Description

  1. The cpld_control_p8_gnu regression tests fail on Hercules with the gnu compiler and FMS-2023.02-01. The failure was reported with the spack-stack 1.5.1 update. The develop branch runs OK with FMS-2023.01. It fails only with FMS-2023.02-01 (installed in both spack-stack 1.5.1 and 1.6.0); even FMS-2023.04 runs OK with spack-stack 1.6.0.
  2. Error messages:
     151: #4 0x3521462 in __mom_io_infra_MOD_read_field_2d
     151: at /work/noaa/epic/jongkim/rt-2013/MOM6-interface/MOM6/config_src/infra/FMS2/MOM_io_infra.F90:905
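
For anyone triaging a longer err.log, the backtrace frame quoted above can be split into its parts with a short script. This is only a sketch: the helper name `parse_frame` is hypothetical, the leading `151:` is assumed to be a per-rank tag added by the job launcher, and the pattern relies on the `__<module>_MOD_<procedure>` form in which gfortran names module procedures.

```python
import re

# Frame text copied from the error message in this issue. The "151:" prefixes
# are assumed (not confirmed in the thread) to be the MPI rank tag.
FRAME = (
    "151: #4 0x3521462 in __mom_io_infra_MOD_read_field_2d "
    "151: at /work/noaa/epic/jongkim/rt-2013/MOM6-interface/MOM6/"
    "config_src/infra/FMS2/MOM_io_infra.F90:905"
)

# gfortran emits module procedures as __<module>_MOD_<procedure>, lower-cased.
FRAME_RE = re.compile(
    r"#(?P<depth>\d+)\s+0x[0-9a-fA-F]+\s+in\s+"
    r"__(?P<module>\w+?)_MOD_(?P<procedure>\w+)"
    r".*?\bat\s+(?P<path>\S+):(?P<line>\d+)",
    re.DOTALL,  # tolerate the frame being wrapped over two log lines
)


def parse_frame(text):
    """Return the parts of the first matching frame, or None if none is found."""
    match = FRAME_RE.search(text)
    if match is None:
        return None
    return {
        "depth": int(match.group("depth")),
        "module": match.group("module"),
        "procedure": match.group("procedure"),
        "path": match.group("path"),
        "line": int(match.group("line")),
    }


if __name__ == "__main__":
    print(parse_frame(FRAME))
```

Applied to the frame above, this points at procedure `read_field_2d` in module `mom_io_infra`, at line 905 of MOM_io_infra.F90.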

To Reproduce:

  1. run https://github.com/ufs-community/ufs-weather-model/pull/2013 on Hercules/Gnu
  2. for the case cpld_control_p8_gnu

Additional context

Output

jkbk2004 avatar Jan 03 '24 18:01 jkbk2004

err.log

jkbk2004 avatar Jan 03 '24 18:01 jkbk2004

@laurenchilutti @bensonr @junwang-noaa @jiandewang @DeniseWorthen I am wondering whether any update is needed on the MOM6 side, for the lines that use fms_io, to work with FMS-2023.02-01?

jkbk2004 avatar Jan 03 '24 18:01 jkbk2004

@ulmononian @RatkoVasic-NOAA @natalie-perlin @zach1221 @FernandoAndrade-NOAA FYI

jkbk2004 avatar Jan 03 '24 18:01 jkbk2004

@jkbk2004 My understanding is that the gnu compiler issue is known in fms 2023.02.01/2023.03. It is fixed in fms 2023.04. Can we comment out the failing gnu tests and turn them back on when we update the model to use spack-stack 1.6.0? Thanks

junwang-noaa avatar Jan 03 '24 18:01 junwang-noaa
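
The version window described in the comment above (affected in fms 2023.02.01 and 2023.03, fixed in 2023.04, with 2023.01 reported as running OK in the issue description) can be written down as a small check. This is a sketch only: the helper names are hypothetical, and treating every release from 2023.02 up to but not including 2023.04 as affected is an assumption drawn from this thread, not something confirmed by the FMS release notes.

```python
import re


def parse_fms_version(tag):
    """Turn a tag such as '2023.02-01' or '2023.02.01' into a tuple of ints."""
    return tuple(int(part) for part in re.split(r"[.\-]", tag.strip()))


# Assumed window, taken from the comments in this issue.
FIRST_AFFECTED = (2023, 2)
FIRST_FIXED = (2023, 4)


def is_affected(tag):
    """True if the year.release part of the tag falls inside the assumed window."""
    year_release = parse_fms_version(tag)[:2]
    return FIRST_AFFECTED <= year_release < FIRST_FIXED


if __name__ == "__main__":
    for tag in ("2023.01", "2023.02-01", "2023.03", "2023.04"):
        print(tag, "affected" if is_affected(tag) else "ok")
```

A check like this could decide whether the gnu coupled tests are skipped for a given stack, but it only encodes what was reported here for Hercules/gnu.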

@junwang-noaa That makes sense, since fms-2023.04 runs OK on hercules/gnu. We can turn off the failed gnu cases and move on.

jkbk2004 avatar Jan 03 '24 19:01 jkbk2004

See https://github.com/jiandewang/MOM6/blob/dev/emc/config_src/infra/FMS2/MOM_io_infra.F90#L902-L906. I think this version of the fms code had a bug when reading fixed files that don't have a time level and dimension.

jiandewang avatar Jan 03 '24 19:01 jiandewang

https://github.com/NOAA-GFDL/FMS/issues/1254

jkbk2004 avatar Jan 03 '24 19:01 jkbk2004

@climbfuji I want to confirm with you that the failed gnu tests listed in this issue were able to run when a new gnu compiler (v12) was used, so it is not an fms 2023.02.01 issue. Is that right?

junwang-noaa avatar Mar 20 '24 17:03 junwang-noaa

My recollection is poor, but from the comments above your assumption sounds about right.

climbfuji avatar Mar 20 '24 19:03 climbfuji

@junwang-noaa @BrianCurtis-NOAA @DeniseWorthen @zach1221 @FernandoAndrade-NOAA I take back what I said. There is still a seg fault issue with the Hercules/gnu cpld cases. It is related to the MOM6 fms_io call. I agree the issue will be resolved with the new fms version.

jkbk2004 avatar Mar 21 '24 12:03 jkbk2004

@RatkoVasic-NOAA Is spack-stack 1.6.0 ready on hercules and hera? If yes, @jkbk2004 would you please confirm this is resolved? Thanks

junwang-noaa avatar Apr 12 '24 17:04 junwang-noaa

@junwang-noaa
Hera: /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/unified-env-rocky8/install/modulefiles/Core
Hercules: /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/unified-env/install/modulefiles/Core

RatkoVasic-NOAA avatar Apr 12 '24 18:04 RatkoVasic-NOAA

I think we will be able to close this issue when we update to spack-stack 1.6. Tests ran OK on hera and other RDHPCS machines with the fms 2023.04 update in #2093.

jkbk2004 avatar Apr 12 '24 18:04 jkbk2004

@jkbk2004 have you turned those tests back on and run them successfully? Just want to confirm.

junwang-noaa avatar Apr 12 '24 19:04 junwang-noaa

Yes, I checked again. It ran OK on hercules at /work2/noaa/stmp/jongkim/stmp/jongkim/FV3_RT/rt_1331586/cpld_control_p8_gnu. On hera, we still have the OSC pt2pt issue with gnu that we found during the Rocky8 migration.

jkbk2004 avatar Apr 15 '24 13:04 jkbk2004

cpld_control_p8 was turned on in PR#2093. The issue is closed.

junwang-noaa avatar Jun 27 '24 14:06 junwang-noaa