ufs-weather-model icon indicating copy to clipboard operation
ufs-weather-model copied to clipboard

Convert modulefiles to .lua format on WCOSS2, acorn, hera, jet, orion, cheyenne

Open BrianCurtis-NOAA opened this issue 2 years ago • 14 comments

PR Checklist

  • [X] This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.

  • [X] This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR

  • [X] An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes in contained in the PR. The Issue and the dependent sub-component PR are specified below.

  • [ ] Results for one or more of the regression tests change and the reasons for the changes are understood and explained below.

  • [ ] New or updated input data is required by this PR. If checked, please work with the code managers to update input data sets on all platforms.

Description

NCO prefers lua modules an UFS got a pass for WCOSS2. This will make WCOSS2, acorn, hera, jet, orion and Cheyenne follow .lua modules.

Issue(s) addressed

  • fixes #1326

Testing

  • [ ] hera.intel
  • [ ] hera.gnu
  • [ ] orion.intel
  • [ ] cheyenne.intel
  • [ ] cheyenne.gnu
  • [ ] gaea.intel
  • [ ] jet.intel
  • [ ] wcoss2.intel
  • [ ] acorn.intel
  • [ ] opnReqTest for newly added/changed feature
  • [ ] CI

Dependencies

None

BrianCurtis-NOAA avatar Aug 19 '22 18:08 BrianCurtis-NOAA

How is this PR supposed to work on Gaea? I see you deleted the ufs_common and ufs_common_debug Tcl/Tk environment modules but didn't make any updates to the Gaea module files.

Note that spack-stack will only support native Tcl/Tk env modules on Gaea in order to remove the nasty hack to use lua modules there (which has caused headaches for many people).

climbfuji avatar Aug 26 '22 14:08 climbfuji

I can update/test the lua formatted module file on S4.

DavidHuber-NOAA avatar Aug 26 '22 14:08 DavidHuber-NOAA

I am getting cp No such file or directory error when running rt.sh on S4. When compile.sh is executed, it attempts to copy

cp ${PATHTR}/modulefiles/ufs_${MACHINE_ID} ${PATHTR}/tests/modules.${BUILD_NAME}

However, the name of the new module files end with .lua, so the cp fails. See lines 132-136.

DavidHuber-NOAA avatar Aug 26 '22 19:08 DavidHuber-NOAA

I can update/test the lua formatted module file on S4.

@DavidHuber-NOAA I think I got all bugs squared away. Please try again.

BrianCurtis-NOAA avatar Sep 02 '22 01:09 BrianCurtis-NOAA

@DavidHuber-NOAA Now that I read back your comments today. Should I convert s4 to .lua format? I think I made an incorrect assumption that s4 should stay tcl.

BrianCurtis-NOAA avatar Sep 02 '22 12:09 BrianCurtis-NOAA

@DavidHuber-NOAA Now that I read back your comments today. Should I convert s4 to .lua format? I think I made an incorrect assumption that s4 should stay tcl.

Note for the near future (hopefully!): we now have a spack-stack install on S4, which is using .lua modules. Maybe S4 can be the guinea pig for transitioning in the next several weeks?

climbfuji avatar Sep 02 '22 13:09 climbfuji

@BrianCurtis-NOAA I can change that when I commit the S4 .lua module files, so that's not a problem. That said, there is a similar bug in run_test.sh. When run_test.sh attempts to copy over the modulefile copied by compile.sh to the tests/ directory, it needs a .lua extension for systems supporting lua modulefiles (see line 150).

@DomHeinzeller I would be good with giving spack-stack a whirl when the UFS is ready for it on S4.

DavidHuber-NOAA avatar Sep 02 '22 13:09 DavidHuber-NOAA

@DavidHuber-NOAA I did not see the same issue with run_test.sh on the other machines i tested on, and i'm not saying it doesn't exist on s4, but please update to the latest changes and let me know if they work for you.

BrianCurtis-NOAA avatar Sep 02 '22 14:09 BrianCurtis-NOAA

@BrianCurtis-NOAA I gave this another go on S4 and received the same error. I also gave it a shot on Orion where I am also seeing the same error.

On Orion, I ran rt.sh as follows: rt.sh -c -e -n control_c48. The compile job executed without problem, but the control_c48 test failed on line 150 of run_test.sh. The last few lines of run_001_control_c48.log contain the following:

+ cp /work/noaa/nesdis-rdo2/dhuber/ufs_lua/tests/fv3_001.exe fv3.exe
+ cp /work/noaa/nesdis-rdo2/dhuber/ufs_lua/tests/modules.fv3_001 modules.fv3
cp: cannot stat '/work/noaa/nesdis-rdo2/dhuber/ufs_lua/tests/modules.fv3_001': No such file or directory
+ '[' 1 -eq 0 ']'
+ write_fail_test
+ [[ false == true ]]
+ echo 'control_c48 001 failed in run_test'
+ exit 1

The name of the modulefile in the tests/ directory is "modules.fv3_001.lua" rather than "modules.fv3_001", which I believe is the source of this error.

I ran rt.sh here: /work/noaa/nesdis-rdo2/dhuber/ufs_lua/tests

DavidHuber-NOAA avatar Sep 02 '22 15:09 DavidHuber-NOAA

@DavidHuber-NOAA Thanks! I don't normally run rt.sh -c -e -n control_c48 i usually run rt.sh -e -l rt.test where i just copy rt.conf to rt.test and cut a bunch of tests out. I will use that and get a fix.

BrianCurtis-NOAA avatar Sep 02 '22 15:09 BrianCurtis-NOAA

@DavidHuber-NOAA Thanks! I don't normally run rt.sh -c -e -n control_c48 i usually run rt.sh -e -l rt.test where i just copy rt.conf to rt.test and cut a bunch of tests out. I will use that and get a fix.

@DavidHuber-NOAA First off, thanks for helping test on S4. I have implemented a fix that i tested on Gaea/Orion successfully. Please test again on S4 if you can.

BrianCurtis-NOAA avatar Sep 02 '22 17:09 BrianCurtis-NOAA

@BrianCurtis-NOAA Yes, it's working for me now on S4. Thanks! I'll open a PR into your fork shortly.

DavidHuber-NOAA avatar Sep 02 '22 18:09 DavidHuber-NOAA

@DavidHuber-NOAA It must be Friday. I had those changes in my Orion test and never pushed them to the repo. Apologies and thanks for the work! I think we'll get this into UFS ASAP.

BrianCurtis-NOAA avatar Sep 02 '22 18:09 BrianCurtis-NOAA

@BrianCurtis-NOAA we can take time. we can combine this PR with #1383 next Tuesday.

jkbk2004 avatar Sep 02 '22 18:09 jkbk2004

Rahul needs complete .lua change. Meaning we'll need to edit compile.sh and run_test.sh to reflect these changes.

BrianCurtis-NOAA avatar Sep 06 '22 15:09 BrianCurtis-NOAA

@BrianCurtis-NOAA so we need to move on to #1383 without combining, right?

jkbk2004 avatar Sep 06 '22 15:09 jkbk2004