ufs-weather-model icon indicating copy to clipboard operation
ufs-weather-model copied to clipboard

Fix for land component model

Open uturuncoglu opened this issue 1 year ago • 5 comments

Commit Queue Requirements:

  • [x] Fill out all sections of this template.
  • [ ] All sub component pull requests have been reviewed by their code managers.
  • [ ] Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • [ ] Commit 'test_changes.list' from previous step

Description:

This PR aims to fix couple of issues with the component land model. The fully coupled atm+lnd configuration (control_p8_atmlnd) was failing in debug mode. To fix the issue, the NOAHMP NUOPC cap is modified to read required fixed information from the sfc_data file. The control_p8_atmlnd and control_p8_atmlnd_sbs tests are started to use V2 surface files which seems consistent. control_p8_atmlnd_debug test is also included to the test list.

Commit Message:

* UFSWM - fix fully coupled land component configuration
  * NOAHMP - get fixed information from surface file

Priority:

  • Critical Bugfix: control_p8_atmlnd configuration was not running correctly
  • High: The masking was wrong in CMEPS and lnd->atm direction of coupling was also not working properly for control_p8_atmlnd. The input files was also not consistent and as a result baseline was also wrong.

Git Tracking

UFSWM:

  • Closes #2189

Sub component Pull Requests:

  • NOAHMP: https://github.com/NOAA-EMC/noahmp/pull/9

UFSWM Blocking Dependencies:

  • Blocked by #2175 (This PR fixes mask for lnd->atm direction)

Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Adds New Tests/Baselines. control_p8_atmlnd_debug
  • PR Updates/Changes Baselines. control_p8_atmlnd and control_p8_atmlnd_sbs

Input data Changes:

  • None.

Library Changes/Upgrades:

  • No Updates

Testing Log:

  • RDHPCS
    • [ ] Hera
    • [ ] Orion
    • [ ] Hercules
    • [ ] Jet
    • [ ] Gaea
    • [ ] Derecho
  • WCOSS2
    • [ ] Dogwood/Cactus
    • [ ] Acorn
  • [ ] CI
  • [ ] opnReqTest (complete task if unnecessary)

uturuncoglu avatar Mar 14 '24 16:03 uturuncoglu

@jkbk2004 @DeniseWorthen I am making this ready for review and depend on https://github.com/ufs-community/ufs-weather-model/pull/2175. This will fix additional issues revealed by testing https://github.com/ufs-community/ufs-weather-model/pull/2175.

uturuncoglu avatar Apr 01 '24 22:04 uturuncoglu

@jkbk2004 @DeniseWorthen BTW, I am not sure it is already available or not but it would be nice to have capability in RT testing framework to allow using exiting tests to create new ones by changing couple of options in it. So, it would be something like an overloading. For example, this PR introduces another test for debug and the only change in the test file is the baseline directory since it is complied with debug mode. The rest of the options are same. So, it would be nice to use a set of base tests and use them to create new ones. I think this could simply the the tests files and also their consistencies.

uturuncoglu avatar Apr 01 '24 22:04 uturuncoglu

@jkbk2004 @DeniseWorthen BTW, I am not sure it is already available or not but it would be nice to have capability in RT testing framework to allow using exiting tests to create new ones by changing couple of options in it. So, it would be something like an overloading. For example, this PR introduces another test for debug and the only change in the test file is the baseline directory since it is complied with debug mode. The rest of the options are same. So, it would be nice to use a set of base tests and use them to create new ones. I think this could simply the the tests files and also their consistencies.

@uturuncoglu Right now, debug test is handled as separate test case with debug build option on it. This pr is exactly doing that. However, I fully got your point. On EPIC side, we consider to revamp the test system to introduce rt.yaml (https://github.com/ufs-community/ufs-weather-model/issues/2182). Goal is to allow more flexibility and even to meet the different requirements of projects across centers and labs. Hope to make progress soon with the yaml approach (in a month time scale): https://github.com/jkbk2004/ufs-weather-model/blob/feature/rt-refactor/tests/rt.yaml. Once a sharable feature e branch is available, I will keep you posted there.

jkbk2004 avatar Apr 01 '24 23:04 jkbk2004

@uturuncoglu If I understand correctly what you are asking, you can already do that relatively easily. In the derived test configuration file simply source the base test configuration file and then overwrite variables you need to have different values. Suppose you have a base test named 'base_test' which has the test configuration file in tests/base_test and it probably has the content something like:

$ cat tests/base_test

export TEST_DESCR="Base test to test specific configuration"
export CNTL_DIR=base_test
export LIST_FILES="sfcf000.nc \
                   sfcf021.nc \
                   sfcf024.nc \
                   atmf000.nc \
                   atmf021.nc \
                   atmf024.nc"
export_fv3
export VAR1=.true.
export VAR2=365
export VAR3='enable_feature'

Now, let's say you want to run a debug version of this test. Create file tests/base_test_debug, for example like:

$ cat tests/base_test_debug

# "Inherit" all settings from 'base_test'
source tests/base_test

# Overwrite some configuration variables
export TEST_DESCR="Base test to test specific configuration - DEBUG version"
export CNTL_DIR=base_test_debug
export LIST_FILES="sfcf000.nc \
                   sfcf001.nc \
                   atmf000.nc \
                   atmf001.nc"
export VAR1=.false.

You can keep 'deriving' more tests from base_test_debug, by sourcing it in a third file, etc. Or another test that derives from the base_test. You can build a family (tree) of closely related tests.

There's nothing special or magic about these files, they are just shell files sourced from run_test.sh script to set test specific variables. That's all. You can run any shell command in these files or whatever you need to do to set variables required by that specific test.

DusanJovic-NOAA avatar Apr 02 '24 01:04 DusanJovic-NOAA

@DusanJovic-NOAA Thanks. That is what I am looking for. I'll update the test with this approach. It makes the life easier. @jkbk2004 Thanks for the information. It is great to know that testing system will be improved.

uturuncoglu avatar Apr 02 '24 20:04 uturuncoglu

@jkbk2004 I synced the component and model in my fix branch. I wonder if this could go into commit queue soon since it has some critical fix for land component model. I could also try to introduce a fully coupled configuration in this PR by using cpld_control_p8 as a reference per @junwang-noaa request.

uturuncoglu avatar Apr 16 '24 17:04 uturuncoglu

@jkbk2004 I synced the component and model in my fix branch. I wonder if this could go into commit queue soon since it has some critical fix for land component model. I could also try to introduce a fully coupled configuration in this PR by using cpld_control_p8 as a reference per @junwang-noaa request.

@uturuncoglu sounds good! we may be able to schedule to commit around Friday. I will keep you posted.

jkbk2004 avatar Apr 16 '24 17:04 jkbk2004

@jkbk2004 Thank you. Best,

uturuncoglu avatar Apr 16 '24 19:04 uturuncoglu

@uturuncoglu Are you able to merge develop and run the test suite on Hercules or Orion and then commit the test_changes.list?

DeniseWorthen avatar Apr 18 '24 15:04 DeniseWorthen

@DeniseWorthen Okay. I synced. I'll run the full RTs and update you.

uturuncoglu avatar Apr 18 '24 16:04 uturuncoglu

Great. I'd like to get this committed, since it has been waiting.

DeniseWorthen avatar Apr 18 '24 16:04 DeniseWorthen

@DeniseWorthen Thanks for your help. I am running RTs on Hercules now.

uturuncoglu avatar Apr 18 '24 16:04 uturuncoglu

@DeniseWorthen I commit test_changes.list. It looks fine to me and only fully coupled configurations (incl. side-by-side) are failing due to answer change and also debug test is failing since it has no baseline yet.

uturuncoglu avatar Apr 18 '24 21:04 uturuncoglu

@uturuncoglu Yes, that's perfect. If you can get Mike or someone to review the NoahMP PR, this can be marked as ready.

DeniseWorthen avatar Apr 18 '24 21:04 DeniseWorthen

@DeniseWorthen I have just sent a reminder mail to @barlage.

uturuncoglu avatar Apr 18 '24 21:04 uturuncoglu

Hi, @uturuncoglu can you please sync up your branch for this PR?

Also, @DusanJovic-NOAA can you please sync up your log compiler warning branch here, so we can combine it with this PR?

zach1221 avatar Apr 25 '24 19:04 zach1221

@zach1221 Sure.

uturuncoglu avatar Apr 25 '24 19:04 uturuncoglu

@zach1221 Do you want me to update test_changes.list?

uturuncoglu avatar Apr 25 '24 19:04 uturuncoglu

Hi, @uturuncoglu can you please sync up your branch for this PR?

Also, @DusanJovic-NOAA can you please sync up your log compiler warning branch here, so we can combine it with this PR?

Done.

DusanJovic-NOAA avatar Apr 25 '24 19:04 DusanJovic-NOAA

@zach1221 Do you want me to update test_changes.list?

Yes, please!

zach1221 avatar Apr 25 '24 19:04 zach1221

@zach1221 @jkbk2004 I tried to run all the land related RTs again after sync. I am not sure why but I am seeing following log in the ECFLOW output,

ECFLOW Tasks Remaining: 3/10
ECFLOW Tasks Remaining: 2/10
Will force aborted state for task /regtest_22879/control_restart_p8_atmlnd_intel
ECFLOW Tasks Remaining: 1/10
ECFLOW Tasks Remaining: 1/10

It seems it skips running land restart test. The directory is not created in the run_dir. I could see the run_test_control_restart_p8_atmlnd_intel.env file but that is all. Is it possible to have some issue with the regression testing system. Is there any way to debug it? Maybe I could try to run again to see I could reproduce it or not.

uturuncoglu avatar Apr 25 '24 22:04 uturuncoglu

@zach1221 @jkbk2004 I tried to run all the land related RTs again after sync. I am not sure why but I am seeing following log in the ECFLOW output,

ECFLOW Tasks Remaining: 3/10
ECFLOW Tasks Remaining: 2/10
Will force aborted state for task /regtest_22879/control_restart_p8_atmlnd_intel
ECFLOW Tasks Remaining: 1/10
ECFLOW Tasks Remaining: 1/10

It seems it skips running land restart test. The directory is not created in the run_dir. I could see the run_test_control_restart_p8_atmlnd_intel.env file but that is all. Is it possible to have some issue with the regression testing system. Is there any way to debug it? Maybe I could try to run again to see I could reproduce it or not.

@uturuncoglu I can double check with develop branch.

jkbk2004 avatar Apr 25 '24 22:04 jkbk2004

@zach1221 @jkbk2004 I tried to run all the land related RTs again after sync. I am not sure why but I am seeing following log in the ECFLOW output,

ECFLOW Tasks Remaining: 3/10
ECFLOW Tasks Remaining: 2/10
Will force aborted state for task /regtest_22879/control_restart_p8_atmlnd_intel
ECFLOW Tasks Remaining: 1/10
ECFLOW Tasks Remaining: 1/10

It seems it skips running land restart test. The directory is not created in the run_dir. I could see the run_test_control_restart_p8_atmlnd_intel.env file but that is all. Is it possible to have some issue with the regression testing system. Is there any way to debug it? Maybe I could try to run again to see I could reproduce it or not.

This will happen when a compile fails or a test that others depends on fails, it aborts the associated/related test.

BrianCurtis-NOAA avatar Apr 26 '24 13:04 BrianCurtis-NOAA

@zach1221 @jkbk2004 I tried to run all the land related RTs again after sync. I am not sure why but I am seeing following log in the ECFLOW output,

ECFLOW Tasks Remaining: 3/10
ECFLOW Tasks Remaining: 2/10
Will force aborted state for task /regtest_22879/control_restart_p8_atmlnd_intel
ECFLOW Tasks Remaining: 1/10
ECFLOW Tasks Remaining: 1/10

It seems it skips running land restart test. The directory is not created in the run_dir. I could see the run_test_control_restart_p8_atmlnd_intel.env file but that is all. Is it possible to have some issue with the regression testing system. Is there any way to debug it? Maybe I could try to run again to see I could reproduce it or not.

This will happen when a compile fails or a test that others depends on fails, it aborts the associated/related test.

develop branch has no problem for those control_*_p8_atmlnd cases. @zach1221 @FernandoAndrade-NOAA FYI

jkbk2004 avatar Apr 26 '24 13:04 jkbk2004

@uturuncoglu If you were running rt.sh to update test_changes.list then because the restart tests dependent test fails (as it should if we expect changes, which we do, right?) then it's actually working as intended, I believe.

BrianCurtis-NOAA avatar Apr 26 '24 13:04 BrianCurtis-NOAA

Also to note, there is a non-impactful rt.sh change coming in that adds warning/remark counts to the intel compile lines of the RegressionTests_.log file. This will be brought in once we get confirmation all is OK to start testing this PR.

BrianCurtis-NOAA avatar Apr 26 '24 13:04 BrianCurtis-NOAA

I don't think it's ready yet, we have no test_changes.list in a PR that says it's changing baselines.

BrianCurtis-NOAA avatar Apr 26 '24 15:04 BrianCurtis-NOAA

I don't think it's ready yet, we have no test_changes.list in a PR that says it's changing baselines.

You're right, my mistake. We still need the updated change test file.

zach1221 avatar Apr 26 '24 15:04 zach1221

@BrianCurtis-NOAA @zach1221 I think compile is fine since same executable also used by side-by-side bald also fully coupled. I am trying to figure out what is wrong. @BrianCurtis-NOAA That makes sense. So, as I understanding if the baseline check fails for the full coupled run (used as a baseline for restart) then restart fails. Right? If this is the case, do I need to create test_changes.list manually?

uturuncoglu avatar Apr 26 '24 16:04 uturuncoglu

@BrianCurtis-NOAA @zach1221 I think compile is fine since same executable also used by side-by-side bald also fully coupled. I am trying to figure out what is wrong. @BrianCurtis-NOAA That makes sense. So, as I understanding if the baseline check fails for the full coupled run (used as a baseline for restart) then restart fails. Right? If this is the case, do I need to create test_changes.list manually?

What we recommend is for each PR, authors run the full suite on Hera (as it runs most if not all tests on GNU/Intel side) and push that test_changes.list file to the repository. This file helps track the changed tests for each PR in the repository. If you know which tests change then you can manually edit the test_changes.list. each line should have <test> <compiler>. Then when we run testing we will catch any missing tests and report back in the PR.

BrianCurtis-NOAA avatar Apr 26 '24 16:04 BrianCurtis-NOAA