                        Fix for land component model
Commit Queue Requirements:
- [x] Fill out all sections of this template.
- [ ] All sub component pull requests have been reviewed by their code managers.
- [ ] Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
- [ ] Commit 'test_changes.list' from previous step
 
Description:
This PR fixes a couple of issues with the land component model. The fully coupled atm+lnd configuration (control_p8_atmlnd) was failing in debug mode. To fix the issue, the NOAHMP NUOPC cap is modified to read the required fixed information from the sfc_data file. The control_p8_atmlnd and control_p8_atmlnd_sbs tests now use V2 surface files, which are consistent with each other. The control_p8_atmlnd_debug test is also added to the test list.
Commit Message:
* UFSWM - fix fully coupled land component configuration
  * NOAHMP - get fixed information from surface file
Priority:
- Critical Bugfix: The control_p8_atmlnd configuration was not running correctly.
- High: The masking was wrong in CMEPS, and the lnd->atm direction of coupling was also not working properly for control_p8_atmlnd. The input files were also not consistent, and as a result the baseline was also wrong.
Git Tracking
UFSWM:
- Closes #2189
 
Sub component Pull Requests:
- NOAHMP: https://github.com/NOAA-EMC/noahmp/pull/9
 
UFSWM Blocking Dependencies:
- Blocked by #2175 (This PR fixes mask for lnd->atm direction)
 
Changes
Regression Test Changes (Please commit test_changes.list):
- PR Adds New Tests/Baselines: control_p8_atmlnd_debug
- PR Updates/Changes Baselines: control_p8_atmlnd and control_p8_atmlnd_sbs
Input data Changes:
- None.
 
Library Changes/Upgrades:
- No Updates
 
Testing Log:
- RDHPCS
  - [ ] Hera
  - [ ] Orion
  - [ ] Hercules
  - [ ] Jet
  - [ ] Gaea
  - [ ] Derecho
- WCOSS2
  - [ ] Dogwood/Cactus
  - [ ] Acorn
- [ ] CI
- [ ] opnReqTest (complete task if unnecessary)
 
@jkbk2004 @DeniseWorthen I am making this ready for review; it depends on https://github.com/ufs-community/ufs-weather-model/pull/2175. This will fix additional issues revealed by testing https://github.com/ufs-community/ufs-weather-model/pull/2175.
@jkbk2004 @DeniseWorthen BTW, I am not sure whether it is already available, but it would be nice to have a capability in the RT testing framework that allows using existing tests to create new ones by changing a couple of options, something like overloading. For example, this PR introduces another test for debug mode, and the only change in the test file is the baseline directory, since it is compiled in debug mode. The rest of the options are the same. So, it would be nice to have a set of base tests and use them to create new ones. I think this could simplify the test files and also improve their consistency.
@uturuncoglu Right now, a debug test is handled as a separate test case with the debug build option on it. This PR is doing exactly that. However, I fully get your point. On the EPIC side, we are considering revamping the test system to introduce rt.yaml (https://github.com/ufs-community/ufs-weather-model/issues/2182). The goal is to allow more flexibility and even to meet the different requirements of projects across centers and labs. We hope to make progress soon with the yaml approach (on a one-month time scale): https://github.com/jkbk2004/ufs-weather-model/blob/feature/rt-refactor/tests/rt.yaml. Once a sharable feature branch is available, I will keep you posted there.
@uturuncoglu If I understand correctly what you are asking, you can already do that relatively easily. In the derived test configuration file, simply source the base test configuration file and then overwrite the variables you need to have different values.
Suppose you have a base test named 'base_test', whose test configuration file lives in tests/base_test and probably has content something like:
$ cat tests/base_test
export TEST_DESCR="Base test to test specific configuration"
export CNTL_DIR=base_test
export LIST_FILES="sfcf000.nc \
                   sfcf021.nc \
                   sfcf024.nc \
                   atmf000.nc \
                   atmf021.nc \
                   atmf024.nc"
export_fv3
export VAR1=.true.
export VAR2=365
export VAR3='enable_feature'
Now, let's say you want to run a debug version of this test. Create a file tests/base_test_debug, for example like:
$ cat tests/base_test_debug
# "Inherit" all settings from 'base_test'
source tests/base_test
# Overwrite some configuration variables
export TEST_DESCR="Base test to test specific configuration - DEBUG version"
export CNTL_DIR=base_test_debug
export LIST_FILES="sfcf000.nc \
                   sfcf001.nc \
                   atmf000.nc \
                   atmf001.nc"
export VAR1=.false.
You can keep 'deriving' more tests from base_test_debug by sourcing it in a third file, and so on, or create another test that derives from base_test. You can build a family (tree) of closely related tests.
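For instance, a third-level test could source the debug variant and tweak it further. This is only an illustrative sketch following the pattern above; the file name and values here are made up:
$ cat tests/base_test_debug_short
# "Inherit" everything from the debug variant and shorten the run
source tests/base_test_debug
export CNTL_DIR=base_test_debug_short
export VAR2=24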
There's nothing special or magic about these files; they are just shell files sourced from the run_test.sh script to set test-specific variables. That's all. You can run any shell command in these files, or do whatever else you need to set the variables required by that specific test.
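Conceptually, the runner does something like the following. This is a simplified sketch, not the actual run_test.sh logic; default_vars.sh and the echo lines here are illustrative:
TEST_NAME=base_test_debug        # normally provided by the framework
# Load framework defaults (these provide helpers such as export_fv3)
source default_vars.sh
# Load the test-specific configuration file, e.g. tests/base_test_debug
source tests/${TEST_NAME}
echo "Running ${TEST_NAME}: ${TEST_DESCR}"
echo "Baseline directory: ${CNTL_DIR}"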
@DusanJovic-NOAA Thanks. That is what I am looking for. I'll update the test with this approach. It makes life easier. @jkbk2004 Thanks for the information. It is great to know that the testing system will be improved.
@jkbk2004 I synced the component and the model in my fix branch. I wonder if this could go into the commit queue soon since it has some critical fixes for the land component model. I could also try to introduce a fully coupled configuration in this PR using cpld_control_p8 as a reference, per @junwang-noaa's request.
@uturuncoglu Sounds good! We may be able to schedule the commit around Friday. I will keep you posted.
@jkbk2004 Thank you. Best,
@uturuncoglu Are you able to merge develop and run the test suite on Hercules or Orion and then commit the test_changes.list?
@DeniseWorthen Okay. I synced. I'll run the full RTs and update you.
Great. I'd like to get this committed, since it has been waiting.
@DeniseWorthen Thanks for your help. I am running RTs on Hercules now.
@DeniseWorthen I committed test_changes.list. It looks fine to me; only the fully coupled configurations (incl. side-by-side) are failing due to the answer change, and the debug test is failing since it has no baseline yet.
@uturuncoglu Yes, that's perfect. If you can get Mike or someone to review the NoahMP PR, this can be marked as ready.
@DeniseWorthen I have just sent a reminder mail to @barlage.
Hi, @uturuncoglu can you please sync up your branch for this PR?
Also, @DusanJovic-NOAA can you please sync up your log compiler warning branch here, so we can combine it with this PR?
@zach1221 Sure.
@zach1221 Do you want me to update test_changes.list?
Done (branch synced).
@uturuncoglu Yes, please!
@zach1221 @jkbk2004 I tried to run all the land-related RTs again after the sync. I am not sure why, but I am seeing the following log in the ECFLOW output:
ECFLOW Tasks Remaining: 3/10
ECFLOW Tasks Remaining: 2/10
Will force aborted state for task /regtest_22879/control_restart_p8_atmlnd_intel
ECFLOW Tasks Remaining: 1/10
ECFLOW Tasks Remaining: 1/10
It seems it skips running the land restart test. The directory is not created in run_dir. I can see the run_test_control_restart_p8_atmlnd_intel.env file, but that is all. Is it possible that there is some issue with the regression testing system? Is there any way to debug it? Maybe I could try to run it again to see whether I can reproduce it.
@uturuncoglu I can double check with develop branch.
This will happen when a compile fails or when a test that others depend on fails; the associated/dependent tests are then aborted.
The develop branch has no problem with those control_*_p8_atmlnd cases. @zach1221 @FernandoAndrade-NOAA FYI
@uturuncoglu If you were running rt.sh to update test_changes.list, then because the test that the restart test depends on fails (as it should if we expect changes, which we do, right?), it's actually working as intended, I believe.
Also to note, there is a non-impactful rt.sh change coming in that adds warning/remark counts to the intel compile lines of the RegressionTests_
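For context on the dependency behavior mentioned above: in rt.conf a restart test names the test it depends on, so when that parent test fails its baseline check the restart test is aborted rather than run. The lines below are only a rough sketch of such entries, with the machine and baseline columns simplified:
RUN | control_p8_atmlnd         | | baseline |
RUN | control_restart_p8_atmlnd | |          | control_p8_atmlnd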
I don't think it's ready yet; we have no test_changes.list in a PR that says it's changing baselines.
You're right, my mistake. We still need the updated test_changes.list file.
@BrianCurtis-NOAA @zach1221 I think the compile is fine since the same executable is also used by the side-by-side and fully coupled tests. I am trying to figure out what is wrong. @BrianCurtis-NOAA That makes sense. So, as I understand it, if the baseline check fails for the fully coupled run (used as a baseline for the restart test), then the restart test fails. Right? If this is the case, do I need to create test_changes.list manually?
What we recommend is that for each PR, authors run the full suite on Hera (as it runs most, if not all, tests on both the GNU and Intel sides) and push that test_changes.list file to the repository. This file helps track the changed tests for each PR. If you know which tests change, you can manually edit test_changes.list; each line should have <test> <compiler>. Then when we run testing, we will catch any missing tests and report back in the PR.
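For illustration only, based on the tests this PR is expected to change, such a file might look like the lines below (the exact entries should come from the actual RT results):
control_p8_atmlnd intel
control_p8_atmlnd_sbs intel
control_p8_atmlnd_debug intel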