ufs-weather-model

Add FV3 suite WoFS_v0

Open MicroTed opened this issue 2 years ago • 62 comments

PR Checklist

  • [x] This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.

  • [x] This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR

  • [ ] An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes contained in the PR. The Issue and the dependent sub-component PR are specified below.

  • [ ] Results for one or more of the regression tests change and the reasons for the changes are understood and explained below.

  • [ ] New or updated input data is required by this PR. If checked, please work with the code managers to update input data sets on all platforms.

Instructions: All subsequent sections of text should be filled in as appropriate.

Description

Adds a new suite definition file (SDF), FV3_WoFS_v0, to FV3 that uses the NSSL cloud microphysics scheme with the Noah LSM. For use with the next SRW App release.

Testing

Testing done on Jet

  • [x] hera.intel
  • [ ] hera.gnu
  • [ ] orion.intel
  • [ ] cheyenne.intel
  • [ ] cheyenne.gnu
  • [ ] gaea.intel
  • [x] jet.intel
  • [ ] wcoss_cray
  • [ ] wcoss_dell_p3
  • [x] opnReqTest for newly added/changed feature
  • [ ] CI

Dependencies

If testing this branch requires non-default branches in other repositories, list them. Those branches should have matching names (ideally).

Do PRs in upstream repositories need to be merged first?

"waiting for other repos"

  • dependency for ufs-community/regional_workflow/pull/708
  • waiting on noaa-emc/fv3atm/pull/514
  • ORT failures (rst and dcp) fixed by https://github.com/NCAR/ccpp-physics/pull/904

MicroTed, Apr 05 '22 22:04

@MicroTed It looks to me like you created two PRs, #1158 and #1159; maybe they can be combined? Also, please fill in the checklist, description, issues (every PR needs an associated issue), testing information, and dependencies (any submodule that needs to be updated). Thanks

junwang-noaa, Apr 06 '22 16:04

@junwang-noaa Only one of the drafts will be used, depending on whether the RT tests for v1nssl should be converted to WoFS_v0. I may have jumped the gun a little on that. #1158 just adds the new SDF without touching the RT files, which is the easiest path.

MicroTed, Apr 06 '22 18:04

@MicroTed could you please try to run a case with this new SDF by reversing the MPI layout in the input.nml file? For example, if you have layout = 6,11, try layout = 11,6 (this changes the MPI decomposition). For compilation, please add the option -DREPRO=ON (this uses less aggressive optimization). The purpose is to see whether we get bitwise-identical results with two different MPI layouts.

NOTE: Our current SDF (suite_FV3_GFS_v15_thompson_mynn_lam3km.xml) reproduces results only at coarse resolution (currently tested in the UFS regression tests); at 3 km it produces different results.
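Editing the layout line by hand is easy to get wrong across many run directories, so the swap can be scripted. A minimal Python sketch, assuming the decomposition is set by a single `layout = X,Y` line in input.nml (as in the 6,11 example) and that no other entry is named exactly `layout`; `swap_layout` and `swap_layout_file` are hypothetical helpers, not part of the UFS tooling:

```python
import re

# Matches "layout = X,Y" at the start of a line (leading spaces/tabs allowed).
# "io_layout" is not matched because the entry name must begin with "layout".
_LAYOUT = re.compile(
    r"^([ \t]*layout[ \t]*=[ \t]*)(\d+)[ \t]*,[ \t]*(\d+)",
    re.IGNORECASE | re.MULTILINE,
)


def swap_layout(nml_text: str) -> str:
    """Return the namelist text with every `layout = X,Y` changed to `layout = Y,X`."""
    return _LAYOUT.sub(lambda m: f"{m.group(1)}{m.group(3)},{m.group(2)}", nml_text)


def swap_layout_file(path: str) -> None:
    """Rewrite an input.nml file in place with the layout reversed."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(swap_layout(text))
```

For example, calling `swap_layout_file("input.nml")` in a kept run directory turns `layout = 6,11` into `layout = 11,6` and leaves `io_layout` untouched.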

RatkoVasic-NOAA, Apr 08 '22 16:04

@RatkoVasic-NOAA

please try to run a case

Sure thing. Is there any particular current RT test that would be good?

by reversing the MPI layout in the input.nml file

My experience with changing the layout is that optimization has to be off completely (to turn off vectorization). Otherwise there can be a change in the "leftover" operations that don't fit into the vector and have different round-off characteristics. It has been a few years since I've tested that, so YMMV.
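A toy Python illustration of the round-off point, assuming vectorized code accumulates into `width` independent partial sums (lanes) that are combined at the end; real compilers differ in detail. Changing the width changes the order of floating-point additions, and therefore the result:

```python
from typing import List


def sequential_sum(values: List[float]) -> float:
    """Add values strictly left to right (explicit loop, no compensated summation)."""
    total = 0.0
    for v in values:
        total += v
    return total


def lane_sum(values: List[float], width: int) -> float:
    """Accumulate element i into partial sum i % width, then add the partials in order."""
    lanes = [0.0] * width
    for i, v in enumerate(values):
        lanes[i % width] += v
    return sequential_sum(lanes)


values = [1e16, 1.0, -1e16, 1.0]
print(sequential_sum(values))     # 1.0: the first 1.0 is lost when added to 1e16
print(lane_sum(values, width=2))  # 2.0: the large terms cancel within one lane first
```

Both answers are "correct" to within round-off, which is why bitwise comparisons across layouts are sensitive to how the compiler groups operations.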

MicroTed, Apr 08 '22 16:04

@MicroTed Dynamics-only runs reproduce results with different MPI layouts, so the physics (which is column-only and independent of MPI) should too.

There are two tests in rt.conf: RUN | regional_control and RUN | regional_3km.

The first uses low resolution and the second high resolution. Just a reminder to use -DREPRO=ON in the COMPILE line and, of course, to use suite_FV3_WoFS_v0.xml in the same line. You also have to use your own input.nml and model_configure files. Run rt.sh with the -k option (which keeps the run directories). Then go into each run directory, save the output, and resubmit job_card with the layout line in input.nml edited.
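Once the saved output and the rerun output sit side by side, they can be compared byte for byte. A hypothetical Python helper, assuming the outputs of interest match a glob such as `*.nc`; rt.sh has its own comparison logic and this is not it. Byte comparison is the strictest possible check: if the files embed metadata such as timestamps, it can report differences even when the fields match.

```python
import filecmp
from pathlib import Path
from typing import List, Tuple


def compare_runs(run_a: str, run_b: str, pattern: str = "*.nc") -> Tuple[List[str], List[str], List[str]]:
    """Byte-compare files matching `pattern` in two run directories.

    Returns (identical, different, missing_in_b) lists of file names.
    """
    dir_a, dir_b = Path(run_a), Path(run_b)
    identical: List[str] = []
    different: List[str] = []
    missing: List[str] = []
    for file_a in sorted(dir_a.glob(pattern)):
        file_b = dir_b / file_a.name
        if not file_b.exists():
            missing.append(file_a.name)
        elif filecmp.cmp(file_a, file_b, shallow=False):  # full content comparison
            identical.append(file_a.name)
        else:
            different.append(file_a.name)
    return identical, different, missing
```

An empty `different` list for the 6,11 and 11,6 runs is what the decomposition test is looking for.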

RatkoVasic-NOAA, Apr 08 '22 16:04

@RatkoVasic-NOAA I guess it is expected that regional_3km has no real convection in the domain for a 6hr forecast? There doesn't seem to be much for the microphysics to do there.

MicroTed, Apr 08 '22 22:04

@RatkoVasic-NOAA I guess it is expected that regional_3km has no real convection in the domain for a 6hr forecast? There doesn't seem to be much for the microphysics to do there.

Right

RatkoVasic-NOAA, Apr 08 '22 22:04

@MicroTed May I ask whether you ran the decomposition test as Ratko suggested (reversing the MPI layout in the input.nml file)? We need to run the ORT tests (threading, decomposition, restart reproducibility, and debug tests) for the new physics suite. Thanks.

junwang-noaa, Apr 12 '22 01:04

Dynamics-only runs reproduce results with different MPI layouts, so the physics (which is column-only and independent of MPI) should too.

Not completely column-only: the Thompson interface (mp_thompson.F90) has 2D loops whose width will vary with the MPI decomposition. The NSSL scheme is 2D within the microphysics code (it works on 2D slabs rather than just columns).

@junwang-noaa I did try flipping the layout on regional_3km, and the results do change, even after changing the REPRO options to '-fp-model precise'.
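The column-only argument can be illustrated with a toy decomposition test. In this hypothetical Python sketch, a kernel that treats each column independently gives identical results for any tile width, while a kernel that uses a tile-wide quantity does not. It says nothing about what mp_thompson.F90 or the NSSL code actually compute; a slab loop with no cross-column dependence can still change results at the round-off level if the compiler vectorizes it differently for different widths.

```python
from typing import Callable, List

Column = List[float]
Kernel = Callable[[List[Column]], List[Column]]


def run_tiled(columns: List[Column], tile_width: int, kernel: Kernel) -> List[Column]:
    """Split the columns into tiles of `tile_width` (like MPI subdomains) and apply `kernel` per tile."""
    result: List[Column] = []
    for start in range(0, len(columns), tile_width):
        result.extend(kernel(columns[start:start + tile_width]))
    return result


def column_kernel(tile: List[Column]) -> List[Column]:
    """Column-only: each output column depends only on the same input column."""
    return [[0.5 * q for q in col] for col in tile]


def tile_mean_kernel(tile: List[Column]) -> List[Column]:
    """Uses a tile-wide mean, so each column's result depends on its tile neighbors."""
    values = [q for col in tile for q in col]
    mean = sum(values) / len(values)
    return [[q - mean for q in col] for col in tile]


columns = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(run_tiled(columns, 2, column_kernel) == run_tiled(columns, 4, column_kernel))        # True
print(run_tiled(columns, 2, tile_mean_kernel) == run_tiled(columns, 4, tile_mean_kernel))  # False
```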

MicroTed, Apr 12 '22 04:04

@MicroTed Does the test reproduce with restart runs? @ligiabernardet @ywangwof @arunchawla-NOAA The new suite file FV3_WoFS_v0 does not pass the decomposition reproducibility test. Do you want it to be in the SRW v2 release?

junwang-noaa, Apr 12 '22 13:04

Yes. We want it to be in the SRW v2 release. Could you provide more details about the decomposition reproducibility issue? This PR only adds one CCPP suite file, so I am wondering why it would affect decomposition.

ywangwof, Apr 12 '22 14:04

@ywangwof The requirement for public release is that the suite files are well tested: when developers run a test using the suite file (e.g. regional_3km_wofs) with different thread counts, different decompositions, and from a restart, they must get the same answer, and the test must also run in debug mode. These tests are required for public release (@ligiabernardet please let me know if the requirement has changed).

junwang-noaa, Apr 12 '22 14:04

@MicroTed

In the rt.conf file, you have a new test regional_3km_wofs. However, your new test file is tests/tests/regional_3km_wofsv0. These two names have to match to be able to run the regression test.
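A name mismatch like this can be caught before submitting. A hypothetical Python check, assuming rt.conf uses pipe-delimited lines whose first field is `RUN` and second field is the test name (as in the `RUN | regional_control` entry mentioned earlier in this thread), and that each test has a file of the same name under tests/tests:

```python
from pathlib import Path
from typing import List


def run_test_names(rt_conf_text: str) -> List[str]:
    """Return test names from lines of the form 'RUN | name | ...'."""
    names: List[str] = []
    for line in rt_conf_text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Commented-out entries such as "#RUN | ..." are skipped because fields[0] != "RUN".
        if len(fields) >= 2 and fields[0] == "RUN":
            names.append(fields[1])
    return names


def missing_test_files(rt_conf_text: str, tests_dir: str) -> List[str]:
    """Names listed in rt.conf that have no matching file in tests_dir."""
    return [n for n in run_test_names(rt_conf_text) if not (Path(tests_dir) / n).is_file()]
```

With a file named regional_3km_wofsv0 and an rt.conf entry regional_3km_wofs, `missing_test_files` would report regional_3km_wofs.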

MinsukJi-NOAA, Apr 12 '22 15:04

These two names have to match to be able to run the regression test.

@MinsukJi-NOAA Thanks for pointing that out! I renamed the file.

MicroTed, Apr 12 '22 19:04

could you please try to run a case with this new SDF by reversing the MPI layout in the input.nml file? For example, if you have layout = 6,11, try layout = 11,6 (this changes the MPI decomposition)

@RatkoVasic-NOAA I have done some tests with this, and I see the same behavior you get with suite_FV3_GFS_v15_thompson_mynn_lam3km.xml. The WoFS suite gives reproducible results for regional_control (10,6 vs. 6,10) but not for various tests on 3 km grids (even with DEBUG=ON). The NSSL microphysics module is virtually the same as in WRF, where it has no problems with the reproducibility tests. So I suspect the culprit is something common to both SDFs or to the namelist options.

At this point, would it be reasonable to note that there is an issue somewhere and move on?

MicroTed, Apr 14 '22 16:04

Pinging @junwang-noaa, since I think Ratko may be on leave. It sounds like there is a systematic problem with physics reproducibility at higher resolutions that we'll need to address at some point, but it's unrelated to the SDFs themselves. Therefore, do you think we could get this PR back into the merge queue? Thanks!

JeffBeck-NOAA, Apr 15 '22 02:04

@JeffBeck-NOAA Since this suite file is for the CCPP v6 public release, please check with @ligiabernardet and @ChunxiZhang-NOAA. If both of them are OK with having the suite file in fv3/ccpp/suites, we will continue with this PR. Thanks.

junwang-noaa, Apr 15 '22 02:04

Thanks, @junwang-noaa!

@ligiabernardet, @ChunxiZhang-NOAA, regarding the reproducibility issue, do you know whether this is a problem with a specific physics parameterization or a combination of physics? I'm just trying to get a feeling for how we should proceed with the PR, since it's just an SDF and no physics code is being committed. Thanks!

JeffBeck-NOAA, Apr 15 '22 03:04

I don't know which schemes/suites pass the reproducibility/decomposition tests. @grantfirl @ChunxiZhang-NOAA Do you have this information?

ligiabernardet, Apr 15 '22 14:04

@ligiabernardet The NSSL MP scheme has probably not been tested for reproducibility/decomposition. The WoFS_v0 suite differs from the RRFS_v1nssl suite by only one scheme: WoFS_v0 uses Noah_lsm, while RRFS_v1nssl uses Noahmp_lsm. The decomposition reproducibility issue could be caused by the NSSL MP scheme itself or by the treatment of the lateral boundary conditions in the regional runs.

ChunxiZhang-NOAA, Apr 15 '22 15:04

@ChunxiZhang-NOAA, so the NSSL MP parameterization was merged before reproducibility/decomposition testing for physics was required? I would certainly agree with new parameterizations being tested, but since this PR is just an SDF XML for existing physics, I would advocate for noting the testing that @MicroTed conducted and moving ahead. It does sound like dedicated funds/time are needed to get to the bottom of the reproducibility problem soon, as it impacts multiple physics parameterizations, and only at certain resolutions. It'd be good to narrow it down to either the physics or the LBCs first.

JeffBeck-NOAA, Apr 15 '22 15:04

The Thompson and NSSL schemes both give reproducible results in WRF, and the CCPP versions likely do not change that. Since the two SDFs mentioned, one with each scheme, both fail to reproduce at 3 km but reproduce at coarse resolution, this suggests that the issue is not in either microphysics scheme (or else it is in something common to both in the cap-end drivers).

It is definitely worth pursuing the root cause(s) of this issue, though not necessarily in this PR. Is there a dx <= 3 km test for which a particular suite has reproducible output? An idealized setup with periodic boundary conditions may help isolate or rule out some physics packages.

MicroTed, Apr 15 '22 16:04

I ran ORT tests for the NSSL MP scheme. The tests used a low-resolution global domain with a 24-hour forecast. The findings are:

  1. The dcp test: For rrfs_v1nssl, the dcp test failed. Interestingly, with fhcyc=24 the difference started at fh21 plus one time step; with fhcyc=0 it started at fh21 or fh21 minus one time step. I still don't know what caused the problem, because gcycle was not called at fh21 with either fhcyc=0 or fhcyc=24. For rrfs_v1nssl_nohailnoccn, the dcp test passed. The dcp test for rrfs_v1beta passed too, and the only difference between rrfs_v1beta and rrfs_v1nssl is the MP scheme. So I suspect there are bugs in the dynamics related to adding the extra hail tracer or (less likely) to the treatment of CCN in the NSSL scheme.
  2. The rst test: Since the rst tests for both rrfs_v1nssl and rrfs_v1nssl_nohailnoccn failed, I suspect there are bugs in the restart variables of this scheme. We need to check that all the intermediate variables are written out correctly and are not reinitialized in the code. The differences between the rrfs_v1nssl and WoFS_v0 SDFs are very small, e.g., rrfs_v1nssl employs the NoahMP land surface scheme and WoFS_v0 employs the Noah land surface scheme, so the decomposition and restart issues are probably caused mainly by the NSSL MP scheme. One question for Ted: why are GFS_suite_interstitial_3 and GFS_suite_interstitial_4 not included in the rrfs_v1nssl and WoFS_v0 SDFs?
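The restart hypothesis in item 2 can be illustrated with a toy model. In this hypothetical Python sketch, `memory` stands in for an intermediate variable carried between time steps; if the restart file omits it and it is reinitialized on read, the restarted run no longer matches the continuous run bit for bit:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    q: float       # prognostic field, always written to the restart file
    memory: float  # hypothetical intermediate variable carried between steps


def step(s: State) -> State:
    """Toy time step: the update to q depends on the carried-over memory."""
    new_memory = 0.5 * s.memory + 0.1 * s.q
    return State(q=s.q + new_memory, memory=new_memory)


def run(s: State, nsteps: int) -> State:
    for _ in range(nsteps):
        s = step(s)
    return s


def restart_run(s0: State, nsteps: int, restart_at: int, save_memory: bool) -> State:
    """Run to restart_at, 'write' a restart, 'read' it back, then finish the run."""
    s = run(s0, restart_at)
    restored = State(q=s.q, memory=s.memory if save_memory else 0.0)  # 0.0 = reinitialized
    return run(restored, nsteps - restart_at)


initial = State(q=1.0, memory=0.0)
control = run(initial, 6)
print(restart_run(initial, 6, 3, save_memory=True) == control)   # True
print(restart_run(initial, 6, 3, save_memory=False) == control)  # False
```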

ChunxiZhang-NOAA, Apr 18 '22 15:04

@ChunxiZhang-NOAA The dcp test failure may be related to the initialization of "water_nc" in field_table. It should be "0.0", not 1.e30. This is a bug I found while testing the WoFS_v0 suite. I suspect it is also present in the field_table for the rrfs_v1nssl suite. Please confirm.

# prognostic cloud water number concentration
 "TRACER", "atmos_mod", "water_nc"
           "longname",     "cloud liquid water number concentration"
           "units",        "/kg"
       "profile_type", "fixed", "surface_value=0.0" /
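A quick scan for fill-like initial values could catch similar entries. A hypothetical Python sketch, assuming entries look like the snippet above (a `"TRACER", "atmos_mod", "<name>"` header, a `surface_value=` setting, and a closing `/` at the end of a line); the 1e20 threshold is an arbitrary choice, and this is not an FMS tool:

```python
import re
from typing import Dict, List

# An entry starts with "TRACER", "atmos_mod", "<name>" and ends at a "/" that closes a line,
# so the "/" inside "/kg" does not end the entry.
_ENTRY = re.compile(
    r'"TRACER"\s*,\s*"atmos_mod"\s*,\s*"(?P<name>[^"]+)"(?P<body>.*?)/[ \t]*$',
    re.DOTALL | re.MULTILINE,
)
_SURFACE = re.compile(r"surface_value\s*=\s*([-+0-9.eE]+)")


def surface_values(field_table: str) -> Dict[str, float]:
    """Map tracer name -> surface_value for entries that set one."""
    # Drop comment lines before matching.
    text = "\n".join(ln for ln in field_table.splitlines() if not ln.lstrip().startswith("#"))
    values: Dict[str, float] = {}
    for m in _ENTRY.finditer(text):
        s = _SURFACE.search(m.group("body"))
        if s:
            values[m.group("name")] = float(s.group(1))
    return values


def suspicious_tracers(field_table: str, threshold: float = 1e20) -> List[str]:
    """Tracers initialized to huge values (e.g. 1.e30) that look like fill values."""
    return sorted(n for n, v in surface_values(field_table).items() if abs(v) >= threshold)
```

Run on a field_table where water_nc still has `surface_value=1.e30`, `suspicious_tracers` would list water_nc.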

ywangwof, Apr 18 '22 15:04

@ywangwof It is '0.0' in the field_table in the rrfs_v1nssl tests.

ChunxiZhang-NOAA, Apr 18 '22 15:04

2. One question for Ted: why are GFS_suite_interstitial_3 and GFS_suite_interstitial_4 not included in the rrfs_v1nssl and WoFS_v0 SDFs?

@ChunxiZhang-NOAA The interstitial routines are left out because there is no convective parameterization scheme (CPS). As far as I could tell, those routines do nothing if a CPS is not active, so I took them out. If that is incorrect, I can put them back in.

MicroTed, Apr 18 '22 16:04

@ChunxiZhang-NOAA To chime in, I agree with @MicroTed that those schemes can safely be left out of the SDF in this suite.

grantfirl, Apr 18 '22 16:04

@ChunxiZhang-NOAA Regarding the restart test, I believe the restart ORT failure for regional_3km_wofs may be due to the test setup. I was working to resolve this but couldn't figure it out. Similarly, I could not set up a restart ORT for regional_3km. @RatkoVasic-NOAA Is it possible to set up a restart regression test case for regional_3km, so that I can configure the ORT test correctly?

MinsukJi-NOAA, Apr 18 '22 16:04

We need to run the ORT tests (threading, decomposition, restart reproducibility, and debug tests) for the new physics suite.

@junwang-noaa Is there a particular procedure laid out somewhere for these tests that we can follow? It didn't come up in the CCPP process, though it seems it probably should have. Thanks!

MicroTed, Apr 18 '22 16:04

@RatkoVasic-NOAA Is it possible to set up a restart regression test case for regional_3km, so that I can configure the ORT test correctly?

@MinsukJi-NOAA I'm back and working on it.

RatkoVasic-NOAA, Apr 18 '22 16:04