ctsm5.3.059: Various cleanup efforts from the cesm3_0_beta04 tags for testing and usability

Open ekluzek opened this issue 11 months ago • 8 comments

Description of changes

Various updates for testing and other problems identified in the cesm3_0_beta04 tag, plus fixes and cleanup for usability, including the following:

  • Fix SHR_ASSERT so single-point matrix test passes
  • Add ne3np4 to namelist_defaults_ctsm.xml and the Makefile for PTS mode, and add the ability to do warm starts in PTS mode
  • Add f19 + f45 16pft fsurdat/landuse files to namelist_defaults_ctsm + Makefile
  • Changes in the FORTRAN code to properly abort when fire-emission is asked for but can't be provided. Added unit testing for this.

Specific notes

Contributors other than yourself, if any: @slevis-lmwg

CTSM Issues Fixed (include github issue #):

  • Fixes #2868
  • Fixes #2791
  • Fixes #2768
  • Fixes #2780
  • Fixes #2762
  • Fixes #3073
  • Some of #2810
  • CTSM namelist checking for: https://github.com/NGEET/fates/issues/1356
  • Some work on https://github.com/ESCOMP/CTSM/issues/2643

Are answers expected to change (and if so in what way)? No

Any User Interface Changes (namelist or namelist defaults changes)? Yes

Does this create a need to change or add documentation? Did you do so? No

Testing performed, if any: regular

PR's Involved:

#2840 #2835 #2834 #2844

ekluzek avatar Feb 04 '25 23:02 ekluzek

@ekluzek What's the status of this? Is any of it ready to come in / can any of it be split out into different PRs?

samsrabin avatar Apr 15 '25 20:04 samsrabin

@samsrabin this is my next tag to bring in. We created this branch a long time ago because we thought that the PRs on it would flood b4b-dev too much. But it turned out the work on it was spread out enough that this wouldn't have been an issue, at least for this second go around. With b4b-dev we have a clock that ensures changes come in regularly, but we didn't have a process for that here, so it didn't happen. I think that means that in the future we should either slide these things in as individual tags to master, or bring them to b4b-dev. Or think about the process that should be in place for a temporary branch like this.

It was brought into this branch as separate PR's similarly to b4b-dev and the first go around with this work. So you can view the contributions singly with those PR's already.

With your ask -- I thought about rebasing the PR's that went into this onto b4b-dev. But at this point I think that would only serve to create more work and slow the work down. So I won't do that now, but this can serve as something to think about as lessons for the future. I'll add this to the Thursday discussion.

ekluzek avatar Apr 16 '25 18:04 ekluzek

These tests fail on setup because they are using fire-emis with SP cases. So I'll reconfigure these tests to make sure the testmods used include nofireemis.

  • ERP_D_Ld3_PS.f09_g17.I2000Clm50Sp.derecho_intel.clm-prescribed (SETUP)
  • ERP_D_Ld5.f10_f10_mg37.I2000Clm60Sp.derecho_intel.clm-decStart (SETUP)
  • ERP_D_Ld5.f10_f10_mg37.IHistClm45Sp.derecho_intel.clm-decStart (SETUP)
  • ERP_D_Ld5.f10_f10_mg37.IHistClm60Sp.derecho_intel.clm-default (SETUP)
  • ERP_D_Ld5.ne30pg3_t232.IHistClm60Sp.derecho_intel.clm-default (SETUP)
  • ERP_P64x2_D.f10_f10_mg37.I2000Clm50SpRtmFl.derecho_intel.clm-default (SETUP)
  • ERP_P64x2_D_Ld5.f10_f10_mg37.I2000Clm45Sp.derecho_intel.clm-default (SETUP)
  • ERP_P64x2_D_Ld5.f10_f10_mg37.I2000Clm50Sp.derecho_gnu.clm-default (SETUP)
  • ERP_P64x2_D_Ld5.f10_f10_mg37.I2000Ctsm50NwpSpGswp.derecho_intel.clm-default (SETUP)
  • ERS_D_Ld10.f10_f10_mg37.IHistClm50Sp.derecho_intel.clm-collapse_pfts_78_to_16_decStart_f10 (SETUP)
  • NCK_Ld1.f10_f10_mg37.I2000Clm50Sp.derecho_intel.clm-default (SETUP)
  • SMS_C2_D_Lh12.f10_f10_mg37.I2000Clm50Sp.derecho_intel.clm-pauseResume (SETUP)
  • SMS_D_Ld1_Mmpi-serial.ne3_ne3_mg37.I2000Clm50SpRs.derecho_gnu.clm-ptsRLA (SETUP)
  • SMS_D_Ld1_Mmpi-serial.ne3_ne3_mg37.I2000Clm50SpRs.derecho_gnu.clm-ptsROA (SETUP)
  • SMS_D_Ld1_Mmpi-serial.ne3_ne3_mg37.I2000Clm50SpRs.derecho_intel.clm-ptsRLA (SETUP)
  • SMS_D_Ld1_PS.f09_g17.I1850Clm50Sp.derecho_intel.clm-default (SETUP)
  • SMS_D_Ln9_P128x3.f19_g17.IHistClm50Sp.derecho_intel.clm-waccmx_offline (SETUP)
  • SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60SpRs.derecho_intel.clm-default--clm-NEON-TOOL (SETUP)
  • SMS_Ld1_PS.nldas2_rnldas2_mnldas2.I2000Ctsm50NwpSpNldas.derecho_gnu.clm-default (SETUP)
  • SMS_Ld1_PS.nldas2_rnldas2_mnldas2.I2000Ctsm50NwpSpNldasRs.derecho_gnu.clm-default (SETUP)
  • SMS_P384x2_D_Ld5.f19_g17.I2000Clm50Sp.derecho_intel.clm-default (SETUP)
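As a rough illustration of that reconfiguration, here is a minimal Python sketch that appends the nofireemis testmod to a test name. It assumes the `TESTTYPE.GRID.COMPSET.MACHINE_COMPILER.TESTMODS` naming seen in the list above, with multiple testmods joined by `--`. The helper `add_testmod` is hypothetical and is not part of CTSM or CIME; the real change would be made in the test list itself.

```python
def add_testmod(test_name: str, testmod: str = "clm-nofireemis") -> str:
    """Append a testmod to a test name unless it is already present.

    Hypothetical helper: assumes the dotted layout
    TESTTYPE.GRID.COMPSET.MACHINE_COMPILER[.TESTMODS],
    where several testmods are joined by "--".
    """
    parts = test_name.split(".")
    if len(parts) == 4:
        # No testmods yet, so the new one becomes the fifth field.
        return f"{test_name}.{testmod}"
    if len(parts) != 5:
        raise ValueError(f"unexpected test name layout: {test_name}")
    mods = parts[4].split("--")
    if testmod in mods:
        # Already configured; leave the name untouched.
        return test_name
    parts[4] = "--".join(mods + [testmod])
    return ".".join(parts)


if __name__ == "__main__":
    print(add_testmod("NCK_Ld1.f10_f10_mg37.I2000Clm50Sp.derecho_intel.clm-default"))
```

The resulting name has the same shape as the `clm-default--clm-nofireemis` tests that appear later in this thread.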

ekluzek avatar Apr 16 '25 18:04 ekluzek

@slevis-lmwg and I just talked to @jtruesdal about this one. We decided we'll bring this in as is. So I'll finish it off. And re-target my other branch that was going to come in for master or b4b-dev.

I had in my mind that my last bit of changes needed to come in with this tag. And I haven't been able to get back to that little bit. I still want/need to finish that off. But, in the spirit of bringing things in when they are ready this part can just come in as is. This is in the spirit of Continuous Integration (CI) from a coding perspective. As well as the freedom we have to bring smaller tags in because of b4b-dev and our having stable versions with minor version updates for scientists. So more small tags is something that we can move toward. There's still some balance because we often have a backlog of tags to go to master. But, this has been outstanding for a long time and we should have brought it in a long time ago.

ekluzek avatar Jun 11 '25 19:06 ekluzek

I updated to ctsm5.3.057 and testing for aux_clm is passing as expected on Derecho and Izumi.

ekluzek avatar Jun 14 '25 17:06 ekluzek

I ran some extra tests to make sure the issues were resolved. These PASS:

  • PEA.1x1_smallvilleIA.IHistClm50BgcCropQianRs.derecho_gnu.clm-smallville_dynurban_monthly
  • PEA_D.1x1_smallvilleIA.IHistClm50BgcCropQianRs.derecho_gnu.clm-smallville_dynurban_monthly
  • SEQ_D_PS.f09_f09_mt232.I1850Clm50Sp.derecho_intel.clm-default--clm-nofireemis
  • SEQ_PS.f09_f09_mt232.I1850Clm50Sp.derecho_intel.clm-default--clm-nofireemis
  • SMS_D_Ln9.f19_f19_mg17.FWma2000climo.derecho_intel.cam-outfrq9s_waccm_ma_mam4
  • SMS_D_Ln9_P1280x1.ne0CONUSne30x8_ne0CONUSne30x8_mt12.FCHIST.derecho_intel.cam-outfrq9s
  • SMS_D_Ln9_P1280x1.ne0CONUSne30x8_ne0CONUSne30x8_mt12.FCnudged.derecho_intel.cam-outfrq9s
  • SMS_Ld12_Mmpi-serial.1x1_vancouverCAN.I1PtClm60SpRs.derecho_gnu.clm-output_sp_highfreq
  • SMS_Ln9.ne3pg3_ne3pg3_mg37.I2000Clm50Sp.derecho_gnu.clm-clm50cam6LndTuningMode--clm-nofireemis
  • SMS_Ly1_Mmpi-serial.1x1_brazil.IHistClm60BgcQianRs.derecho_intel.clm-output_bgc_highfreq
  • SSPMATRIXCN_Ly5_Mmpi-serial.1x1_numaIA.I2000Clm60BgcCropQianRs.derecho_intel.clm-ciso_monthly

These fail and need to be figured out:

  • ERS_Ld3.f19_f19_mg17.FXHIST.derecho_intel.cam-waccmx_weimer (SETUP)
  • SMS_D_Ln9.f19_f19_mg17.FXHIST.derecho_intel.cam-outfrq9s_amie (SETUP)
  • SMS_D_Ln9_P1280x1.ne0ARCTICne30x4_ne0ARCTICne30x4_mt12.FHIST.derecho_intel.cam-outfrq9s (NLCOMP RUN)

ekluzek avatar Jun 17 '25 08:06 ekluzek

It looks like the problem with the f19 tests is their unusual RUN_STARTDATE values of 2005-12-31 and 2003-10-28. They fail with the problem we have been seeing where use_init_interp isn't coordinated correctly with the IC files.

The ARCTIC grid fails in an ESMF regrid and I think it's likely due to too few processors. The default for CAM is 91 nodes, which is 11k tasks, so 10X larger than the tasks asked for. So I think it will likely work with more processors.
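The task counts above can be checked with a quick calculation. This sketch assumes 128 cores per Derecho CPU node with one task per core (not stated in the thread), and reads the `_P1280x1` modifier in the test name as 1280 tasks with 1 thread each.

```python
# Assumption: a Derecho CPU node has 128 cores, one MPI task per core.
CORES_PER_NODE = 128

default_nodes = 91        # CAM default node count quoted above
requested_tasks = 1280    # from the _P1280x1 modifier in the test name

default_tasks = default_nodes * CORES_PER_NODE
ratio = default_tasks / requested_tasks

print(f"default tasks:   {default_tasks}")
print(f"requested tasks: {requested_tasks}")
print(f"ratio:           {ratio:.1f}x")
```

Under these assumptions the default layout has 11648 tasks, roughly 9 times the 1280 requested by the test, which is in line with the "11k tasks" and "10X" estimates above.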

ekluzek avatar Jun 18 '25 19:06 ekluzek

Since I had a variety of tests failing, I submitted the ctsm_sci test list. These tests failed because they need --clm-nofireemis added as an extra testmod:

  • SMS_Ld5.f09_g17.IHistClm50SpCru.derecho_intel.clm-default (SETUP)
  • SMS_Ld5.f19_g17.IHistClm50SpCru.derecho_intel.clm-default (SETUP)
  • SMS_Lm12.f09_f09_mg17.I1850Clm60Sp.derecho_intel.clm-ExcessIceStartup_output_sp_exice (SETUP)
  • SMS_Lm12.f09_t232.I1850Clm60SpCrujra.derecho_intel.clm-ExcessIceStartup_output_sp_exice (SETUP)

ekluzek avatar Jun 18 '25 19:06 ekluzek

In the meeting this morning, I was encouraged to make the tag as is. And file issues for anything outstanding. I have created baselines that just need to be renamed. And I'll start the ChangeLog.

ekluzek avatar Jun 20 '25 19:06 ekluzek

Some tests have different answers because their fieldlists change now that fire-emission is off:

  • ERP_D_Ld3_PS.f09_g17.I2000Clm50Sp.derecho_intel.clm-prescribed
  • ERS_D_Ld10.f10_f10_mg37.IHistClm50Sp.derecho_intel.clm-collapse_pfts_78_to_16_decStart_f10
  • SMS_C2_D_Lh12.f10_f10_mg37.I2000Clm50Sp.derecho_intel.clm-pauseResume
  • SMS_D_Ln9_P128x3.f19_g17.IHistClm50Sp.derecho_intel.clm-waccmx_offline
  • ERI_D_Ld9_P48x1.f10_f10_mg37.I2000Clm50Sp.izumi_nag.clm-SNICARFRC
  • ERP_D_Ld5_P48x1.f10_f10_mg37.I2000Clm50Sp.izumi_nag.clm-o3lombardozzi2015
  • ERS_D.f10_f10_mg37.I1850Clm60Sp.izumi_nag.clm-ExcessIceStreams

Also, the ctsm_sci comparison to ctsm5.3.051 changes answers for VOCs because the MEGAN change came in just after that tag.

ekluzek avatar Jun 23 '25 07:06 ekluzek