Change history time to be equal to the middle of the time bounds
Description of changes
This PR subsets the scope of issue #1059 and PR #2445 as a result of the October 2024 conversation in #2445. This PR changes history time to be equal to the middle of the time bounds. This PR does not put instantaneous fields on their own separate history files.
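As a rough illustration of what "middle of the time bounds" means, here is a minimal Python sketch. It is not the model's Fortran code; the function name `mid_time` and the monthly bounds (days since the start of a no-leap year) are hypothetical.

```python
def mid_time(time_bounds):
    """Return the midpoint of each (start, end) pair, in the same units as the bounds."""
    return [0.5 * (start + end) for start, end in time_bounds]


# Hypothetical monthly bounds, in days since the start of a no-leap year.
bounds = [(0.0, 31.0), (31.0, 59.0), (59.0, 90.0)]

old_time = [end for _, end in bounds]  # time stamped at the end of each interval
new_time = mid_time(bounds)            # time stamped at the middle of each interval

print(old_time)  # [31.0, 59.0, 90.0]
print(new_time)  # [15.5, 45.0, 74.5]
```

Only the time coordinate moves in this sketch; the bounds and the field values are untouched, which is consistent with answers not being expected to change.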
I will also bring submodule changes from https://github.com/ESCOMP/MOSART/pull/106 (was https://github.com/ESCOMP/MOSART/pull/69) and https://github.com/ESCOMP/RTM/pull/39.
Specific notes
Contributors other than yourself, if any:
Are answers expected to change (and if so in what way)? No.
Does this create a need to change or add documentation? Did you do so? Maybe. No.
Testing performed, if any: Plan to run aux_clm, mosart, rtm test-suites.
I submitted this manual test to confirm that the committed modifications work as intended:
./create_test SMS_Lm1.f10_f10_mg37.I1850Clm60BgcCropCmip6waccm.derecho_gnu.clm-basic -c /glade/campaign/cgd/tss/ctsm_baselines/ctsm5.3.009
Check on Friday.
The previous test completed its 1 month and the monthly output looked good, but it also wrote annual history files that I could not evaluate. So I started another test (default is Ly1, but I changed to Ly2) and added hist_avgflag_pertape(6) = 'I' to see what happens:
SMS_Ly2_Mmpi-serial.1x1_brazil.IHistClm60BgcQianRs.derecho_intel.clm-output_bgc_highfreq
PASS
I updated the submodules to point to https://github.com/ESCOMP/MOSART/pull/106 and https://github.com/ESCOMP/RTM/pull/39 and submitted the three corresponding test-suites:
OK ./run_sys_tests -s rtm -c rtm1_0_80-ctsm5.2.029 --skip-generate
OK ./run_sys_tests -s mosart -c mosart1.1.02-ctsm5.2.029 --skip-generate
OK ./run_sys_tests -s aux_clm -c ctsm5.3.009 --skip-generate
All cases that differ from the baseline differ only in the time variable.
UPDATE: Repeating the rtm and mosart test-suites with the suggested code modification: https://github.com/ESCOMP/MOSART/pull/106#discussion_r1807115142
@ekluzek and I agreed on the order that the "hist" PRs would get merged. The order as shown in Upcoming Tags is #2838 #2084 #2052 ...and we will follow the same order in mosart/rtm. Before I merge the mosart/rtm "hist" PRs, @ekluzek will merge the work in the "simple bfb" mosart/rtm cards.
@ekluzek review and approval of this PR should take 5 minutes, as it looks the same as the corresponding RTM and MOSART PRs that you reviewed/approved. Thanks :-)
TODOs left for me:
- [x] Update to ctsm5.3.012
- [x] Update .gitmodules
- [x] mosart/rtm test suites
- [x] Run aux_clm
- [ ] Merge and make tag
izumi testing
OK ./run_sys_tests -s aux_clm -c ctsm5.3.012 -g ctsm5.3.013
OK ./run_sys_tests -s mosart -c mosart1.1.04_ctsm5.3.009 -g mosart1.1.04_ctsm5.3.013
derecho testing
OK ./run_sys_tests -s rtm -c rtm1_0_82-ctsm5.3.009 -g rtm1_0_82-ctsm5.3.013
OK ./run_sys_tests -s mosart -c mosart1.1.04-ctsm5.3.009 -g mosart1.1.04-ctsm5.3.013
FAIL ./run_sys_tests -s aux_clm -c ctsm5.3.012 -g ctsm5.3.013
RXCROPMATURITYSKIPGEN_Ld1097.f10_f10_mg37.IHistClm60BgcCrop.derecho_intel.clm-cropMonthOutput
with a conda error. Troubleshooting with @samsrabin
Also I'm getting diffs in the cpl and mosart output of two tests BUT both are 3-yr tests:
ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int/ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int.cpl.hi.1853-01-01-00000.nc.cprnc.out: RMS rofImp_Forr_rofi_glc 5.2778E-06 NORMALIZED 5.5871E+02
ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int/ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int.cpl.hi.1853-01-01-00000.nc.cprnc.out: RMS rofImp_Forr_rofl_glc 7.6124E-12 NORMALIZED 7.1083E+02
ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int/ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int.mosart.h0.1852-12.nc.cprnc.out: RMS DIRECT_DISCHARGE_TO_OCEAN_GLC_IC 2.7946E+00 NORMALIZED 5.2264E+02
ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int/ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int.mosart.h0.1852-12.nc.cprnc.out: RMS DIRECT_DISCHARGE_TO_OCEAN_GLC_LI 8.5847E-06 NORMALIZED 7.1784E+02
ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int/ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int.mosart.h0.1852-12.nc.cprnc.out: RMS QGLC_ICE_INPUT 2.0309E+00 NORMALIZED 3.7981E+02
ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int/ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int.mosart.h0.1852-12.nc.cprnc.out: RMS QGLC_LIQ_INPUT 8.2673E-06 NORMALIZED 6.9130E+02
ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int/ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int.mosart.h0.1852-12.nc.cprnc.out: RMS TOTAL_DISCHARGE_TO_OCEAN_ICE 2.7946E+00 NORMALIZED 1.4117E+01
ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int/ERS_Ly3_P64x2.f10_f10_mg37.IHistClm50BgcCropG.derecho_intel.clm-cropMonthOutput.GC.1113-122038de_int.mosart.h0.1852-12.nc.cprnc.out: RMS TOTAL_DISCHARGE_TO_OCEAN_LIQ 8.5847E-06 NORMALIZED 2.3013E-06
SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int/SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int.cpl.hi.0004-02-01-00000.nc.cprnc.out: RMS rofImp_Forr_rofi_glc 3.2518E-06 NORMALIZED 4.0968E+02
SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int/SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int.cpl.hi.0004-02-01-00000.nc.cprnc.out: RMS rofImp_Forr_rofl_glc 3.9452E-11 NORMALIZED 7.5332E+02
SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int/SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int.mosart.h0.0004-01.nc.cprnc.out: RMS DIRECT_DISCHARGE_TO_OCEAN_GLC_IC 1.8679E+00 NORMALIZED 3.8169E+02
SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int/SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int.mosart.h0.0004-01.nc.cprnc.out: RMS DIRECT_DISCHARGE_TO_OCEAN_GLC_LI 4.4943E-05 NORMALIZED 7.8313E+02
SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int/SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int.mosart.h0.0004-01.nc.cprnc.out: RMS QGLC_ICE_INPUT 1.5340E+00 NORMALIZED 3.1346E+02
SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int/SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int.mosart.h0.0004-01.nc.cprnc.out: RMS QGLC_LIQ_INPUT 4.4188E-05 NORMALIZED 7.6998E+02
SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int/SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int.mosart.h0.0004-01.nc.cprnc.out: RMS TOTAL_DISCHARGE_TO_OCEAN_ICE 1.8679E+00 NORMALIZED 7.2051E+00
SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int/SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis.GC.1113-122038de_int.mosart.h0.0004-01.nc.cprnc.out: RMS TOTAL_DISCHARGE_TO_OCEAN_LIQ 4.4943E-05 NORMALIZED 1.5745E-05
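Lines like the ones above are easier to scan once reduced to variable name, RMS, and normalized RMS. The following is a minimal Python sketch that assumes each line has the `RMS <name> <value> NORMALIZED <value>` layout shown in this thread. The regex, the function name `parse_rms`, and the shortened `case.` file prefix are illustrative, not part of cprnc or CIME.

```python
import re

# Assumed layout, taken from the grep output in this thread:
#   <file>.cprnc.out: RMS <variable> <rms> NORMALIZED <normalized rms>
RMS_LINE = re.compile(r"RMS\s+(\S+)\s+(\S+)\s+NORMALIZED\s+(\S+)")


def parse_rms(line):
    """Return (variable, rms, normalized_rms), or None if the line does not match."""
    match = RMS_LINE.search(line)
    if match is None:
        return None
    name, rms, norm = match.groups()
    return name, float(rms), float(norm)


# Two values copied from the thread; the "case." file prefix is shortened.
sample = [
    "case.mosart.h0.1852-12.nc.cprnc.out: RMS QGLC_ICE_INPUT 2.0309E+00 NORMALIZED 3.7981E+02",
    "case.mosart.h0.1852-12.nc.cprnc.out: RMS TOTAL_DISCHARGE_TO_OCEAN_LIQ 8.5847E-06 NORMALIZED 2.3013E-06",
]

parsed = [parse_rms(line) for line in sample]
largest = max(parsed, key=lambda record: record[2])  # largest normalized difference
print(largest[0])  # QGLC_ICE_INPUT
```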
The diffs of the two tests above seem vaguely related to the earlier update to mosart1.1.02 (https://github.com/ESCOMP/MOSART/pull/94), but Adrianna's test passed just fine pointing to mosart1.1.02. So I will try the two tests pointing to mosart1.1.02 and mosart1.1.03:
./create_test SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis -c /glade/campaign/cgd/tss/ctsm_baselines/ctsm5.3.012
DIFF in mosart1.1.03
OK in mosart1.1.02
The same test from ctsm5.3.012: PASS
DID THESE TESTS EXIST WHEN I LAST RAN aux_clm? Yes (ctsm5.3.009). So now I checked out 1e81456 from above, pointed to mosart1.1.04/rtm1_0_82, and submitted:
DIFF ./create_test SMS_Lm37.f10_f10_mg37.I1850Clm50SpG.derecho_intel.clm-glcMEC_long--clm-nofireemis -c /glade/campaign/cgd/tss/ctsm_baselines/ctsm5.3.009
which tells me that the diffs were there, and I didn't notice the first time I ran aux_clm.
BUT the same test pointing to mosart1.1.02 passes.
I brainstormed for a bit with @billsacks and Bill pointed out/suggested:
- these two are the only long tests with active cism
- the changes appear in the coupler due to the changes in mosart and not due to changes in ctsm
- that he would not expect these diffs (as I also didn't), so he would recommend going through a methodical way of testing, making baselines, and updating the code and the baselines to confirm whether I still get these diffs
@billsacks I mentioned to you a vague memory I had of an issue that could relate to these diffs, and it is this one: #2542
> I mentioned to you a vague memory I had of an issue that could relate to these diffs, and it is this one: #2542
Ah, yes. But I don't think that should be the issue with these tests, right?
Right, I don't think so. I think I have now found that the problem starts with the introduction of mosart1.1.03. I see this in a new test today and in my testing from yesterday (somehow I missed the sign when I looked originally). First I will confirm beyond doubt and then I will try bisecting mosart1.1.03 to find the culprit.
> then I will try bisecting mosart1.1.03 to find the culprit.

Looking at https://github.com/ESCOMP/MOSART/pull/70, I have converged on two commits:
- 7749459: we removed two lines.
- 692d183: instead of removing the two lines, we added an if-statement that is .false.
I have confirmed that the two lines that we removed caused the diffs. @ekluzek I will check with you how to resolve this.
My first guess: keep the if-statement, but make changes elsewhere so that its condition becomes true, as suggested in https://github.com/ESCOMP/MOSART/issues/103
Trying a case with cism NOT active and the if-statement still commented out (as in my last test)
PASS ./create_test SMS_D.f10_f10_mg37.I2000Clm60Bgc.derecho_intel -c /glade/campaign/cgd/tss/ctsm_baselines/ctsm5.3.012
Submitted ./run_sys_tests -s aux_clm -c ctsm5.3.012 -g ctsm5.3.013
with the if-statement still commented out, to take advantage of the computer overnight.
FAIL RXCROPMATURITYSKIPGEN_Ld1097.f10_f10_mg37.IHistClm60BgcCrop.derecho_intel.clm-cropMonthOutput
Troubleshooting with @samsrabin
Rerunning this test to generate a baseline, but stuck in the SHAREDLIB_BUILD phase, so I will kill it and try again next week. Besides, I will need to generate a ctsm5.3.014 baseline, so rerunning right now is redundant:
./create_test RXCROPMATURITYSKIPGEN_Ld1097.f10_f10_mg37.IHistClm60BgcCrop.derecho_intel.clm-cropMonthOutput -c /glade/campaign/cgd/tss/ctsm_baselines/ctsm5.3.012 -g /glade/campaign/cgd/tss/ctsm_baselines/ctsm5.3.012_hist_time_mid
I'm wondering if the "treat a file as instantaneous if its first variable is instantaneous" logic might be premature to bring in here rather than in #2445. Specifically, I think the time_bounds variable should for now be saved no matter what. Not having it messed up the RXCROPMATURITY test, and though I was able to work around it, others might not be.
> [...] Specifically, I think the time_bounds variable should for now be saved no matter what. Not having it messed up the RXCROPMATURITY test, and though I was able to work around it, others might not be.
@olyson what do you think about @samsrabin's comment? Most concerning to me would be any vulnerability in the land diagnostic package.
I don't think the land diagnostics package uses time_bounds. But I may not understand the issue here. Seems like we could do a short simulation using this branch once stable and see if there are any problems? Also should check ILAMB. So maybe an I2000 case for a test.
@olyson Okay, good that the land diagnostics package probably doesn't use it, but yes would be nice to check. However, I'm thinking that other people's scripts might rely on the presence of time_bounds, as mine did. Removing it at this stage seems premature because it is still possible to have both instantaneous and average/etc. variables on the same file. Users might wonder (as I did) why time_bounds disappeared just because they happened to put an instantaneous variable first in the hist_fincl list.
And actually, this comment goes for the "exact middle" vs. "end of" change as well. It seems arbitrary (and against the "principle of least astonishment") that the first variable in the hist_fincl list should affect this. My understanding was that this would be changed to "exact middle" for all history files for now, and we would just accept that being wrong for instantaneous variables.
A bonus from what I'm proposing: Always (a) including time_bounds and (b) setting time to the exact middle means that people always have what they need for postprocessing either instantaneous variables (just look at the second value in time_bounds) or averaged/etc. variables (either both values in time_bounds or the value in time).
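A minimal Python sketch of the postprocessing rule proposed here, assuming each record carries a two-element `time_bounds` pair. The function name `representative_time` and the example numbers are hypothetical; nothing here reflects an existing script.

```python
def representative_time(time_bounds, instantaneous):
    """Pick the time that describes one history record.

    Instantaneous fields are valid at the end of the interval, so use the
    second value of time_bounds. Averaged (or otherwise accumulated) fields
    are best described by the middle of the interval.
    """
    start, end = time_bounds
    if instantaneous:
        return end
    return 0.5 * (start + end)


# Hypothetical January record, in days since the start of the year.
january = (0.0, 31.0)
print(representative_time(january, instantaneous=True))   # 31.0
print(representative_time(january, instantaneous=False))  # 15.5
```

With time_bounds always present, one file can serve both kinds of variables without the reader needing to know how the tape was classified.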
The standard diagnostics package doesn't use time_bounds as far as I can tell. ILAMB may, it just needs to be tested. Always including time_bounds sounds fine to me.
time_bounds is an expected part of the CF convention, so I do endorse using it for anything with time. That's likely why some tools might expect it to be there.
For instantaneous it should likely be the time bounds of the time-step that was output, that is, from the previous time-step's time to the ending time-step's time. You could have both being the same ending time-step, but that doesn't show that it is a model with a finite time-step.
Here's information on the CF Convention attributes. Look up "bounds"...
https://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#attribute-appendix
@ekluzek That link contains the following text, which implies that instantaneous files should actually not have time_bounds:
> It is often the case that data values are not representative of single points in time and/or space, but rather of intervals or multidimensional cells. This convention defines a bounds attribute to specify the extent of intervals or cells.
Good point @samsrabin. That convinces me we should remove it for I fields then. That would then be in line with the convention.
We don't similarly report on the grid cell bounds either (which could be done for 2D grids, but would be harder for unstructured grids), so we shouldn't for Instantaneous time fields either.
Earlier discussion led to the conclusion that we should not have time_bounds in instantaneous files (though CAM's mistakenly still does): https://github.com/ESCOMP/CAM/issues/1166
I had a quick meeting with @samsrabin 40 minutes ago: I proposed and he agreed to an alternate version of the if-statement that rightly concerned him. The alternate version eliminates the risk of wrongly labeling a tape "instantaneous" just because the first field is instantaneous.
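The thread does not show the alternate if-statement, so the sketch below only illustrates the concern in Python. It contrasts the first-field rule with one hypothetical stricter rule (a tape counts as instantaneous only if every field on it is). The function names, the stricter rule itself, and the use of 'A' for an averaged field are assumptions, not the code in this PR.

```python
def first_field_rule(avgflags):
    """Rule that raised the concern: look only at the first field on the tape."""
    return avgflags[0] == "I"


def all_fields_rule(avgflags):
    """Hypothetical stricter rule: every field on the tape must be instantaneous."""
    return all(flag == "I" for flag in avgflags)


# A mixed tape whose first field happens to be instantaneous ('A' assumed to mean averaged).
mixed_tape = ["I", "A", "A"]
print(first_field_rule(mixed_tape))  # True: the whole tape gets labeled instantaneous
print(all_fields_rule(mixed_tape))   # False: the mixed tape is not labeled instantaneous
```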
Oh, good, looks as though we're removing time_bounds from instantaneous tapes, as originally planned :-)
Ok, so I'm testing the alternate if-statement with aux_clm right now and then I will push it to the PR.
- [x] I must still make the same change to the corresponding rtm/mosart file. Move this TODO to the third "history" tag because that one still has mosart/rtm PRs that I have not merged.
How I'm checking whether diffs are expected:
./cs.status.fails | grep -v PASS | grep -v 'wise bit-for' | grep -v 'd_1: DIF'
grep NORM */*cprnc.out | grep -v time | grep -v 'ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.derecho_intel.clm-FatesColdTwoStream' | grep -v 'P_P64x2_Lm13.f10_f10_mg37.IHistClm60Bgc.derecho_intel.clm-monthly--clm-matrixcnOn_ignore_warnings'
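For anyone who prefers to do the second filter in a script, here is a rough Python equivalent of that grep pipeline. It is a sketch only: the function name `unexpected_diffs`, the shortened file names, and the numbers in the sample lines are made up for illustration.

```python
def unexpected_diffs(lines, ignore_substrings):
    """Keep NORMALIZED lines that do not mention 'time' or any ignored test name."""
    kept = []
    for line in lines:
        if "NORM" not in line:
            continue  # same as: grep NORM
        if "time" in line:
            continue  # same as: grep -v time
        if any(name in line for name in ignore_substrings):
            continue  # same as: grep -v '<test name>'
        kept.append(line)
    return kept


# Test whose diffs are filtered out, as in the grep command above.
ignored = ["ERS_D_Ld15.f45_f45_mg37.I2000Clm50FatesRs.derecho_intel.clm-FatesColdTwoStream"]

# Hypothetical input lines; file names are shortened and values are made up.
lines = [
    "caseA/caseA.clm2.h0.nc.cprnc.out: RMS time 1.5500E+01 NORMALIZED 9.9000E-01",
    "caseA/caseA.mosart.h0.nc.cprnc.out: RMS QGLC_ICE_INPUT 2.0309E+00 NORMALIZED 3.7981E+02",
    ignored[0] + "/x.cprnc.out: RMS FOO 1.0000E+00 NORMALIZED 1.0000E+00",
]

remaining = unexpected_diffs(lines, ignored)
print(len(remaining))  # 1
```

Anything left after filtering is a difference in something other than the time variable and needs a closer look.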
OK aux_clm except failure reported in the following post.
@samsrabin back to this test
/glade/derecho/scratch/slevis/tests_1121-124652de/RXCROPMATURITYSKIPGEN_Ld1097.f10_f10_mg37.IHistClm60BgcCrop.derecho_intel.clm-cropMonthOutput.GC.1121-124652de_int.gddgen
I get a new error. The problem and solution are not obvious to me from a quick look, but I'm happy to look together if you want.
@samsrabin Please hold off on my last post for now, because the test passed in my aux_clm runs of the two subsequent PRs, which include this one.
Note to @ekluzek
- If you're up for approving this one, then I would be ready to move forward with the merge when it's time.
- You have seen it before; it should look very familiar, other than an addition from Sam Rabin pertaining to cropcal_module.py.
Note to self: Wait for Erik's upcoming mosart/rtm tags before proceeding with this merge.