CDEPS
CDEPS copied to clipboard
[WIP] Plumber updates
This will address #248 In order to implement PLUMBER capabilities
From meeting with @ekluzek
- All externals remain "frozen" until ./manage_externals gets removed (which is cesm tag beta18).
- Because this PR changes externals, it needs to wait until after that happens.
- People with permissions to merge this PR are Chris Fisher, Bill Sacks, and Jim Edwards.
A note on somethings we need to do before this is asked for merge:
- [x] As above wait until cesm2_3_beta18 freeze is resolved
- [x] Test with it in CTSM and make sure it'll work for PLUMBER2 cases
- [x] Add a aux_cdeps test for PLUMBER2? (Not going to do, eventually should do this both for NEON and PLUMBER2)
- [x] Run the aux_cdeps testlist and make sure it runs the same as a baseline for it (run a baseline first) (this code adds new functionality that shouldn't mess with anything else -- but to make sure it doesn't...)
- [x] Ask a CDEPS maintainer to merge it in (a new tag is created when they do)
Agreed in today's ctsm software meeting: @TeaganKing will notify @slevis-lmwg when he should run the aux_cdeps test-suite.
Per conversation with Erik, we can remove the files listed in PLUMBER2 user mod directories because these will be implemented in another PR (#277 ); those do not need to be moved to CDEPS.
However, we do need to implement the dtlimit used for these various streams specifically for PLUMBER-- hence the placeholder values that I still need to ensure work properly for changing dtlimit when CLM_USRDAT_NAME is set to PLUMBER.
Variables in those user mod directories that are duplicated in the CDEPS stream can be removed once this PR is merged in.
The dtlimit is updated as expected when running CTSM when I do an xmlchange to set CLM_USRDAT_NAME to PLUMBER. So, @slevis-lmwg , I think we can run the aux_cdeps test-suite. Note that the CTSM changes (https://github.com/ESCOMP/CTSM/pull/2485 and https://github.com/ESCOMP/CTSM/pull/2406) are not yet available (since they're dependent on this CDEPS PR).
This PR introduced CLM_USRDAT_NAME as PLUMBER2 instead of PLUMBER, so I will update that now.
@TeaganKing I want to confirm that I understand. I need to combine the branches from these three PRs: https://github.com/ESCOMP/CTSM/pull/2485 https://github.com/ESCOMP/CTSM/pull/2406 #262 before I start the aux_cdeps test-suite, right?
Also, a note to myself: The checklist points out that I need to generate a baseline.
@TeaganKing I want to confirm that I understand. I need to combine the branches from these three PRs: ESCOMP/CTSM#2485 ESCOMP/CTSM#2406 #262 before I start the aux_cdeps test-suite, right?
Also, a note to myself: The checklist points out that I need to generate a baseline.
ESCOMP/CTSM#2406 is very much still in progress, and there are going to be a few changes to ESCOMP/CTSM#2485 still as well (I'll do this within the next few days). What exactly is being tested with the aux_cdeps tests? I personally tested this one just by doing an xmlchange to set CLM_USRDAT_NAME to PLUMBER2, building the case, and checking the input files.
ESCOMP/CTSM#2406 is very much still in progress, and there are going to be a few changes to ESCOMP/CTSM#2485 still as well (I'll do this within the next few days). What exactly is being tested with the aux_cdeps tests? I personally tested this one just by doing an xmlchange to set CLM_USRDAT_NAME to PLUMBER2, building the case, and checking the input files.
Ok, based on this information, I think I could go ahead and submit aux_cdeps with #262 with ctsm from master (I will try ctsm5.2.007 which is the current latest).
I tried and failed to generate a baseline using the latest ctsm paired with cdeps1.0.38, i.e. the same cdeps that I see in @TeaganKing's branch:
./run_sys_tests -s aux_cdeps --skip-compare -g cdeps1.0.38_ctsm5.2.008
I also tried and failed to generate a baseline using the latest ctsm paired with cdeps1.0.34, i.e. the default cdeps for ctsm5.2.008:
./run_sys_tests -s aux_cdeps --skip-compare -g cdeps1.0.34_ctsm5.2.008
The former seems less surprising, if e.g. there are incompatibilities between ctsm5.2.008 and cdeps1.0.38.
The latter though means that I have a problem with aux_cdeps (environment or other?) or that aux_cdeps has a problem (in which case it should fail for others, as well).
@TeaganKing at this point I will need help from @ekluzek with this. I will raise the issue at Monday's stand-up.
I encountered the same problem this morning even with aux_clm and ctsm_sci. This helped me realize that the problem may be as simple as setting an account number that hasn't expired. I will try this again today or tomorrow.
UPDATE 1: I submitted the same two tests. I expect that at least the cdeps1.0.34 should work and generate a baseline.
UPDATE 2: Worked out the opposite from what I expected:
- ctsm5.2.008 with default cdeps1.034 gave three failures.
- ctsm5.2.008 with cdeps1.0.38 passed. This is convenient because the current version of the branch in this PR modifies cdeps1.0.38.
UPDATE 3:
Submitted aux_cdeps comparing this branch to the baseline (tests_0703-140019de).
./run_sys_tests -s aux_cdeps --skip-generate -c cdeps1.0.38_ctsm5.2.008
@TeaganKing two updates:
- Erik clarified that the second checkbox (currently unchecked) is asking you to run one or more plumber cases to confirm that they work.
- aux_cdeps fails for several tests with this error during the build phase:
2024-07-03 14:01:30: Test 'SMS_Ld5.f10_f10_mg37.2000_DATM%NLDAS2_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel' failed in phase 'SETUP' with exception 'ERROR: Fatal error in case.cmpgen_namelists: 2024-07-03 14:01:29 atm
Create namelist for component datm
Calling /glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/buildnml
Running /glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/buildnml
Traceback (most recent call last):
File "/glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/buildnml", line 336, in <module>
_main_func()
File "/glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/buildnml", line 332, in _main_func
buildnml(case, caseroot, "datm")
File "/glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/buildnml", line 311, in buildnml
_create_namelists(case, confdir, inst_string, namelist_infile, nmlgen, data_list_path)
File "/glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/buildnml", line 211, in _create_namelists
streams = StreamCDEPS(stream_file, schema_file)
File "/glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/../../cime_config/stream_cdeps.py", line 65, in __init__
GenericXML.__init__(self, infile, schema)
File "/glade/work/slevis/git_externals/plumber_upd_pr262b/cime/CIME/XML/generic_xml.py", line 78, in __init__
self.read(infile, schema)
File "/glade/work/slevis/git_externals/plumber_upd_pr262b/cime/CIME/XML/generic_xml.py", line 129, in read
self.read_fd(fd)
File "/glade/work/slevis/git_externals/plumber_upd_pr262b/cime/CIME/XML/generic_xml.py", line 159, in read_fd
self.tree = ET.parse(fd)
File "/glade/work/slevis/conda-envs/ctsm_pylib/lib/python3.7/xml/etree/ElementTree.py", line 1197, in parse
tree.parse(source, parser)
File "/glade/work/slevis/conda-envs/ctsm_pylib/lib/python3.7/xml/etree/ElementTree.py", line 598, in parse
self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 4083, column 15
ERROR: /glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/buildnml /glade/derecho/scratch/slevis/tests_0703-140019de/SMS_Ld5.f10_f10_mg37.2000_DATM%NLDAS2_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel.C.0703-140019de_int FAILED, see above'
File "/glade/work/slevis/git_externals/plumber_upd_pr262b/cime/CIME/test_scheduler.py", line 1125, in _run_catch_exceptions
return run(test)
File "/glade/work/slevis/git_externals/plumber_upd_pr262b/cime/CIME/test_scheduler.py", line 1016, in _setup_phase
"Fatal error in case.cmpgen_namelists: {}".format(output),
File "/glade/work/slevis/git_externals/plumber_upd_pr262b/cime/CIME/utils.py", line 176, in expect
raise exc_type(msg)
In case it helps, here's a list of tests that PASS versus FAIL:
PASS SMS_Ld2.ww3a.2000_SATM_SLND_SICE_SOCN_SROF_SGLC_DWAV%CLIMO.derecho_intel RUN
PASS SMS_Ld3.f09_f09_mg17.1850_SATM_DLND%SCPL_SICE_SOCN_SROF_SGLC_SWAV.derecho_intel RUN
PASS SMS_Ly3.f10_f10_ais8gris4_mg37.2000_SATM_SLND_SICE_SGLC_SROF_DGLC%NOEVOLVE_SWAV.derecho_intel RUN
PASS SMS_Ly3.f10_f10_ais8_mg37.2000_SATM_SLND_SICE_SGLC_SROF_DGLC%NOEVOLVE_SWAV.derecho_intel RUN
PASS SMS_Ly3.f19_g17_gris4.2000_SATM_SLND_SICE_SGLC_SROF_DGLC%NOEVOLVE_SWAV.derecho_intel RUN
As far as I can tell, the PEND failures report the same error as the FAIL in this list:
FAIL SMS_Ld5.f10_f10_mg37.1850_DATM%GSWP3v1_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
PEND SMS_Ld5.f10_f10_mg37.2000_DATM%CRUv7_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
PEND SMS_Ld5.f10_f10_mg37.2000_DATM%NLDAS2_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
PEND SMS_Ld5.f10_f10_mg37.2000_DATM%QIA_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
PEND SMS_Ld5.f10_f10_mg37.2010_DATM%GSWP3v1_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
PEND SMS_Ld5.f10_f10_mg37.HIST_DATM%GSWP3v1_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
PEND SMS_Ld5.f10_f10_mg37.SSP585_DATM%GSWP3v1_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
PEND SMS_Ld5_P1.1x1_mexicocityMEX.2000_DATM%1PT_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel.datm-1PT SHAREDLIB_BUILD
PEND SMS_Ld5.T62_g17.2000_DATM%IAF_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
PEND SMS_Ld5.T62_g17.2000_DATM%NYF_SLND_DICE%IAF_DOCN%DOM_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
PEND SMS_Ld5.T62_g17.2000_DATM%NYF_SLND_DICE%SSMI_DOCN%DOM_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
PEND SMS_Ld5.T62_g17.2000_DATM%NYF_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
PEND SMS_Ld5.TL319_t061.2000_DATM%JRA-1p4-2018_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
PEND SMS_Ld5.TL319_t061.2000_DATM%JRA_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
PEND SMS_Ln5.f19_f19_mg17.2000_DATM%QIA_SLND_SICE_DOCN%DOM_SROF_SGLC_SWAV.derecho_intel SHAREDLIB_BUILD
PEND SMS_Ln5.f19_f19_mg17.2000_DATM%QIA_SLND_SICE_DOCN%SOMAQP_SROF_SGLC_SWAV.derecho_intel SHAREDLIB_BUILD
PEND SMS_Ln5.f19_f19_mg17.HIST_DATM%QIA_SLND_SICE_DOCN%DOM_SROF_SGLC_SWAV.derecho_intel SHAREDLIB_BUILD
PEND SMS_Ln9_P1.T42_T42.2000_DATM%QIA_SLND_SICE_DOCN%DOM_SROF_SGLC_SWAV.derecho_intel.datm-scam SHAREDLIB_BUILD
My quick look at the above lists suggests that SATM tests PASS and DATM tests fail.
Thank you for running these tests and clarifying the 2nd checkbox item!
Regarding actually running the PLUMBER case, we don't have run_tower() fully functioning at the moment. I was thinking it may be most helpful to move this in and then finalize run_tower() since it will require these changes?
@TeaganKing if your suggestion does not affect whether the aux_cdeps test-suite can be fixed in this PR (which I suspect and hope is true), then I would be fine with that. Still, I would like @ekluzek to also weigh in on your suggestion.
Following up from a discussion with @ekluzek , I will do the following:
- [x] Edit
stream_definition_datm.xmlfile for a potential formatting error (have done locally, in particular we were looking at $CLM_USRDAT vs CLM_USRDAT, == vs = in the dtlimit statement, and updates to the documentation at the top of this file to include NEON & PLUMBER) - [x] Make a few edits to
buildnmlbased on needs for implementing plumber that have come up in #2406 - [x] run
aux_cdepstests and see if error was fixed:./run_sys_tests -s aux_cdeps --skip-generate --skip-compare
The last issue that I'm running into in buildnml adjustments is that the start dates in the filenames for the PLUMBER files in /glade/campaign/cesm/cesmdata/inputdata/atm/datm7/CLM1PT_data/PLUMBER2/*/CLM1PT_data/ don't match the dDATM_YR_ALIGN nor the DATM_YR_START variables. @wwieder was the DATM_YR_ALIGN supposed to match the filename here? I'm a bit confused on the numerous year labels/time zones/actual years. I was thinking that we could add an additional XML variable to describe this year but that (hopefully) may be a bit unnecessary.
Eg, hard-coding 2010 into the filepath in datm/cime_config/stream_definition_datm.xml for Au-Ctr works when running ./run_tower --plumber-sites AU-Ctr (when including the changes from PLUMBER5.2 PR and Plumber 2 Implementation PR)
Because PLUMBER2 data are all provided in local time, there a number of complexities with handling YR_START, YR_END and YR_ALIGN related to how datm and LAI stream files handle time. @olyson set things up so that these details are handled differently for spinup (I2000 compsets) vs. transient (HIST compsets) cases.
Currently we're handling this with usermods, which are generated by a python script that seems to be working correctly. After taking this over with @TeaganKing I'd suggest we keep this level of control in CTSM, instead of in CDEPS.
@ekluzek, when we briefly touched on this, you didn't seem supportive of this idea. Can you think of a simple way to migrating this into cdeps, or for simplicity can we leave the customization in usermods?
I also want to add that keeping this in the usermods would utilize a different method of implementation than neon which could be confusing for users. I'm somewhat inclined to add a new xml variable 'FILENAME_DATM_START' to be used in the datm stream, but that also could be a bit unnecessary and only relevant to this specific case.
Hi @jedwards4b and @billsacks , In a discussion with @wwieder and @ekluzek we decided that it would be best to include an xml variable DATM_YR_START_FILENAME in order to find the relevant PLUMBER file to use as input (the start year listed in the filename is not necessarily the actual start year given that the local time zone is used). These changes were implemented in datm/cime_config/config_component.xml, stream_definition_datm.xml, and the script used to generate the usermods for plumber cases (plumber2_usermods.py). We wanted to run this by you both to see if you had any feedback or concerns regarding this method of implementing the PLUMBER capabilities.
The alternative option that we thought of would be to leave the input files as adjustments to the namelists that could be done in the usermods directories; however, this would be a different (and hence potentially confusing to users) implementation compared to the neon site input file implementation.
Thanks for explaining this, @TeaganKing . I appreciate your laying out the details as well as the alternative that you considered - that's very helpful context.
I feel like I'm missing something, because I'm not seeing where DATM_YR_START_FILENAME is actually used: I see it set in a lot of places, but I can't find where it is used. Is it supposed to appear instead of the 2010 in this line?:
<file first_year="$DATM_YR_START" last_year="$DATM_YR_END">$DIN_LOC_ROOT/atm/datm7/CLM1PT_data/PLUMBER2/${PLUMBER2SITE}/CLM1PT_data/CTSM_DATM_${PLUMBER2SITE}_2010-${DATM_YR_END}.nc</file>
It makes sense to me to have an xml variable for this purpose. It seems like the ideal situation would be to have consistency between the different possible datm modes... something like always using this DATM_YR_START_FILENAME and having it default to DATM_YR_START if it isn't explicitly set... but I imagine that would take a lot to implement. So for now I'm fine with it being done the way it is if others are (and I appreciate the documentation you have added to this variable).
Yes, that's precisely where it will be implemented (replacing the 2010 in that example). Okay, thanks for the feedback!
Hi @jedwards4b and @billsacks , would one of you be able to review this PR? The aux_cdeps tests are now passing. @ekluzek and I discussed this yesterday as well. Note that this will be used in conjunction with CTSM #2406 and CTSM #2485-- both of which we are planning to bring in to CTSM once this is merged.
Thanks @TeaganKing ! I'm taking a look at this now....
@TeaganKing this is great, thanks for working on this and getting it figured out. I appreciate your sticking it out even though it got difficult.
I ask for a small change, but marking as approved so you don't have to wait for me to reapprove.
I also would've wanted you to add testing around this in CDEPS, but realized the reason for that is that CLM_USRDAT_NAME, NEONSITE, and PLUMBER2SITE are all defined in CTSM. I think we should move them to here in CDEPS so that we can do testing for them in CDEPS and not just in CLM. I'll make an issue around doing that here. This would be a longer term code health/testing improvement type of thing. I suspect it would've been a lot easier to do a lot of the testing here if we could test the CDEPS side for NEON and PLUMBER2 with compsets without CLM.
@TeaganKing in terms of things to do it looks like aux_cdeps passes, have you also run the latest in CTSM to ensure it works? Just want to make sure all the boxes are ticked off..
@ekluzek I will open up an issue for testing these if they are moved into CDEPS.
In regards to running the latest in CTSM, are you just proposing pulling in the most recent CTSM changes, replacing the cdeps directory with this PR, and then running aux_cdeps?
Thanks for reviewing this @billsacks ! I'm also not totally following the testing suggestion from @ekluzek -- Erik, do you mind explaining that a bit more?
Sorry I just wanted to make sure we had all the boxes checked off here:
https://github.com/ESCOMP/CDEPS/pull/262#issuecomment-2101161162
Namely making sure this CMEPS code works in a branch of CTSM where you are bringing this CMEPS update into CTSM. And run CTSM for a few PLUMBER2 sites to make sure it works fine inside of CTSM.
This is where if we could add PLUMBER2 tests to CDEPS -- you'd just have to run aux_cdeps. But, because we can't we need to make sure they work from within CTSM, for running PLUMBER2 sites in a CTSM checkout. Does that make sense now?
Sorry I just wanted to make sure we had all the boxes checked off here:
Namely making sure this CMEPS code works in a branch of CTSM where you are bringing this CMEPS update into CTSM. And run CTSM for a few PLUMBER2 sites to make sure it works fine inside of CTSM.
This is where if we could add PLUMBER2 tests to CDEPS -- you'd just have to run aux_cdeps. But, because we can't we need to make sure they work from within CTSM, for running PLUMBER2 sites in a CTSM checkout. Does that make sense now?
Okay. So I have run this for a few PLUMBER2 sites and it worked fine inside of CTSM-- but that required using the changes from both of the CTSM PLUMBER-related PRs.
Perfect, that's what I wanted to make sure had been done. It's normal for CTSM work to also depend on an external like CDEPS. Once, you can show the thing both works with the new CDEPS and other CTSM changes -- and doesn't break other CDEPS cases, we make the CDEPS tag. Then you can point to it in your CTSM PR, which still probably will take work to come into the CTSM side.
So I checked off all the checkboxes now.
@billsacks please press the merge button! Thanks everyone!