CDEPS icon indicating copy to clipboard operation
CDEPS copied to clipboard

[WIP] Plumber updates

Open TeaganKing opened this issue 1 year ago • 4 comments
trafficstars

This will address #248 In order to implement PLUMBER capabilities

TeaganKing avatar Mar 08 '24 20:03 TeaganKing

From meeting with @ekluzek

  • All externals remain "frozen" until ./manage_externals gets removed (which is cesm tag beta18).
  • Because this PR changes externals, it needs to wait until after that happens.
  • People with permissions to merge this PR are Chris Fisher, Bill Sacks, and Jim Edwards.

slevis-lmwg avatar May 07 '24 22:05 slevis-lmwg

A note on somethings we need to do before this is asked for merge:

  • [x] As above wait until cesm2_3_beta18 freeze is resolved
  • [x] Test with it in CTSM and make sure it'll work for PLUMBER2 cases
  • [x] Add a aux_cdeps test for PLUMBER2? (Not going to do, eventually should do this both for NEON and PLUMBER2)
  • [x] Run the aux_cdeps testlist and make sure it runs the same as a baseline for it (run a baseline first) (this code adds new functionality that shouldn't mess with anything else -- but to make sure it doesn't...)
  • [x] Ask a CDEPS maintainer to merge it in (a new tag is created when they do)

ekluzek avatar May 08 '24 18:05 ekluzek

Agreed in today's ctsm software meeting: @TeaganKing will notify @slevis-lmwg when he should run the aux_cdeps test-suite.

slevis-lmwg avatar May 09 '24 17:05 slevis-lmwg

Per conversation with Erik, we can remove the files listed in PLUMBER2 user mod directories because these will be implemented in another PR (#277 ); those do not need to be moved to CDEPS.

However, we do need to implement the dtlimit used for these various streams specifically for PLUMBER-- hence the placeholder values that I still need to ensure work properly for changing dtlimit when CLM_USRDAT_NAME is set to PLUMBER.

Variables in those user mod directories that are duplicated in the CDEPS stream can be removed once this PR is merged in.

TeaganKing avatar May 09 '24 21:05 TeaganKing

The dtlimit is updated as expected when running CTSM when I do an xmlchange to set CLM_USRDAT_NAME to PLUMBER. So, @slevis-lmwg , I think we can run the aux_cdeps test-suite. Note that the CTSM changes (https://github.com/ESCOMP/CTSM/pull/2485 and https://github.com/ESCOMP/CTSM/pull/2406) are not yet available (since they're dependent on this CDEPS PR).

TeaganKing avatar Jun 25 '24 16:06 TeaganKing

This PR introduced CLM_USRDAT_NAME as PLUMBER2 instead of PLUMBER, so I will update that now.

TeaganKing avatar Jun 26 '24 20:06 TeaganKing

@TeaganKing I want to confirm that I understand. I need to combine the branches from these three PRs: https://github.com/ESCOMP/CTSM/pull/2485 https://github.com/ESCOMP/CTSM/pull/2406 #262 before I start the aux_cdeps test-suite, right?

Also, a note to myself: The checklist points out that I need to generate a baseline.

slevis-lmwg avatar Jun 27 '24 23:06 slevis-lmwg

@TeaganKing I want to confirm that I understand. I need to combine the branches from these three PRs: ESCOMP/CTSM#2485 ESCOMP/CTSM#2406 #262 before I start the aux_cdeps test-suite, right?

Also, a note to myself: The checklist points out that I need to generate a baseline.

ESCOMP/CTSM#2406 is very much still in progress, and there are going to be a few changes to ESCOMP/CTSM#2485 still as well (I'll do this within the next few days). What exactly is being tested with the aux_cdeps tests? I personally tested this one just by doing an xmlchange to set CLM_USRDAT_NAME to PLUMBER2, building the case, and checking the input files.

TeaganKing avatar Jun 28 '24 15:06 TeaganKing

ESCOMP/CTSM#2406 is very much still in progress, and there are going to be a few changes to ESCOMP/CTSM#2485 still as well (I'll do this within the next few days). What exactly is being tested with the aux_cdeps tests? I personally tested this one just by doing an xmlchange to set CLM_USRDAT_NAME to PLUMBER2, building the case, and checking the input files.

Ok, based on this information, I think I could go ahead and submit aux_cdeps with #262 with ctsm from master (I will try ctsm5.2.007 which is the current latest).

slevis-lmwg avatar Jun 28 '24 16:06 slevis-lmwg

I tried and failed to generate a baseline using the latest ctsm paired with cdeps1.0.38, i.e. the same cdeps that I see in @TeaganKing's branch: ./run_sys_tests -s aux_cdeps --skip-compare -g cdeps1.0.38_ctsm5.2.008

I also tried and failed to generate a baseline using the latest ctsm paired with cdeps1.0.34, i.e. the default cdeps for ctsm5.2.008: ./run_sys_tests -s aux_cdeps --skip-compare -g cdeps1.0.34_ctsm5.2.008

The former seems less surprising, if e.g. there are incompatibilities between ctsm5.2.008 and cdeps1.0.38.

The latter though means that I have a problem with aux_cdeps (environment or other?) or that aux_cdeps has a problem (in which case it should fail for others, as well).

@TeaganKing at this point I will need help from @ekluzek with this. I will raise the issue at Monday's stand-up.

slevis-lmwg avatar Jun 28 '24 23:06 slevis-lmwg

I encountered the same problem this morning even with aux_clm and ctsm_sci. This helped me realize that the problem may be as simple as setting an account number that hasn't expired. I will try this again today or tomorrow.

UPDATE 1: I submitted the same two tests. I expect that at least the cdeps1.0.34 should work and generate a baseline.

UPDATE 2: Worked out the opposite from what I expected:

  • ctsm5.2.008 with default cdeps1.034 gave three failures.
  • ctsm5.2.008 with cdeps1.0.38 passed. This is convenient because the current version of the branch in this PR modifies cdeps1.0.38.

UPDATE 3: Submitted aux_cdeps comparing this branch to the baseline (tests_0703-140019de). ./run_sys_tests -s aux_cdeps --skip-generate -c cdeps1.0.38_ctsm5.2.008

slevis-lmwg avatar Jul 01 '24 19:07 slevis-lmwg

@TeaganKing two updates:

  1. Erik clarified that the second checkbox (currently unchecked) is asking you to run one or more plumber cases to confirm that they work.
  2. aux_cdeps fails for several tests with this error during the build phase:
2024-07-03 14:01:30: Test 'SMS_Ld5.f10_f10_mg37.2000_DATM%NLDAS2_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel' failed in phase 'SETUP' with exception 'ERROR: Fatal error in case.cmpgen_namelists: 2024-07-03 14:01:29 atm
Create namelist for component datm
   Calling /glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/buildnml
   Running /glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/buildnml
Traceback (most recent call last):
  File "/glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/buildnml", line 336, in <module>
    _main_func()
  File "/glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/buildnml", line 332, in _main_func
    buildnml(case, caseroot, "datm")
  File "/glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/buildnml", line 311, in buildnml
    _create_namelists(case, confdir, inst_string, namelist_infile, nmlgen, data_list_path)
  File "/glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/buildnml", line 211, in _create_namelists
    streams = StreamCDEPS(stream_file, schema_file)
  File "/glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/../../cime_config/stream_cdeps.py", line 65, in __init__
    GenericXML.__init__(self, infile, schema)
  File "/glade/work/slevis/git_externals/plumber_upd_pr262b/cime/CIME/XML/generic_xml.py", line 78, in __init__
    self.read(infile, schema)
  File "/glade/work/slevis/git_externals/plumber_upd_pr262b/cime/CIME/XML/generic_xml.py", line 129, in read
    self.read_fd(fd)
  File "/glade/work/slevis/git_externals/plumber_upd_pr262b/cime/CIME/XML/generic_xml.py", line 159, in read_fd
    self.tree = ET.parse(fd)
  File "/glade/work/slevis/conda-envs/ctsm_pylib/lib/python3.7/xml/etree/ElementTree.py", line 1197, in parse
    tree.parse(source, parser)
  File "/glade/work/slevis/conda-envs/ctsm_pylib/lib/python3.7/xml/etree/ElementTree.py", line 598, in parse
    self._root = parser._parse_whole(source)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 4083, column 15
ERROR: /glade/work/slevis/git_externals/plumber_upd_pr262b/components/cdeps/datm/cime_config/buildnml /glade/derecho/scratch/slevis/tests_0703-140019de/SMS_Ld5.f10_f10_mg37.2000_DATM%NLDAS2_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel.C.0703-140019de_int FAILED, see above'
  File "/glade/work/slevis/git_externals/plumber_upd_pr262b/cime/CIME/test_scheduler.py", line 1125, in _run_catch_exceptions
    return run(test)
  File "/glade/work/slevis/git_externals/plumber_upd_pr262b/cime/CIME/test_scheduler.py", line 1016, in _setup_phase
    "Fatal error in case.cmpgen_namelists: {}".format(output),
  File "/glade/work/slevis/git_externals/plumber_upd_pr262b/cime/CIME/utils.py", line 176, in expect
    raise exc_type(msg)

slevis-lmwg avatar Jul 05 '24 19:07 slevis-lmwg

In case it helps, here's a list of tests that PASS versus FAIL:

    PASS SMS_Ld2.ww3a.2000_SATM_SLND_SICE_SOCN_SROF_SGLC_DWAV%CLIMO.derecho_intel RUN 
    PASS SMS_Ld3.f09_f09_mg17.1850_SATM_DLND%SCPL_SICE_SOCN_SROF_SGLC_SWAV.derecho_intel RUN 
    PASS SMS_Ly3.f10_f10_ais8gris4_mg37.2000_SATM_SLND_SICE_SGLC_SROF_DGLC%NOEVOLVE_SWAV.derecho_intel RUN 
    PASS SMS_Ly3.f10_f10_ais8_mg37.2000_SATM_SLND_SICE_SGLC_SROF_DGLC%NOEVOLVE_SWAV.derecho_intel RUN
    PASS SMS_Ly3.f19_g17_gris4.2000_SATM_SLND_SICE_SGLC_SROF_DGLC%NOEVOLVE_SWAV.derecho_intel RUN

As far as I can tell, the PEND failures report the same error as the FAIL in this list:

    FAIL SMS_Ld5.f10_f10_mg37.1850_DATM%GSWP3v1_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5.f10_f10_mg37.2000_DATM%CRUv7_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5.f10_f10_mg37.2000_DATM%NLDAS2_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5.f10_f10_mg37.2000_DATM%QIA_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5.f10_f10_mg37.2010_DATM%GSWP3v1_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5.f10_f10_mg37.HIST_DATM%GSWP3v1_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5.f10_f10_mg37.SSP585_DATM%GSWP3v1_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5_P1.1x1_mexicocityMEX.2000_DATM%1PT_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel.datm-1PT SHAREDLIB_BUILD
    PEND SMS_Ld5.T62_g17.2000_DATM%IAF_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5.T62_g17.2000_DATM%NYF_SLND_DICE%IAF_DOCN%DOM_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5.T62_g17.2000_DATM%NYF_SLND_DICE%SSMI_DOCN%DOM_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5.T62_g17.2000_DATM%NYF_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5.TL319_t061.2000_DATM%JRA-1p4-2018_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ld5.TL319_t061.2000_DATM%JRA_SLND_SICE_SOCN_SROF_SGLC_SWAV_SESP.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ln5.f19_f19_mg17.2000_DATM%QIA_SLND_SICE_DOCN%DOM_SROF_SGLC_SWAV.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ln5.f19_f19_mg17.2000_DATM%QIA_SLND_SICE_DOCN%SOMAQP_SROF_SGLC_SWAV.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ln5.f19_f19_mg17.HIST_DATM%QIA_SLND_SICE_DOCN%DOM_SROF_SGLC_SWAV.derecho_intel SHAREDLIB_BUILD
    PEND SMS_Ln9_P1.T42_T42.2000_DATM%QIA_SLND_SICE_DOCN%DOM_SROF_SGLC_SWAV.derecho_intel.datm-scam SHAREDLIB_BUILD

slevis-lmwg avatar Jul 05 '24 19:07 slevis-lmwg

My quick look at the above lists suggests that SATM tests PASS and DATM tests fail.

slevis-lmwg avatar Jul 05 '24 19:07 slevis-lmwg

Thank you for running these tests and clarifying the 2nd checkbox item!

Regarding actually running the PLUMBER case, we don't have run_tower() fully functioning at the moment. I was thinking it may be most helpful to move this in and then finalize run_tower() since it will require these changes?

TeaganKing avatar Jul 08 '24 13:07 TeaganKing

@TeaganKing if your suggestion does not affect whether the aux_cdeps test-suite can be fixed in this PR (which I suspect and hope is true), then I would be fine with that. Still, I would like @ekluzek to also weigh in on your suggestion.

slevis-lmwg avatar Jul 09 '24 23:07 slevis-lmwg

Following up from a discussion with @ekluzek , I will do the following:

  • [x] Edit stream_definition_datm.xml file for a potential formatting error (have done locally, in particular we were looking at $CLM_USRDAT vs CLM_USRDAT, == vs = in the dtlimit statement, and updates to the documentation at the top of this file to include NEON & PLUMBER)
  • [x] Make a few edits to buildnml based on needs for implementing plumber that have come up in #2406
  • [x] run aux_cdeps tests and see if error was fixed: ./run_sys_tests -s aux_cdeps --skip-generate --skip-compare

TeaganKing avatar Jul 25 '24 19:07 TeaganKing

The last issue that I'm running into in buildnml adjustments is that the start dates in the filenames for the PLUMBER files in /glade/campaign/cesm/cesmdata/inputdata/atm/datm7/CLM1PT_data/PLUMBER2/*/CLM1PT_data/ don't match the dDATM_YR_ALIGN nor the DATM_YR_START variables. @wwieder was the DATM_YR_ALIGN supposed to match the filename here? I'm a bit confused on the numerous year labels/time zones/actual years. I was thinking that we could add an additional XML variable to describe this year but that (hopefully) may be a bit unnecessary.

Eg, hard-coding 2010 into the filepath in datm/cime_config/stream_definition_datm.xml for Au-Ctr works when running ./run_tower --plumber-sites AU-Ctr (when including the changes from PLUMBER5.2 PR and Plumber 2 Implementation PR)

TeaganKing avatar Aug 02 '24 19:08 TeaganKing

Because PLUMBER2 data are all provided in local time, there a number of complexities with handling YR_START, YR_END and YR_ALIGN related to how datm and LAI stream files handle time. @olyson set things up so that these details are handled differently for spinup (I2000 compsets) vs. transient (HIST compsets) cases.

Currently we're handling this with usermods, which are generated by a python script that seems to be working correctly. After taking this over with @TeaganKing I'd suggest we keep this level of control in CTSM, instead of in CDEPS.

@ekluzek, when we briefly touched on this, you didn't seem supportive of this idea. Can you think of a simple way to migrating this into cdeps, or for simplicity can we leave the customization in usermods?

wwieder avatar Aug 07 '24 17:08 wwieder

I also want to add that keeping this in the usermods would utilize a different method of implementation than neon which could be confusing for users. I'm somewhat inclined to add a new xml variable 'FILENAME_DATM_START' to be used in the datm stream, but that also could be a bit unnecessary and only relevant to this specific case.

TeaganKing avatar Aug 07 '24 17:08 TeaganKing

Hi @jedwards4b and @billsacks , In a discussion with @wwieder and @ekluzek we decided that it would be best to include an xml variable DATM_YR_START_FILENAME in order to find the relevant PLUMBER file to use as input (the start year listed in the filename is not necessarily the actual start year given that the local time zone is used). These changes were implemented in datm/cime_config/config_component.xml, stream_definition_datm.xml, and the script used to generate the usermods for plumber cases (plumber2_usermods.py). We wanted to run this by you both to see if you had any feedback or concerns regarding this method of implementing the PLUMBER capabilities.

The alternative option that we thought of would be to leave the input files as adjustments to the namelists that could be done in the usermods directories; however, this would be a different (and hence potentially confusing to users) implementation compared to the neon site input file implementation.

TeaganKing avatar Aug 12 '24 21:08 TeaganKing

Thanks for explaining this, @TeaganKing . I appreciate your laying out the details as well as the alternative that you considered - that's very helpful context.

I feel like I'm missing something, because I'm not seeing where DATM_YR_START_FILENAME is actually used: I see it set in a lot of places, but I can't find where it is used. Is it supposed to appear instead of the 2010 in this line?:

<file first_year="$DATM_YR_START" last_year="$DATM_YR_END">$DIN_LOC_ROOT/atm/datm7/CLM1PT_data/PLUMBER2/${PLUMBER2SITE}/CLM1PT_data/CTSM_DATM_${PLUMBER2SITE}_2010-${DATM_YR_END}.nc</file>

It makes sense to me to have an xml variable for this purpose. It seems like the ideal situation would be to have consistency between the different possible datm modes... something like always using this DATM_YR_START_FILENAME and having it default to DATM_YR_START if it isn't explicitly set... but I imagine that would take a lot to implement. So for now I'm fine with it being done the way it is if others are (and I appreciate the documentation you have added to this variable).

billsacks avatar Aug 12 '24 22:08 billsacks

Yes, that's precisely where it will be implemented (replacing the 2010 in that example). Okay, thanks for the feedback!

TeaganKing avatar Aug 12 '24 22:08 TeaganKing

Hi @jedwards4b and @billsacks , would one of you be able to review this PR? The aux_cdeps tests are now passing. @ekluzek and I discussed this yesterday as well. Note that this will be used in conjunction with CTSM #2406 and CTSM #2485-- both of which we are planning to bring in to CTSM once this is merged.

TeaganKing avatar Aug 16 '24 16:08 TeaganKing

Thanks @TeaganKing ! I'm taking a look at this now....

billsacks avatar Aug 16 '24 16:08 billsacks

@TeaganKing this is great, thanks for working on this and getting it figured out. I appreciate your sticking it out even though it got difficult.

I ask for a small change, but marking as approved so you don't have to wait for me to reapprove.

I also would've wanted you to add testing around this in CDEPS, but realized the reason for that is that CLM_USRDAT_NAME, NEONSITE, and PLUMBER2SITE are all defined in CTSM. I think we should move them to here in CDEPS so that we can do testing for them in CDEPS and not just in CLM. I'll make an issue around doing that here. This would be a longer term code health/testing improvement type of thing. I suspect it would've been a lot easier to do a lot of the testing here if we could test the CDEPS side for NEON and PLUMBER2 with compsets without CLM.

@TeaganKing in terms of things to do it looks like aux_cdeps passes, have you also run the latest in CTSM to ensure it works? Just want to make sure all the boxes are ticked off..

@ekluzek I will open up an issue for testing these if they are moved into CDEPS.

In regards to running the latest in CTSM, are you just proposing pulling in the most recent CTSM changes, replacing the cdeps directory with this PR, and then running aux_cdeps?

TeaganKing avatar Aug 16 '24 17:08 TeaganKing

Thanks for reviewing this @billsacks ! I'm also not totally following the testing suggestion from @ekluzek -- Erik, do you mind explaining that a bit more?

TeaganKing avatar Aug 16 '24 19:08 TeaganKing

Sorry I just wanted to make sure we had all the boxes checked off here:

https://github.com/ESCOMP/CDEPS/pull/262#issuecomment-2101161162

Namely making sure this CMEPS code works in a branch of CTSM where you are bringing this CMEPS update into CTSM. And run CTSM for a few PLUMBER2 sites to make sure it works fine inside of CTSM.

This is where if we could add PLUMBER2 tests to CDEPS -- you'd just have to run aux_cdeps. But, because we can't we need to make sure they work from within CTSM, for running PLUMBER2 sites in a CTSM checkout. Does that make sense now?

ekluzek avatar Aug 16 '24 19:08 ekluzek

Sorry I just wanted to make sure we had all the boxes checked off here:

#262 (comment)

Namely making sure this CMEPS code works in a branch of CTSM where you are bringing this CMEPS update into CTSM. And run CTSM for a few PLUMBER2 sites to make sure it works fine inside of CTSM.

This is where if we could add PLUMBER2 tests to CDEPS -- you'd just have to run aux_cdeps. But, because we can't we need to make sure they work from within CTSM, for running PLUMBER2 sites in a CTSM checkout. Does that make sense now?

Okay. So I have run this for a few PLUMBER2 sites and it worked fine inside of CTSM-- but that required using the changes from both of the CTSM PLUMBER-related PRs.

TeaganKing avatar Aug 16 '24 20:08 TeaganKing

Perfect, that's what I wanted to make sure had been done. It's normal for CTSM work to also depend on an external like CDEPS. Once, you can show the thing both works with the new CDEPS and other CTSM changes -- and doesn't break other CDEPS cases, we make the CDEPS tag. Then you can point to it in your CTSM PR, which still probably will take work to come into the CTSM side.

So I checked off all the checkboxes now.

@billsacks please press the merge button! Thanks everyone!

ekluzek avatar Aug 16 '24 20:08 ekluzek