CTSM icon indicating copy to clipboard operation
CTSM copied to clipboard

Plumber2 Implementation

Open TeaganKing opened this issue 11 months ago • 19 comments

This PR will use the TowerSite class (parent to both NEON and PLUBMER sites) and create a Plumber2Site class as well as implement capabilities for running single point simulations at PLUMBER sites.

Contributors other than yourself, if any: @ekluzek @adrifoster

CTSM Issues Fixed (include github issue #): Addresses part of https://github.com/ESCOMP/CTSM/issues/1487

Are answers expected to change (and if so in what way)? No, this PR should be BFB. However, it should expand existing capabilities to allow users to run at PLUMBER sites.

Any User Interface Changes (namelist or namelist defaults changes)? Additional flags will be implemented for run_neon

TeaganKing avatar Mar 07 '24 17:03 TeaganKing

This PR will also address #2186

TeaganKing avatar Mar 08 '24 20:03 TeaganKing

We'll also need to add the new CDEPS tag to this PR.

TeaganKing avatar Mar 08 '24 21:03 TeaganKing

Per conversation with Erik, we'll move this to bfb once the other Plumber-related PRs are in bfb next week.

TeaganKing avatar Mar 08 '24 22:03 TeaganKing

To Do for this PR:

  • [x] Update branch / resolve merge conflicts
  • [x] Add new CDEPS tag to this PR once the CDEPS PR is merged.
  • [x] Bring in CTSM PLUMBER5.2 PR
  • [x] Check for any sort of equivalent of user_version for plumber
  • [x] Include any plumber-specific user namelist lines
  • [x] Update check_plumber_data
  • [x] Determine valid sites based on tower_type?
  • [x] Define available_plumber_list
  • [x] Be able to run NEON & PLUMBER simultaneously (if both are same run type)
  • [x] Provide option for None for both NEON and PLUMBER cases so they can be run indepdently
  • [x] Ensure that we don't use a NEON base case for PLUMBER or vice versa
  • [x] Update run_type defaults in arg parse
  • [x] Resolve unknown case.run error for PLUMBER given need for setup/build
  • [x] Test out various tower options/arguments
  • [x] Update tests
  • [x] Update documentation

TeaganKing avatar Jun 25 '24 18:06 TeaganKing

We don't currently have restart files for PLUMBER2, which has brought up the question: to what extent do we want to provide support for PLUMBER tower sites? Two options that @wwieder and I discussed this morning are described below:

  1. Users spin up their own AD & postAD cases and then can run a transient case (no restart files needed). This is somewhat contrary to the purpose of run_neon because it requires users to run case setup, build, submit, and archive.
  2. My vote would be that we provide (and update upon changes to model versions) restart files and allow users to run a transient case out of the box (this is how NEON tower sites work, and consistency with run_tower would be nice as well as ease of use, but there is more overhead to maintain the feature). This would be a slightly easier implementation given that run_tower currently expects a transient case (and automatically submits case.run, which is not automatically created for the ad run but is automatically created for transient cases).

@danicalombardozzi @slevis-lmwg @olyson let me know if you have thoughts on this. We were planning to chat on Tuesday at 9am but it looks like there is a NCAR-NEON-Community partnership meeting that will be taking that time slot and I'd ideally like to make a decision before August 20th; also happy to tag up an additional time if that's easier than a GitHub discussion.

TeaganKing avatar Aug 06 '24 20:08 TeaganKing

My guess is that most users will want to change the model and will likely have to do their own spinups anyway. So I might lean toward not providing initial files if it requires significant work to generate, keep track of, and update those files.

olyson avatar Aug 06 '24 23:08 olyson

I would agree with this. These runs are cheap and the point is to have a set of tower sites that can be run quickly and easily with different model configurations and with new model versions. The ICs quickly become irrelevant as the model changes.

On Tue, Aug 6, 2024 at 7:02 PM Keith Oleson @.***> wrote:

My guess is that most users will want to change the model and will likely have to do their own spinups anyway. So I might lean toward not providing initial files if it requires significant work to generate, keep track of, and update those files.

— Reply to this email directly, view it on GitHub https://github.com/ESCOMP/CTSM/pull/2406#issuecomment-2272302038, or unsubscribe https://github.com/notifications/unsubscribe-auth/AFABYVB663VOB5R7GAWPZJDZQFIYNAVCNFSM6AAAAABELMMUA6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENZSGMYDEMBTHA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

dlawrenncar avatar Aug 07 '24 15:08 dlawrenncar

Ok, thank you both for this feedback!

TeaganKing avatar Aug 07 '24 15:08 TeaganKing

Thanks for weighing in here. Teagan since Dave and Keith came to the same conclusion we did yesterday lets move forward with the plan to support 'spinup' and 'transient' PLUMBER2 cases, without providing initial conditions.

It might be nice to differentiate how we do this for SP (fast spinup, ~20 or 40 years) vs. BGC (more involved, like NEON) cases. Would that be difficult to integrate into the run_towers workflow?

wwieder avatar Aug 07 '24 15:08 wwieder

That sounds good. I have successfully run the AD & postAD case (with a bit of hard-coding), so this will require the following updates, which should be feasible:

  • [x] finidat file name updates for each site in user_nl_clm (expecting <SITE>.ad.clm2.r.YYYY-MM-DD-00000.nc files but we have for instance *50400.nc files)
  • [x] Update run_type defaults in run_case in both tower_site.py and plumber_site.py such that the AD case is the default (instead of the transient default that's set in neon_site.py).

@wwieder To clarify, regarding the option for SP vs BGC, are you envisioning a flag that allows users to pick between these options?

TeaganKing avatar Aug 07 '24 17:08 TeaganKing

A few comments:

  • I don't know that we need to "automatically" run postAD and transient runs, as I think it's a good idea for users to have to check the status of their AD and postAD cases to check that they meet steady state criteria (e.g., Alaska NEON sites take significantly longer to spin up than CONUS NEON sites).
  • What I like about the current run_neon implementation is that running a postAD cases automatically creates a branch that looks for an similarly names AD case and then using the --run from postAD flag also creates a transient case uses the off the postAD ref case and initial conditions. Both of these options just take the last restart files that are available (regardless of simulation date).
  • Regarding SP vs. BGC cases, yes a flag that allows users to chose between these two options would be excellent. I'm not sure about the best way to implement this and don't want to introduce scope creep / complexity to this work but having the option to set up SP vs. BGC cases would be very useful for PLUMBER2 (and even NEON) cases.

Happy to have a quick that if it's helpful.

wwieder avatar Aug 07 '24 17:08 wwieder

Ok, in that case, I'll just plan to change the default for PLUMBER to AD and add some documentation on this anticipated workflow for users. Thanks for clarifying!

Why don't I plan to get that first part implemented and then we can talk more about the BGC vs. SP options.

TeaganKing avatar Aug 07 '24 17:08 TeaganKing

It seems like perhaps the SP vs BGC option should come later. It will be useful for both PLUMBER and NEON and therefore might require more in-depth work to do it well.

We should also create a separate tutorial for people to check the stability of their site spin up. We apply a standard number of years for the AD and post-AD NEON site simulations and it would be useful to help users evaluate the stability after making changes. I don't recall that we evaluated all sites for stability thoroughly, either, so perhaps it would also help us to know if the number of years we use is effective everywhere.

On Wed, Aug 7, 2024 at 11:41 AM Teagan King @.***> wrote:

Ok, in that case, I'll just plan to change the default for PLUMBER to AD and add some documentation on this anticipated workflow for users. Thanks for clarifying!

Why don't I plan to get that first part implemented and then we can talk more about the BGC vs. SP options.

— Reply to this email directly, view it on GitHub https://github.com/ESCOMP/CTSM/pull/2406#issuecomment-2273994633, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGHW2QOYGE75H6BHKVQVDGLZQJL4RAVCNFSM6AAAAABELMMUA6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENZTHE4TINRTGM . You are receiving this because you were mentioned.Message ID: @.***>

-- Dr. Danica Lombardozzi she/her/hers Terrestrial Sciences Section Climate and Global Dynamics NSF National Center for Atmospheric Research Boulder, CO 80305

email: @.*** office: (303) 497-1777

danicalombardozzi avatar Aug 07 '24 17:08 danicalombardozzi

This is about ready to go, but once #2485 is in, I'll run the unit and system tests once more and perform a few final tests to make sure the various run_tower flags work as expected.

TeaganKing avatar Aug 20 '24 16:08 TeaganKing

I just wanted to note here a few things that I still need to fix:

  • [x] Tests for test_sys_run_tower.py are currently failing. Fix and then run all ctsm py tests.
  • [x] There seem to be some mixups when running with additional flags, eg plumber sites run with 'experiment' end up being run in a transient case instead of an ad case
  • [x] Need to test various argument options (note some neon options irrelevant for plumber)

TeaganKing avatar Aug 23 '24 00:08 TeaganKing

One limitation of this implementation is that if both neon and plumber cases are run simultaneously (not recommended), that the default run type becomes transient. While this could be changed by adding the following lines at line 197 to tower_arg_parse.py, implementing this as a valid parser option, and then setting run_type dependent on the type of tower in run_tower.py, we run into difficulties when the compset is changed, a step which seems reasonable to not repeat for every site if most times people use this will be for either all-neon or all-plumber sites.

if args.plumber_sites:
    run_type = "default_neon_and_plumber"

Overall, I think it's most reasonable to expect users to run plumber and neon cases separately (although both with the run_tower command), but I'm open to others thoughts.

TeaganKing avatar Aug 23 '24 18:08 TeaganKing

Tests on derecho currently pass for make all and ./run_ctsm_py_tests --sys.

The test_sys_run_tower.py tests take about 30 minutes to run now that it involves setting up three cases; I can remove the ad neon case if this seems too excessive.

TeaganKing avatar Aug 23 '24 19:08 TeaganKing

Hi @ekluzek ,

I have now addressed the comments from @wwieder 's review. Two outstanding issues that Will and I discussed are the following (more details in comments above):

  • [ ] The Plumber2Site class is called in a somewhat clunky way with dummy years that get overwritten. I'm not sure of a better way to do this, and it works, but let me know if you have alternative suggestions?
  • [ ] Running system tests takes a while. There are of course benefits to doing more testing (as I have done here), but the time the tests take may be on the excessive side. Your feedback on whether these tests are worth it would be helpful!

And a software-focused review of the code would be helpful now that Will reviewed some of the functionality pieces.

Before merging (and after @ekluzek 's review), I will be sure to run the following tests one last time (I ran them before Will's review but there are changes that we should test again before merging):

  • [ ] run ./run_tower --plumber-sites with various options as a final check
  • [ ] make all
  • [ ] run ctsm system tests with ./python/run_ctsm_py_tests -s

Lastly, note that there are some related outstanding issues that have been documented that are outside the scope of this PR.

TeaganKing avatar Aug 26 '24 19:08 TeaganKing

It seems like this PR also fixes #2739

wwieder avatar Sep 03 '24 17:09 wwieder