CTSM icon indicating copy to clipboard operation
CTSM copied to clipboard

Add hist_all_htapes namelist flag to output all history fields

Open johnpaulalex opened this issue 2 years ago • 1 comments

Also made rah1 history field properly initialize itself (it was not overwriting its NaNs to spvals like every other field does)

Specific notes

Contributors other than yourself, if any: @ekluzek, @samsrabin

CTSM Issues Fixed (include github issue #):

  • Resolves #29

Are answers expected to change (and if so in what way)? No

Any User Interface Changes (namelist or namelist defaults changes)? Added new hist_all_htapes namelist flag, defaults to false (a no-op).

Testing performed, if any:

  • SMS_D_Ld3.f10_f10_mg37.I1850Clm50BgcCrop.derecho_gnu.clm-default--clm-all_outputs gets 1369 variables saved to the h0 file.
  • ERP_Ld9.f45_f45_mg37.I2000Clm50FatesRs.derecho_intel.clm-FatesColdAllVars now has two extra variables.

johnpaulalex avatar Jul 17 '23 00:07 johnpaulalex

Looks like this does affect FATES variables too; after e53049f, the FatesColdAllVars test has two more variables. That's not as many as I expected based on #2833, but that might be due to FATES mode settings.

samsrabin avatar Oct 18 '24 18:10 samsrabin

I ran testing. Izumi was fine. A few unexpected fails pm Derecho to look into, mostly build errors....

ERP_Ld9.f45_f45_mg37.I2000Clm50FatesRs.derecho_intel.clm-FatesColdAllVars	(NLCOMP COMPARE_base_rest BASELINE)		
ERS_D_Ld5.f10_f10_mg37.IHistClm50BgcQian.derecho_intel.clm-ciso_bombspike1963DecStart--clm-matrixcnOn_ignore_warnings	(MODEL_BUILD)		
ERS_D_Mmpi-serial_Ld5.5x5_amazon.I2000Clm50FatesRs.derecho_gnu.clm-FatesCold	(MEMCOMP SHORT_TERM_ARCHIVER)		
LILACSMOKE_D_Ld2.f10_f10_mg37.I2000Ctsm50NwpSpAsRs.derecho_intel.clm-lilac	(NLCOMP MODEL_BUILD)		
NCK_Ld1.f10_f10_mg37.I2000Clm50Sp.derecho_intel.clm-default	(MODEL_BUILD)		
PEM_D_Ld5.ne30_g17.I2000Clm50BgcCru.derecho_intel.clm-default	(MODEL_BUILD)		
PET_P64x2_D.f10_f10_mg37.I1850Clm50BgcCrop.derecho_intel.clm-default	(MODEL_BUILD)		
PET_P64x2_D.f10_f10_mg37.I1850Clm50BgcCrop.derecho_intel.clm-default--clm-matrixcnOn_ignore_warnings	(MODEL_BUILD)		
SMS_D_Ld3.f10_f10_mg37.I2000Clm50BgcCru.derecho_intel.clm-default	(MODEL_BUILD)		
SMS_D_Ld5.f09_g17.ISSP126Clm50BgcCrop.derecho_intel.clm-datm_ssp126_anom_forc	(MODEL_BUILD)		
SMS_Ld2_D_PS.f09_g17.I1850Clm60BgcCropCmip6.derecho_intel.clm-basic_interp	(MODEL_BUILD)		
SMS_Ln9.ne30pg2_ne30pg2_mg17.I2000Clm50BgcCrop.derecho_intel.clm-clm50cam6LndTuningMode	(MODEL_BUILD)		
SSP_D_Ld4.f10_f10_mg37.I1850Clm50BgcCrop.derecho_intel.clm-ciso_rtmColdSSP	(MODEL_BUILD)

ekluzek avatar Dec 12 '24 15:12 ekluzek

Adding next so we can talk about what priority this should be. Possibly @slevis-lmwg will work on this. Since he is doing work on history fields he should at least be aware of this, and potential conflicts with his other work.

ekluzek avatar May 12 '25 17:05 ekluzek

There are a couple of reasons that make this important to bring in:

  • It removes the maintenance burden of updating the FATES all-vars testlist with a long list of output vars as history fields are added or subtracted or changed in name
  • It allows us to have an easy way to test that new history fields work by setting a single flag
  • It allows us to have more confidence in optional history fields that they work and continue to work

ekluzek avatar May 12 '25 17:05 ekluzek

[...] Possibly @slevis-lmwg will work on this. Since he is doing work on history fields he should at least be aware of this, and potential conflicts with his other work.

For reference, here are the large history PRs/issues that Erik alluded to above: #2445 https://github.com/ESCOMP/MOSART/pull/117 https://github.com/ESCOMP/RTM/issues/32

slevis-lmwg avatar Jun 24 '25 22:06 slevis-lmwg

@johnpaulalex I tried to download your branch a couple of different ways and failed: git clone -b hist_all https://github.com/johnpaulalex/ctsm.git hist_all and

git remote add johnpaulalex https://github.com/johnpaulalex/ctsm.git
git fetch johnpaulalex

Both return fatal: repository 'https://github.com/johnpaulalex/ctsm.git/' not found

Am I possibly doing something wrong or do I not have appropriate permissions?

slevis-lmwg avatar Dec 05 '25 23:12 slevis-lmwg

@johnpaulalex I tried to download your branch a couple of different ways and failed: git clone -b hist_all https://github.com/johnpaulalex/ctsm.git hist_all and

git remote add johnpaulalex https://github.com/johnpaulalex/ctsm.git
git fetch johnpaulalex

Both return fatal: repository 'https://github.com/johnpaulalex/ctsm.git/' not found

Am I possibly doing something wrong or do I not have appropriate permissions?

@slevis-lmwg @johnpaulalex added a "_jp" on the end of his CTSM fork. So you need to use this as the remote

github.com:johnpaulalex/CTSM_jp.git

Doing that should get it working for you.

ekluzek avatar Dec 06 '25 00:12 ekluzek

Thank you @ekluzek, that worked. Next steps:

  • [x] FAIL ./run_sys_tests -s clm_short -c ctsm5.3.013 --skip-generate # aux_clm will not work in this version due to ctsm_pylib updates since then Failures to investigate
ERP_D_P64x2_Ld3.f10_f10_mg37.I1850Clm50BgcCrop.derecho_gnu.clm-default BASELINE ctsm5.3.013: DIFF
ERS_D_Ld3.f10_f10_mg37.I1850Clm50BgcCrop.derecho_gnu.clm-default BASELINE ctsm5.3.013: DIFF

DIFFs seem small, possibly starting from roundoff, but the PR is intended for the b4b-dev branch. Likely there were answer-changing gnu differences when ctsm5.3.013 was generated. I resubmitted the two tests from vanilla ctsm5.3.013 and confirmed that I still get DIFFs.

  • [x] git merge latest b4b-dev, resolve conflicts
  • [x] ./run_sys_tests -s aux_clm -c ctsm5.4.001 --skip-generate on derecho Two unexpected failures that I should likely add to the expected: FAIL ERP_Ld9.f45_f45_mg37.I2000Clm50FatesRs.derecho_intel.clm-FatesColdAllVars COMPARE_base_rest FATES_C13DISC_SZPF field appears different in h0a and h0i output suggesting a diagnostic field not tested for perfect restarts before. FAIL ERP_P36x2_D_Ld3.f10_f10_mg37.I1850Clm50BgcCrop.derecho_gnu.clm-default--clm-all_outputs COMPARE_base_rest 33 fields appear different in h0a and h0i output suggesting diagnostic fields not tested for perfect restarts before. Though LEAFCN_STORAGE is not diagnostic, but maybe something's missing in its initialization. I should look at these with ncdiff, though more likely later as part of a new issue and outside the scope of the present PR. Some of this seems related to #3525.
  • [x] ./run_sys_tests -s aux_clm -c ctsm5.4.001 --skip-generate --skip-git-status on izumi

slevis-lmwg avatar Dec 06 '25 00:12 slevis-lmwg

Let me know if there's any buttons I need to push. Is this still not merged in? And I didn't realize my git repo had to stick around?

On Fri, Dec 5, 2025, 19:30 Samuel Levis @.***> wrote:

slevis-lmwg left a comment (ESCOMP/CTSM#2061) https://github.com/ESCOMP/CTSM/pull/2061#issuecomment-3619052227

Thank you @ekluzek https://github.com/ekluzek

— Reply to this email directly, view it on GitHub https://github.com/ESCOMP/CTSM/pull/2061#issuecomment-3619052227, or unsubscribe https://github.com/notifications/unsubscribe-auth/AUZJY6UMFCKKR337735QKOL4AIPRBAVCNFSM6AAAAABQGKUXB6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTMMJZGA2TEMRSG4 . You are receiving this because you were mentioned.Message ID: @.***>

johnpaulalex avatar Dec 07 '25 14:12 johnpaulalex

Hey @johnpaulalex thanks for chiming in and good to hear from you. Yeah, we had problems in our testing for this one, and hence couldn't easily bring it in. And then we had an immediate priority to do a different change in histFileMod.F90 that might've conflicted with this, so we did that first. This is still a useful feature to have that will help us, so we want to bring it in. We were able to get your other PR's in quite a while ago.

I do think github needs your repository to stick around in cases of open PR's like this. Once all PR's are in -- you could delete your fork.

ekluzek avatar Dec 07 '25 19:12 ekluzek

I was afraid that it would be hard to merge this to the latest, so I just tried a test merge and saw there's only 4 conflicts. And actually for that matter you can see them in the github section on "Resolve conflicts". They actually look reasonably straight forward to resolve and I think @slevis-lmwg won't have trouble doing it.

The testing problems are documented above here

https://github.com/ESCOMP/CTSM/pull/2061#issuecomment-2539304811

It looks like they are mostly build problems so hopefully, some simple syntax problem that will be easy to fix. It's somewhat surprising that Izumi_nag didn't catch it. Anyway, this gives me renewed hope that we'll be able to merge to the latest and fix the problems so that we can bring this in.

ekluzek avatar Dec 07 '25 19:12 ekluzek

Let me know if there's any buttons I need to push. Is this still not merged in? And I didn't realize my git repo had to stick around?

@johnpaulalex turns out I do need something from you: Could you give me (@slevis-lmwg) permission to push to your PR? You do this by

  • clicking Settings near the top of this page, then
  • clicking Collaborators and Teams near the top left, and following instructions.

slevis-lmwg avatar Dec 09 '25 22:12 slevis-lmwg

ok, I think I did it. I added you as a collaborator. There don't seem to be permissions to give each collaborator, but lmk if that doesn't do it.

John

On Tue, Dec 9, 2025 at 5:07 PM Samuel Levis @.***> wrote:

slevis-lmwg left a comment (ESCOMP/CTSM#2061) https://github.com/ESCOMP/CTSM/pull/2061#issuecomment-3634484759

Let me know if there's any buttons I need to push. Is this still not merged in? And I didn't realize my git repo had to stick around?

@johnpaulalex https://github.com/johnpaulalex turns out I do need something from you: Could you give me @.*** https://github.com/slevis-lmwg) permission to push to your PR? You do this by

  • clicking Settings near the top of this page, then
  • clicking Collaborators and Teams near the top left, and following instructions.

— Reply to this email directly, view it on GitHub https://github.com/ESCOMP/CTSM/pull/2061#issuecomment-3634484759, or unsubscribe https://github.com/notifications/unsubscribe-auth/AUZJY6WMKUKEVXIM6LRA6DL4A5BZFAVCNFSM6AAAAABQGKUXB6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTMMZUGQ4DINZVHE . You are receiving this because you were mentioned.Message ID: @.***>

johnpaulalex avatar Dec 10 '25 03:12 johnpaulalex

  1. @johnpaulalex I am able to push now, so thank you :-)
  2. @ekluzek aux_clm on derecho gave 2 unexpected failures, as reported in the unchecked box above. Both are restart DIFFs in the tests that activate hist_all_htapes = .true. which makes me think that these are preexisting problems that we never checked for before. If so, then they warrant new issues and are not within this PR's scope.
  3. @johnpaulalex @ekluzek I think we should rename hist_all_htapes to hist_all_fields based on my understanding of its purpose.

slevis-lmwg avatar Dec 10 '25 19:12 slevis-lmwg

  1. @ekluzek aux_clm on derecho gave 2 unexpected failures, as reported in the unchecked box above. Both are restart DIFFs in the tests that activate hist_all_htapes = .true. which makes me think that these are preexisting problems that we never checked for before. If so, then they warrant new issues and are not within this PR's scope.

So the question then is if we make issues for those and bring this in as is with those as expected fails? I think that seems like a reasonable path forward.

Adding next so we can discuss this.

  1. @johnpaulalex @ekluzek I think we should rename hist_all_htapes to hist_all_fields based on my understanding of its purpose.

I agree that's a better name.

ekluzek avatar Dec 11 '25 08:12 ekluzek

  1. @ekluzek aux_clm on derecho gave 2 unexpected failures, as reported in the unchecked box above. Both are restart DIFFs in the tests that activate hist_all_htapes = .true. which makes me think that these are preexisting problems that we never checked for before. If so, then they warrant new issues and are not within this PR's scope.

So the question then is if we make issues for those and bring this in as is with those as expected fails? I think that seems like a reasonable path forward.

Adding next so we can discuss this.

@ekluzek and I discussed it this afternoon. We agreed that I could mark the two tests as EXPECTED FAILs. I will open two separate issues because the solutions may differ for each.

slevis-lmwg avatar Dec 11 '25 23:12 slevis-lmwg

  • [x] Add two tests to expected fails
  • [x] Wait for second one to finish: tests_1211-173421de

slevis-lmwg avatar Dec 12 '25 00:12 slevis-lmwg