Going to a CISM active test for a processor change test (PEM), causes answers to change...

Open ekluzek opened this issue 1 year ago • 4 comments

Brief summary of bug

With ctsm5.2.0 we discovered we didn't have enough testing that corresponded to CESM or CAM testing. CESM testing is always done with CISM active, so I changed some tests in #2501 from I1850Clm60BgcCrop to I1850Clm60BgcCropG. However,

General bug information

CTSM version you are using: ctsm5.2.004-31-ga09d22376

Does this bug cause significantly incorrect results in the model's science? No

Configurations affected: With CISM active

Details of bug

PEM_D_Ld9.ne30pg3_t232.I1850Clm60BgcCrop.derecho_intel.clm-clm60cam6LndTuningMode passes. However, PEM_D_Ld9.ne30pg3_t232.I1850Clm60BgcCropG.derecho_intel.clm-clm60cam6LndTuningMode fails in the comparison between different processor counts...

FAIL PEM_D_Ld9.ne30pg3_t232.I1850Clm60BgcCropG.derecho_intel.clm-clm60cam6LndTuningMode COMPARE_base_modpes
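For readers unfamiliar with these long test names, here is a minimal sketch that splits one into its parts. It assumes the usual dot-separated layout (test type and modifiers, grid, compset, machine_compiler, optional testmods) and reads `Ld9` as a 9-day run length. `ParsedTest` and `parse_test_name` are hypothetical helper names for illustration, not part of CIME or CTSM.

```python
import re
from typing import NamedTuple, Optional, Tuple


class ParsedTest(NamedTuple):
    test_type: str              # e.g. "PEM"
    modifiers: Tuple[str, ...]  # e.g. ("D", "Ld9")
    days: Optional[int]         # run length taken from an "Ld<N>" modifier, if any
    grid: str
    compset: str
    machine: str
    compiler: str
    testmods: Optional[str]     # None when the name has no testmods field


def parse_test_name(name: str) -> ParsedTest:
    """Split a test name on dots (assumed layout, see lead-in)."""
    parts = name.split(".")
    if len(parts) not in (4, 5):
        raise ValueError(f"unexpected number of fields in {name!r}")
    case, grid, compset, mach_comp = parts[:4]
    testmods = parts[4] if len(parts) == 5 else None
    test_type, *modifiers = case.split("_")
    machine, compiler = mach_comp.rsplit("_", 1)
    # Pick up the run length in days from a modifier such as "Ld9".
    days = next(
        (int(m[2:]) for m in modifiers if re.fullmatch(r"Ld\d+", m)), None
    )
    return ParsedTest(test_type, tuple(modifiers), days, grid, compset,
                      machine, compiler, testmods)


failing = parse_test_name(
    "PEM_D_Ld9.ne30pg3_t232.I1850Clm60BgcCropG.derecho_intel"
    ".clm-clm60cam6LndTuningMode"
)
print(failing.test_type, failing.days, failing.compset)
```

The only difference between the passing and failing tests is the compset field: the trailing `G` is the CISM-active variant.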

Important details of your setup / configuration so we can reproduce the bug

In the test list, there are PEM and ERP tests for glc* testmods with a comment that says:

cism is not answer preserving across processor changes, but short test length should be ok

Those tests range from 5 days to 10 days. But many are f10, and the highest resolution is f19, which runs for 5 days.

ekluzek avatar May 13 '24 21:05 ekluzek

Still fails for 3 days, which is about the shortest I think we should try...

ekluzek avatar May 13 '24 22:05 ekluzek

I talked to @Katetc about this after the CSEG meeting. She said that this is a traditional global-sum issue in MPI, which has been solved in other places, and as such should be relatively easy to fix.
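To illustrate what a global-sum issue looks like, here is a minimal sketch in plain Python (no MPI). It is not CISM code and not the planned fix. The same values are summed under two different "processor" decompositions. Because floating-point addition is not associative, the naive result depends on how the values are split across tasks. Accumulating in scaled integers is one well-known way to make the sum independent of the decomposition. The function names and the `scale` value are assumptions chosen for the example.

```python
import math


def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunks(values, n_tasks):
    """Split values into contiguous blocks, one per (simulated) task."""
    size = math.ceil(len(values) / n_tasks)
    return [values[i:i + size] for i in range(0, len(values), size)]


def decomposed_sum(values, n_tasks):
    """Each task sums its block, then the partial sums are added in task order."""
    return naive_sum([naive_sum(block) for block in chunks(values, n_tasks)])


def fixed_point_sum(values, n_tasks, scale=2**20):
    """Same decomposition, but accumulate scaled integers.

    Integer addition is exact and associative, so the result does not
    depend on n_tasks. Precision is limited to 1/scale per value.
    """
    partials = [sum(round(v * scale) for v in block)
                for block in chunks(values, n_tasks)]
    return sum(partials) / scale


values = [1e16, 0.5, -1e16, 0.5]
print(decomposed_sum(values, 1), decomposed_sum(values, 2))    # naive sums differ
print(fixed_point_sum(values, 1), fixed_point_sum(values, 2))  # fixed-point sums agree
```

In a real model the differences start at round-off level and grow over time, which fits the observation above that even a 3-day PEM test still fails.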

To confirm the timeline, she sent me an email saying that they will work on this relatively soon.

ekluzek avatar May 15 '24 15:05 ekluzek

On ctsm5.2.005, I'm getting a failure in the same step for

PEM_D_Ld9.ne30pg3_t232.I1850Clm60BgcCropG.derecho_intel.clm-clm60cam6LndTuningMode

Should this be marked as an expected fail? I see that a slightly different test (3 days instead of 9) named

PEM_D_Ld3.ne30pg3_t232.I1850Clm60BgcCropG.derecho_intel.clm-clm60cam6LndTuningMode

is present in the expected fail list (and points to this issue), but that's not actually in the test list.
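A mismatch like this can be caught with a simple set comparison. The sketch below assumes the test names have already been read into plain Python lists. It does not parse the actual expected-fail or test-list files, and `stale_expected_fails` is a hypothetical helper name.

```python
def stale_expected_fails(expected_fails, test_list):
    """Return expected-fail entries that do not appear in the test list."""
    return sorted(set(expected_fails) - set(test_list))


# Names taken from this issue; in practice both lists would be much longer.
test_list = [
    "PEM_D_Ld9.ne30pg3_t232.I1850Clm60BgcCropG.derecho_intel"
    ".clm-clm60cam6LndTuningMode",
]
expected_fails = [
    "PEM_D_Ld3.ne30pg3_t232.I1850Clm60BgcCropG.derecho_intel"
    ".clm-clm60cam6LndTuningMode",
]

for name in stale_expected_fails(expected_fails, test_list):
    print("expected fail not in test list:", name)
```

Here the check flags the `Ld3` entry, since only the `Ld9` variant is in the test list.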

samsrabin avatar May 28 '24 18:05 samsrabin

Yes, we should correct the expected-fail entry so it matches the test list. I think @slevis-lmwg did this in 006, though.

ekluzek avatar May 30 '24 15:05 ekluzek

I ran into this again while working on ctsm5.2.009 because of a change in the test mod used.

But I verified that in ctsm5.2.008 the following test fails:

PEM_D_Ld9.ne30pg3_t232.I1850Clm60BgcCropG.derecho_intel

ekluzek avatar Jul 09 '24 21:07 ekluzek

See this comment: https://github.com/ESCOMP/CTSM/pull/2632#issuecomment-2217988993

ekluzek avatar Jul 09 '24 22:07 ekluzek

@Katetc just pinging you on this. This would be helpful to have in place for the cesm3.0.0 release in CISM. Does that look like it might be able to happen?

ekluzek avatar Feb 05 '25 07:02 ekluzek

Hi Erik,

Thanks for the reminder! This is still on my list. I'm not clear when the cesm3.0.0 release is, so I'm not sure if we are likely to make it. I would like to have it done by this summer. Thanks!

Katetc avatar Feb 05 '25 15:02 Katetc