CAM Performance improvements for CSLAM (from John Dennis, CISL, and Lauritzen)

Closes https://github.com/ESCOMP/CAM/issues/1360

Aug 18 '25 14:08 PeterHjortLauritzen

I am not getting B4B in this test:

ERP_D_Ln9.ne30pg3_ne30pg3_mg17.FHISTC_LTso.derecho_intel.cam-outfrq9s

 ./create_test --output-root /glade/derecho/scratch/pel/ --project P93300042 ERP_D_Ln9.ne30pg3_ne30pg3_mg17.FHISTC_LTso.derecho_intel.cam-outfrq9s

(low top FHISTC with CSLAM)

@johnmauff: Could these changes be round-off (order-of-operations changes)?
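For context, here is a minimal Python sketch (not CAM code) of why a change in the order of operations alone can break B4B: floating-point addition is not associative, so regrouping a sum can change the last bit of the result. The variable names are illustrative only.

```python
# Floating-point addition is not associative: regrouping the same three
# terms can change the result in the last bit, which is enough to make
# two otherwise equivalent code paths fail a bit-for-bit comparison.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # one summation order
right_to_left = a + (b + c)   # same terms, regrouped

bitwise_identical = left_to_right == right_to_left
difference = abs(left_to_right - right_to_left)

print("bit-for-bit:", bitwise_identical)
print("absolute difference:", difference)
```

The difference is at the level of machine precision for a single sum, but in a model run it is introduced at every time step and can then grow.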

Aug 18 '25 14:08 PeterHjortLauritzen

Peter,

I would guess that they are round-off changes. I have not checked with a low-top model. I only tested this with a high-top model, looking at the nstep output.

John

Aug 18 '25 16:08 johnmauff

I ran the baroclinic wave test case (FKESSLER) and compared the optimized version against the baseline/trunk. Below is PS at day 10:

For comparison, here is a pertlim test (in this case perturbing PS by 1E-14):

The pertlim test produces errors about 100× smaller. It’s unclear whether it matters that the optimized code introduces round-off errors at every time step, whereas the pertlim test only introduces them at initialization. I'll keep looking/thinking ...

UPDATE: all the differences were due to a code bug ... all the tests I am running are now BFB

Aug 20 '25 12:08 PeterHjortLauritzen

@PeterHjortLauritzen I am curious: I thought threading was not supported for the SE dycore (https://github.com/ESCOMP/CAM/issues/941). So why does the ERP test still work here?

Aug 20 '25 16:08 sjsprecious

@PeterHjortLauritzen I ran the ERP_D_Ln9.ne30pg3_ne30pg3_mg17.FHISTC_LTso.derecho_intel.cam-outfrq9s test on Derecho and found that the second run cut the number of nodes in half but did not change the number of threads (still 1). So is this actually an ERS test?

Aug 20 '25 17:08 sjsprecious

I don't know; I just took the test from the CAM test list without thinking much about what it actually does. (And you are correct: threading is currently broken in the SE dycore.) @nusbaume, do you know the answer to @sjsprecious's question?

Aug 21 '25 12:08 PeterHjortLauritzen

All the tests I am running are BFB now! Thanks @johnmauff and @sjsprecious ... @cacraigucar: This PR is ready to go ...

Aug 21 '25 13:08 PeterHjortLauritzen

Hi @PeterHjortLauritzen @sjsprecious, an ERP test takes the default thread and task count from the first case and divides it by 2 for the second case if the number is greater than one. You can see that logic in the CIME code here:

https://github.com/ESMCI/cime/blob/master/CIME/SystemTests/erp.py#L32

Given that the default configuration for an SE dycore run is one thread but multiple MPI tasks, it ends up adjusting the tasks but not the threads (which is what allows the test to run with this CAM configuration in the first place).

Also, my understanding is that the difference between ERS and ERP is that an ERS test won't change the task layout at all and simply checks that the restart run is bit-for-bit, while the ERP test will halve the processor count for the restarted run before checking if the results are bit-for-bit.
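As a rough illustration of the halving rule described above, here is a minimal Python sketch. It is not the CIME implementation (see the erp.py link above for the real logic); the function name `halve_layout` and its arguments are hypothetical.

```python
# Hypothetical sketch of the ERP layout rule described above: each count
# is halved for the restart run only if it is greater than one.
# This is NOT the CIME code; names are made up for illustration.
def halve_layout(ntasks: int, nthreads: int) -> tuple[int, int]:
    """Return the (tasks, threads) layout for the second, restarted run."""
    new_tasks = ntasks // 2 if ntasks > 1 else ntasks
    new_threads = nthreads // 2 if nthreads > 1 else nthreads
    return new_tasks, new_threads


# An SE dycore style layout: many MPI tasks, one thread.
# Only the task count changes, so the test never needs threading.
print(halve_layout(512, 1))

# A threaded layout would have both counts halved.
print(halve_layout(8, 4))
```

Under this reading, an ERS test would correspond to leaving both counts unchanged for the restart run.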

Anyways, I hope that helps, and thanks again for getting these improvements into CAM!

Aug 21 '25 14:08 nusbaume

Thanks @nusbaume for your clarification. Clearly I misunderstood the ERS and ERP tests before, but now it is clear to me.

Aug 21 '25 15:08 sjsprecious

@pel I just noticed that commit 6067381 appears to delete all of my optimizations. Is there a reason for this?

Sep 29 '25 13:09 johnmauff

apologies ... mistake ... reverted!

Sep 29 '25 13:09 PeterHjortLauritzen

Peter,

Thanks for reverting the commit. It looks like Rory's changes are only a couple of lines.

John

Sep 29 '25 13:09 johnmauff

I heard at a meeting that the optimization work may be complete, with no deliverable for CAM (other than something that Peter will include in a future PR). Is this correct, and should this PR be closed?

Oct 13 '25 19:10 cacraigucar

never mind - @PeterHjortLauritzen is working on this

Oct 14 '25 15:10 cacraigucar

FYI: performance results are on this wiki:

https://github.com/PeterHjortLauritzen/CAM/wiki/Performance-notes-(2025)

Oct 23 '25 10:10 PeterHjortLauritzen

Thanks @nusbaume for the review. I have redone the math for finding extrema in CSLAM and found buggy code, which I have now fixed. The changes are not B4B, but it takes over 2 days for differences to show up in FKESSLER. I did a science validation with FKESSLER (looking at tracer properties) and things look good!

Nov 24 '25 13:11 PeterHjortLauritzen