
CAM run with the SE dycore does not work with threading turned on

cacraigucar opened this issue 1 year ago • 8 comments

What happened?

Threading has not been routinely tested in CAM. Running a basic supported case, F2000climo at ne30pg3, fails with threading turned on.

The log file contains:

dec2432.hsn.de.hpc.ucar.edu 53: forrtl: error (76): Abort trap signal
dec2432.hsn.de.hpc.ucar.edu 53: Image PC Routine Line Source
dec2432.hsn.de.hpc.ucar.edu 53: libpthread-2.31.s 000015320D65C8C0 Unknown Unknown Unknown
dec2432.hsn.de.hpc.ucar.edu 53: libc-2.31.so 0000153208B16CBB gsignal Unknown Unknown
dec2432.hsn.de.hpc.ucar.edu 53: libc-2.31.so 0000153208B18355 abort Unknown Unknown
dec2432.hsn.de.hpc.ucar.edu 53: libc-2.31.so 0000153208B5CAE7 Unknown Unknown Unknown
dec2432.hsn.de.hpc.ucar.edu 53: libc-2.31.so 0000153208B64B6A Unknown Unknown Unknown
dec2432.hsn.de.hpc.ucar.edu 53: libc-2.31.so 0000153208B66614 Unknown Unknown Unknown
dec2432.hsn.de.hpc.ucar.edu 53: cesm.exe 00000000020FD1C3 fvm_consistent_se 163 fvm_consistent_se_cslam.F90
dec2432.hsn.de.hpc.ucar.edu 53: libiomp5.so 0000153209024493 __kmp_invoke_micr Unknown Unknown
dec2432.hsn.de.hpc.ucar.edu 53: libiomp5.so 0000153208F92533 Unknown Unknown Unknown
dec2432.hsn.de.hpc.ucar.edu 53: libiomp5.so 0000153208F91470 Unknown Unknown Unknown
dec2432.hsn.de.hpc.ucar.edu 53: libiomp5.so 00001532090251FF Unknown Unknown Unknown
dec2432.hsn.de.hpc.ucar.edu 53: libpthread-2.31.s 000015320D6506EA Unknown Unknown Unknown
dec2432.hsn.de.hpc.ucar.edu 53: libc-2.31.so 0000153208BE3A6F clone Unknown Unknown

What are the steps to reproduce the bug?

./create_test ERS_P64x2.ne30pg3_ne30pg3_mg17.F2000climo.derecho_intel
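The `_P64x2` modifier in the test name is what turns threading on: under CIME's test-name convention it requests 64 MPI tasks with 2 OpenMP threads each. The sketch below decodes that field from a test name. The helper `parse_pe_modifier` is hypothetical, and it only handles the `_P<tasks>[x<threads>]` form, not every modifier CIME accepts.

```python
import re
from typing import Optional, Tuple


def parse_pe_modifier(test_name: str) -> Optional[Tuple[int, int]]:
    """Return (mpi_tasks, threads) from a CIME-style test name, or None.

    Hypothetical helper: only the _P<tasks>[x<threads>] modifier in the
    first dot-separated field (the test type plus modifiers) is inspected.
    """
    test_type = test_name.split(".")[0]          # e.g. "ERS_P64x2"
    for modifier in test_type.split("_")[1:]:    # skip the bare test type ("ERS")
        match = re.fullmatch(r"P(\d+)(?:x(\d+))?", modifier)
        if match:
            tasks = int(match.group(1))
            # No "x<threads>" suffix means one thread per task.
            threads = int(match.group(2)) if match.group(2) else 1
            return tasks, threads
    return None  # no PE modifier: the machine's default layout applies


if __name__ == "__main__":
    name = "ERS_P64x2.ne30pg3_ne30pg3_mg17.F2000climo.derecho_intel"
    tasks, threads = parse_pe_modifier(name)
    print(f"{tasks} MPI tasks x {threads} threads")  # 64 MPI tasks x 2 threads
```

Dropping the `x2` suffix (for example `ERS_P64`) would request the same task count with a single thread, which is the pure-MPI configuration discussed below.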

What CAM tag were you using?

cam6_3_153

What machine were you running CAM on?

CISL machine (e.g. cheyenne)

What compiler were you using?

Intel

Path to a case directory, if applicable

No response

Will you be addressing this bug yourself?

Any CAM SE can do this

Extra info

@nusbaume did point out that the SE dycore does not appear to gain much performance from threading, so this makes it a slightly lower, though still important, priority.

cacraigucar · Mar 28 '24 18:03

Threading has not worked in the SE dycore for a long time. As far as I know, it has never provided enough benefit over pure MPI configurations to make its use recommended. Any comment, @PeterHjortLauritzen?

brian-eaton · Mar 28 '24 18:03

This is consistent with my earlier observation (https://github.com/ESCOMP/CAM/issues/941). @fvitt and @adamrher commented that threading was not expected to work for the SE dycore.

sjsprecious · Mar 28 '24 18:03

> @fvitt and @adamrher commented that threading was not expect to work for SE dycore.

I ran the SE dycore with threading a few years back and it did run; it was just 2X slower than without threading. But it did run.

adamrher · Mar 28 '24 18:03

I'll add that, in my experience, threading performs worse than independent MPI ranks at the scales we typically run at. There could still be a use case for accelerators (and for comparisons with them, offloading to CPU and GPU alike), and also for very high core-count runs, where the MPI message rate starts to become a problem.

I also wonder whether the WACCMX folks could benefit, given their coarse resolutions but computationally intensive calculations. Similarly, it would be interesting to know whether we benefit more now with the much higher number of vertical levels.

So, perhaps not a high priority now, but something to keep on the list of things to look at for the future.

Cheers,

Brian


briandobbins · Mar 28 '24 18:03

I can add that @johnmauff put considerable effort in the past into improving SE dycore performance with threads. He may have comments as well. My guess is that a GPU-enabled version is a better direction to head in, although previous efforts in this direction never seem to have made it to the CAM trunk.

brian-eaton · Mar 28 '24 18:03

From my point of view, the concern is that our workhorse CAM7 configurations will crash outright if we try to turn on threading. I am personally less concerned about performance than about it running at all (although I agree that it is not a high priority at the moment).

nusbaume · Mar 28 '24 19:03

Can't the PE layouts be configured to not use threads for SE grids?

brian-eaton · Mar 28 '24 21:03

After discussion during the SE coordination meeting, we agreed that the short-term solution is to add an appropriate endrun call or a configure-time abort.

cacraigucar · Apr 02 '24 16:04
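A configure-time abort of the kind agreed on at the SE coordination meeting could look roughly like the sketch below. It is not the check that was actually added to CAM. The function `check_se_threading` and the `ConfigureError` exception are hypothetical. `NTHRDS_ATM` is CIME's thread count for the atmosphere component; passing the dycore name in as a plain string (in a real case it might come from a variable such as `CAM_DYCORE`) is an assumption.

```python
class ConfigureError(RuntimeError):
    """Hypothetical exception used to abort case configuration."""


def check_se_threading(dycore: str, nthrds_atm: int) -> None:
    """Abort if the SE dycore is combined with more than one OpenMP thread.

    Hypothetical guard: `dycore` and `nthrds_atm` are assumed to have been
    read from the case configuration already (e.g. NTHRDS_ATM).
    """
    if dycore.lower() == "se" and nthrds_atm > 1:
        raise ConfigureError(
            f"The SE dycore does not support threading (NTHRDS_ATM={nthrds_atm}); "
            "set NTHRDS_ATM=1 for this case."
        )


if __name__ == "__main__":
    check_se_threading("se", 1)   # single-threaded SE: passes silently
    check_se_threading("fv", 2)   # other dycores are not checked by this guard
    try:
        check_se_threading("se", 2)
    except ConfigureError as err:
        print(f"configure abort: {err}")
```

Failing at configure time, rather than with an endrun once the model is running, means a misconfigured case stops before any queue time is spent.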

The "short-term" solution mentioned above was the subject of issue #1087. This was implemented in cam6_4_017 and issue #1087 was closed. Closing this issue now.

brian-eaton · Oct 02 '24 12:10