CAM with SE dycore run does not work with threading turned on
What happened?
Threading has not been routinely tested in CAM. Running the basic supported case, F2000climo at ne30pg3, with threading turned on fails.
The log file contains:
dec2432.hsn.de.hpc.ucar.edu 53: forrtl: error (76): Abort trap signal
dec2432.hsn.de.hpc.ucar.edu 53: Image PC Routine Line Source
dec2432.hsn.de.hpc.ucar.edu 53: libpthread-2.31.s 000015320D65C8C0 Unknown Unknown Unknown
dec2432.hsn.de.hpc.ucar.edu 53: libc-2.31.so 0000153208B16CBB gsignal Unknown Unknown
dec2432.hsn.de.hpc.ucar.edu 53: libc-2.31.so 0000153208B18355 abort Unknown Unknown
dec2432.hsn.de.hpc.ucar.edu 53: libc-2.31.so 0000153208B5CAE7 Unknown Unknown Unknown
dec2432.hsn.de.hpc.ucar.edu 53: libc-2.31.so 0000153208B64B6A Unknown Unknown Unknown
dec2432.hsn.de.hpc.ucar.edu 53: libc-2.31.so 0000153208B66614 Unknown Unknown Unknown
dec2432.hsn.de.hpc.ucar.edu 53: cesm.exe 00000000020FD1C3 fvm_consistent_se 163 fvm_consistent_se_cslam.F90
dec2432.hsn.de.hpc.ucar.edu 53: libiomp5.so 0000153209024493 __kmp_invoke_micr Unknown Unknown
dec2432.hsn.de.hpc.ucar.edu 53: libiomp5.so 0000153208F92533 Unknown Unknown Unknown
dec2432.hsn.de.hpc.ucar.edu 53: libiomp5.so 0000153208F91470 Unknown Unknown Unknown
dec2432.hsn.de.hpc.ucar.edu 53: libiomp5.so 00001532090251FF Unknown Unknown Unknown
dec2432.hsn.de.hpc.ucar.edu 53: libpthread-2.31.s 000015320D6506EA Unknown Unknown Unknown
dec2432.hsn.de.hpc.ucar.edu 53: libc-2.31.so 0000153208BE3A6F clone Unknown Unknown
What are the steps to reproduce the bug?
./create_test ERS_P64x2.ne30pg3_ne30pg3_mg17.F2000climo.derecho_intel
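The "x2" in the test name is what turns threading on. Under CIME's _P<tasks>x<threads> test-option convention, this test requests 64 MPI tasks with 2 threads each. The minimal Python sketch below decodes that option from a full test name. `parse_pe_option` is a hypothetical helper for illustration only, not part of CIME.

```python
import re


def parse_pe_option(testname: str) -> tuple[int, int]:
    """Return (mpi_tasks, threads_per_task) from the _P option of a test name.

    Hypothetical helper. Assumes the option has the form P<tasks> or
    P<tasks>x<threads>; a missing thread count is treated as 1.
    """
    # The test type and its options form the first dot-separated field,
    # e.g. "ERS_P64x2" in "ERS_P64x2.ne30pg3_ne30pg3_mg17.F2000climo.derecho_intel".
    test_and_opts = testname.split(".")[0]
    # Skip the test type ("ERS") and inspect each underscore-separated option.
    for opt in test_and_opts.split("_")[1:]:
        m = re.fullmatch(r"P(\d+)(?:x(\d+))?", opt)
        if m:
            tasks = int(m.group(1))
            threads = int(m.group(2)) if m.group(2) else 1
            return tasks, threads
    raise ValueError(f"no _P option found in {testname!r}")


tasks, threads = parse_pe_option(
    "ERS_P64x2.ne30pg3_ne30pg3_mg17.F2000climo.derecho_intel")
print(tasks, threads)  # 64 2
```

Any result with more than one thread per task exercises the OpenMP code path that aborts in fvm_consistent_se_cslam.F90 in the trace above.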
What CAM tag were you using?
cam6_3_153
What machine were you running CAM on?
CISL machine (e.g. cheyenne)
What compiler were you using?
Intel
Path to a case directory, if applicable
No response
Will you be addressing this bug yourself?
Any CAM SE can do this
Extra info
@nusbaume pointed out that the SE dycore does not appear to gain much performance from threading, so this is a slightly lower, though still important, priority.
Threading has not worked in the SE dycore for a long time. As far as I know, it has never provided enough benefit over pure MPI configurations for its use to be recommended. Any comment, @PeterHjortLauritzen?
This is consistent with what I observed earlier (https://github.com/ESCOMP/CAM/issues/941). @fvitt and @adamrher commented that threading was not expected to work for the SE dycore.
I ran the SE dycore with threading a few years back, and it did run; it was just 2X slower than without threading.
I'll add that, in my own experience, threading is less performant than independent ranks at the scales we typically run at. There could, however, be a use case for accelerators (and comparisons with them, offloading to CPU and GPU alike), and also for very high core-count runs, where the MPI message rate starts to become a problem.
I also wonder if the WACCMX folks could benefit, given the coarse resolutions but intensive calculations they have. Similarly, it'd be interesting to know if we benefit more now with the much higher number of vertical levels.
So, perhaps not a high priority now, but something to keep on the list of things to look at for the future.
Cheers,
- Brian
(In reply by email to Brian Eaton's comment of Thu, Mar 28, 2024: https://github.com/ESCOMP/CAM/issues/1006#issuecomment-2025855909)
I can add that @johnmauff put considerable effort into improving SE dycore performance by using threads in the past. He may have comments as well. My guess is that a GPU-ized version is perhaps a better direction to head in, although it seems that previous efforts in this direction have never made it to the CAM trunk.
From my point of view, the concern is that our workhorse configurations for CAM7 will straight up crash if we try to turn on threading. I am personally less concerned about performance than about it literally running (although I agree that it is not a high priority at the moment).
Can't the PE layouts be configured to not use threads for SE grids?
After discussion during the SE coordination meeting, we agreed that the short-term solution is to add an appropriate endrun call or a configure-time abort when threading is requested with the SE dycore.
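A configure-time abort of this kind can be sketched as a simple check on the dycore name and the requested thread count. This is a minimal Python illustration under stated assumptions, not the change that was actually made. `check_threading_supported` and `ConfigureError` are hypothetical names, and the sketch assumes both values are known when the case is configured. The runtime alternative would be the equivalent test inside the model followed by an endrun call.

```python
class ConfigureError(RuntimeError):
    """Hypothetical error raised when a requested configuration is known not to work."""


def check_threading_supported(dycore: str, nthreads: int) -> None:
    """Abort if threading is requested with the SE dycore.

    Hypothetical guard: the name and arguments are illustrative and do not
    reflect CAM's actual configure interface.
    """
    if nthreads < 1:
        raise ValueError(f"nthreads must be >= 1, got {nthreads}")
    # Only the SE dycore is rejected here; single-threaded SE runs pass.
    if dycore.lower() == "se" and nthreads > 1:
        raise ConfigureError(
            f"the SE dycore does not support threading "
            f"(requested {nthreads} threads per task); set the thread count to 1")


check_threading_supported("se", 1)  # no error: pure MPI SE run
try:
    check_threading_supported("se", 2)
except ConfigureError as err:
    print(err)
```

Failing at configure time surfaces the problem before any batch job is submitted, whereas an endrun only triggers once the model has started.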
The "short-term" solution mentioned above was the subject of issue #1087. This was implemented in cam6_4_017 and issue #1087 was closed. Closing this issue now.