eamxx: Non-BFB behavior with `ne4pg2_ne4pg2.F2010-SCREAMv1` cases on pm-cpu with Intel when changing NTASKS
I'm seeing different results when I change the number of MPI tasks for CPU jobs of scream. Only tested on pm-cpu (and muller-cpu). I've been running scaling tests for both e3sm and scream. All e3sm cases are BFB, but it looks like every different node count used for a scream case results in a different set of hashes. For a given MPI task count, re-running the case is BFB, as expected.
And, just now, I tried PEM.ne30pg2_ne30pg2.F2010-SCREAMv1.pm-cpu_intel, which does fail.
/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/PEM.ne30pg2_ne30pg2.F2010-SCREAMv1.pm-cpu_intel.r00
Looks like it passes with DEBUG
PEM_D_P1024_Ld1.ne30pg2_ne30pg2.F2010-SCREAMv1.pm-cpu_intel
Below I show that I can reproduce this with ne4, so I changed the title of the issue.
This might be specific to Intel. We have PEM_Ln90.ne30pg2_ne30pg2.F2010-SCREAMv1.pm-cpu_gnu.scream-spa_remap--scream-output-preset-4 in our nightly, which is GNU on pm-cpu.
Yes this is with Intel and we don't see it with GNU.
Noting we also see diff with ne4.
PEM.ne4pg2_oQU480.F2010-SCREAMv1.pm-cpu_intel fails
Noting I still see a diff with Intel compiler for at least this test:
ERP_Ln22.conusx4v1pg2_r05_oECv3.F2010-SCREAMv1-noAero.pm-cpu_intel.eamxx-bfbhash--eamxx-output-preset-6
/pscratch/sd/n/ndk/e3sm_scratch/pm-cpu/nexty-feb21/ERP_Ln22.conusx4v1pg2_r05_oECv3.F2010-SCREAMv1-noAero.pm-cpu_intel.eamxx-bfbhash--eamxx-output-preset-6.e3sm_eamxx_v1_medres
With Nov 6 2025 checkout
GNU all pass
ERP.ne4pg2_oQU480.F2010-SCREAMv1.pm-cpu_gnu
PEM.ne4pg2_oQU480.F2010-SCREAMv1.pm-cpu_gnu
SMS_D_Ln22.conusx4v1pg2_r05_oECv3.F2010-SCREAMv1-noAero.pm-cpu_gnu.eamxx-L72
PEM_Ln22.conusx4v1pg2_r05_oECv3.F2010-SCREAMv1-noAero.pm-cpu_gnu.eamxx-L72
ERS_Ln22.conusx4v1pg2_r05_oECv3.F2010-SCREAMv1-noAero.pm-cpu_gnu.eamxx-L72
ERP_Ln22.conusx4v1pg2_r05_oECv3.F2010-SCREAMv1-noAero.pm-cpu_gnu.eamxx-L72
AMD all pass
ERP_Ln22.conusx4v1pg2_r05_oECv3.F2010-SCREAMv1-noAero.pm-cpu_amdclang.eamxx-L72
PEM_Ln22.conusx4v1pg2_r05_oECv3.F2010-SCREAMv1-noAero.pm-cpu_amdclang.eamxx-L72
Intel pass
SMS.ne4pg2_oQU480.F2010-SCREAMv1.pm-cpu_intel
SMS_Ln22.conusx4v1pg2_r05_oECv3.F2010-SCREAMv1-noAero.pm-cpu_intel.eamxx-L72
ERS_Ln22.conusx4v1pg2_r05_oECv3.F2010-SCREAMv1-noAero.pm-cpu_intel.eamxx-L72
ERP_D_Ln22.conusx4v1pg2_r05_oECv3.F2010-SCREAMv1-noAero.pm-cpu_intel.eamxx-L72
PEM_D_Ln22.conusx4v1pg2_r05_oECv3.F2010-SCREAMv1-noAero.pm-cpu_intel.eamxx-L72
Intel fail compare:
ERP.ne4pg2_oQU480.F2010-SCREAMv1.pm-cpu_intel
PEM.ne4pg2_oQU480.F2010-SCREAMv1.pm-cpu_intel
ERP_Ln22.conusx4v1pg2_r05_oECv3.F2010-SCREAMv1-noAero.pm-cpu_intel.eamxx-L72
PEM_Ln22.conusx4v1pg2_r05_oECv3.F2010-SCREAMv1-noAero.pm-cpu_intel.eamxx-L72
Running the ne4 case with Intel and turning on all hashing, we can see where the diffs first appear between a case with 64 tasks and one with 96 tasks (all single-threaded, without forcing OpenMP builds as we did in other issues):
File 1: f4.F2010-SCREAMv1.ne4pg2_ne4pg2.nexty-nov11.intel.n001.p064x111111.18s.bfb2.L72/run/e3sm.log.45118395.251111-162903.gz
File 2: f4.F2010-SCREAMv1.ne4pg2_ne4pg2.nexty-nov11.intel.n001.p096x111111.18s.bfb2.L72/run/e3sm.log.45118424.251111-162835.gz
================================================================================
REPORT: ABSOLUTE FIRST HASH DIFFERENCE (Starting at First Step)
================================================================================
# DIVERGENCE FOUND at BFB Step 0 (Entry #504)
--- Context (Last 3 identical hashes) ---
0: hxxhash> 5 0 58d0e91b44d916e0 (E BE-pre-ComposeTransport-q-HV-0)
0: hxxhash> 5 1 c9f0a2ef2b3fd16d (E BE-pre-ComposeTransport-q-HV-0)
0: hxxhash> 5 2 b874c373b720216d (E BE-pre-ComposeTransport-q-HV-0)
--- Divergence Found ---
-0: hxxhash> 5 0 4912c99cc239cc17 (T BE-pre-ComposeTransport-q-HV-0) (File 1)
+0: hxxhash> 5 0 4912c99cc239c5bb (T BE-pre-ComposeTransport-q-HV-0) (File 2)
================================================================================
================================================================================
REPORT: FIRST DIVERGENCE AFTER INITIALIZATION (Skipping First Step)
================================================================================
# DIVERGENCE FOUND at BFB Step 1 (Entry #626)
--- Context (Last 3 identical hashes) ---
0: hxxhash> 5 2 b874c373b720216d (E BE-post-ComposeTransport-qdp-DSS-1)
0: hxxhash> 5 0 c6f9d9a1d4ce8efa (T BE-post-ComposeTransport-qdp-DSS-1)
0: hxxhash> 5 1 d75cd8fd8d26a7f8 (T BE-post-ComposeTransport-qdp-DSS-1)
--- Divergence Found ---
-0: bfbhash> 1 4ec4947c9350cbaf (Hommexx) (File 1)
+0: bfbhash> 1 4ec4947c93ab3f1b (Hommexx) (File 2)
================================================================================
So first we note the diffs that happen in the first step (step 0), which may or may not be a result of initialization.
But then the very next bfbhash also differs between the two cases. Note that both cases use 72 vertical levels to further reduce complexity.
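For anyone who wants to repeat this kind of comparison on their own pair of logs, here is a minimal Python sketch of a first-divergence search. It is not the script that produced the report above. The function names `read_hash_lines` and `first_divergence` are hypothetical, and it assumes that the hash lines are tagged `hxxhash>` or `bfbhash>` as in the excerpts and that both logs emit them in the same order.

```python
import gzip
from itertools import zip_longest

# Tags taken from the log excerpts above; adjust if your logs use others.
HASH_TAGS = ("hxxhash>", "bfbhash>")


def read_hash_lines(path):
    """Return the hash lines of a plain or gzipped log file, in file order."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", errors="replace") as f:
        return [line.strip() for line in f if any(t in line for t in HASH_TAGS)]


def first_divergence(lines1, lines2, context=3):
    """Find the first position where the two hash sequences differ.

    Returns (index, preceding_identical_lines, line1, line2), or None if the
    sequences are identical. If one log is shorter, the missing side is None.
    """
    for i, (a, b) in enumerate(zip_longest(lines1, lines2)):
        if a != b:
            return i, lines1[max(0, i - context):i], a, b
    return None
```

Usage would be along the lines of `first_divergence(read_hash_lines(log1), read_hash_lines(log2))`, where `log1` and `log2` are the two `e3sm.log.*.gz` paths. Skipping the first step, as in the second report, would need an extra filter on the step number, which this sketch does not attempt.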