Andrew M. Bradley

Results: 104 comments by Andrew M. Bradley

So far Crusher does not reproduce any of these. A summary of results follows.
```
Repo is master at e57ed3848d.
PEM_P128x1_Ld1.ne30pg2_ne30pg2.F2010-SCREAMv1.crusher-scream-gpu_crayclang-scream.scream-internal_diagnostics_level (Overall: PASS) details:
PEM_P16x1_Ld1.ne30pg2_ne30pg2.F2010-SCREAMv1.crusher-scream-gpu_crayclang-scream.scream-internal_diagnostics_level (Overall: PASS) details:
PEM_P32x1_Ld1.ne30pg2_ne30pg2.F2010-SCREAMv1.crusher-scream-gpu_crayclang-scream.scream-internal_diagnostics_level (Overall:...
```

Currently running the following on Frontier with machines/frontier at 2a918e18ef and will update this comment with results. The goal is to reproduce Noel's run, hopefully with ne4pg2 and the scream-internal_diagnostics_level...

@ndkeen this is very unlikely, but is it possible that the DIFF you saw resulted from the following sequence?
1. Run PEM_D.ne4pg2...
2. The run gets cancelled due to the...

On pm-cpu/intel, have you run with the scream-internal_diagnostics_level testmod to isolate the diff?

Re: the PEM_D fails, I'm seeing this in your test directory:
```
[[email protected] PEM_D.ne4pg2_ne4pg2.F2010-SCREAMv1.frontier-scream-gpu_crayclang-scream.20230623_205834_u7fj79]$ for i in `find . -name env_run.xml`; do echo $i; grep "\"STOP_N" $i; done
./case2/PEM_D.ne4pg2_ne4pg2.F2010-SCREAMv1.frontier-scream-gpu_crayclang-scream.20230623_205834_u7fj79/env_run.xml
./env_run.xml...
```
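The shell loop above can also be sketched in Python, which is handy when the matches need further processing. This is a minimal sketch and not part of the original comment: the function name `find_stop_n` is hypothetical, and, like the `grep` pattern, it only assumes that the lines of interest in each `env_run.xml` contain the text `"STOP_N`.

```python
import os

def find_stop_n(root):
    """Return {path: [matching lines]} for every env_run.xml under root.

    Mirrors the shell loop:
      for i in `find . -name env_run.xml`; do echo $i; grep "\"STOP_N" $i; done
    """
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if "env_run.xml" in filenames:
            path = os.path.join(dirpath, "env_run.xml")
            with open(path, encoding="utf-8", errors="replace") as f:
                # Keep only lines containing the literal text "STOP_N
                # (including the leading double quote), as grep does.
                hits[path] = [line.strip() for line in f if '"STOP_N' in line]
    return hits
```

Calling `find_stop_n(".")` from the test directory returns one entry per `env_run.xml`; a file with an empty list corresponds to a path for which the shell loop prints the file name but no `grep` output.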

@ndkeen this looks promising: https://github.com/E3SM-Project/scream/commit/68a174902527d3fa831a5d4ccc55d27f6c763cee

If you'd like to put it through its paces on Frontier to confirm what I'm seeing and you find it works, I'll merge the commit...

Noel and I think the CICE optimization reduction is promising for Frontier: all of our tests have passed. I've merged the commit into machines/frontier.

> The DEBUG test on frontier with internal diag fails compare here:
> /lustre/orion/cli115/proj-shared/noel/e3sm_scratch/maf-jun26/PEM_D_P8x1.ne4pg2_ne4pg2.F2010-SCREAMv1.frontier-scream-gpu_crayclang-scream.scream-internal_diagnostics_level.r00

0. There are only two e3sm.log files in this test directory, so subsequent points refer to...

For the pm-cpu run, the hash lines show the diff is in the Hommexx version of SL transport. The hyperviscosity operator sees the diff first because of the boundary exchange...

@ndkeen, one thing you might check is `-fp-model` for the C++ code. In intel_pm-cpu.cmake, I see
```
string(APPEND CXXFLAGS " -fp-model=precise") # and manually add precise
...
string(APPEND CXXFLAGS "...
```
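As background on why `-fp-model` can matter for a bit-for-bit comparison: floating-point addition is not associative, so a compiler that is allowed to regroup a sum can change the last bits of the result. The snippet below is a generic Python illustration of that fact, not code from E3SM or SCREAM; the helper name `sum_in_order` is made up for the example.

```python
def sum_in_order(values):
    """Add values strictly left to right, in the order given."""
    total = 0.0
    for v in values:
        total += v
    return total

a, b, c = 0.1, 0.2, 0.3
left_to_right = (a + b) + c   # grouping as written in the source
reassociated = a + (b + c)    # a grouping a value-unsafe optimization could choose

# The two groupings differ in the last bit of the result.
print(left_to_right == reassociated)              # False
# Summing in source order reproduces the as-written grouping exactly.
print(sum_in_order([a, b, c]) == left_to_right)   # True
```

A setting such as `-fp-model=precise` is meant to rule out this kind of value-changing regrouping, so a mismatch between how that flag reaches the Fortran and the C++ compile lines is one plausible thing to check when chasing a small diff.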