Trey White
Looks like we need to submit an AMD compiler bug. Can you give me permission to access https://github.com/xyuan/e3sm_p3_shoc/tree/e3sm_p3_shoc_hip? GitHub is telling me it doesn't exist. Or could you copy your...
I was able to reproduce the error with just the compile command. ``` ++ CC -DKOKKOS_DEPENDENCE -DMPICH_SKIP_MPICXX -DSCREAM_CONFIG_IS_CMAKE -DSPDLOG_COMPILED_LIB -D__HIP_ROCclr -I/gpfs/alpine/cli115/world-shared/e3sm_p3_crusher/components/eam/src/physics/crm/scream/src/physics/shoc/../share -I/gpfs/alpine/cli115/world-shared/e3sm_p3_crusher/components/eam/src/physics/crm/scream/src -I/gpfs/alpine/cli115/proj-shared/trey/4984/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.amdclanggpu.4x1/bld/cmake-bld/scream/src -I/gpfs/alpine/cli115/world-shared/e3sm_p3_crusher/externals/ekat/src -I/gpfs/alpine/cli115/proj-shared/trey/4984/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.amdclanggpu.4x1/bld/cmake-bld/externals/ekat/src -I/gpfs/alpine/cli115/proj-shared/trey/4984/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.amdclanggpu.4x1/bld/cmake-bld/externals/ekat/src/ekat/ekat_f90_modules -I/gpfs/alpine/cli115/proj-shared/trey/4984/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.amdclanggpu.4x1/bld/cmake-bld/externals/kokkos -I/gpfs/alpine/cli115/proj-shared/trey/4984/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.amdclanggpu.4x1/bld/cmake-bld/externals/kokkos/core/src -I/gpfs/alpine/cli115/world-shared/e3sm_p3_crusher/externals/ekat/extern/kokkos/core/src -I/gpfs/alpine/cli115/proj-shared/trey/4984/ACME_SIMULATIONS/F-MMFXX-P3.ne4pg2_ne4pg2.crusher.amdclanggpu.4x1/bld/cmake-bld/externals/kokkos/containers/src...
I was able to work around the error with either of the following.

- Compile with `hipcc` instead.
- Add the compiler options, `-mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false`.

Either of these...
FYI, I also updated an internal HPE ticket to make the above options the default for `PrgEnv-amd` and `PrgEnv-cray-amd`, as they already are for `hipcc` and `PrgEnv-cray`.
The raw test is very sensitive to compiler optimizations. Starting with the original codes `basic2-flexibleType.cu` and `basic2-flexibleType.hip`, I see similar results to @oksanaguba's on Summit and Spock. Summit: ``` Timer...
I had some success improving the performance of the Kokkos examples. They both have significant re-use of values that we can take advantage of using `Kokkos::TeamPolicy` and shared memory (AKA...
One thing that I'm seeing is that the best tuning strategies can be very sensitive to detailed specifics of the example. I wonder if "real" E3SM code has very different...
Calls to `assert` in HIP kernels can be very expensive, even if they are always true. They turn on `printf` support in the kernels, which uses up lots of registers...
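One way to keep such checks out of production kernels is to route them through a macro that expands to nothing unless a debug flag is set at compile time. The following is a minimal host-side C++ sketch of that idea, not code from the project: `KERNEL_ASSERT`, `KERNEL_DEBUG_CHECKS`, and `scale_in_place` are hypothetical names, and the loop is only a stand-in for a kernel body.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical macro (an assumption, not from the original code).
// The check is compiled in only when KERNEL_DEBUG_CHECKS is defined;
// otherwise it expands to a no-op and the condition is never evaluated.
#ifdef KERNEL_DEBUG_CHECKS
#define KERNEL_ASSERT(cond)                                \
  do {                                                     \
    if (!(cond)) {                                         \
      std::fprintf(stderr, "check failed: %s\n", #cond);   \
      std::abort();                                        \
    }                                                      \
  } while (0)
#else
#define KERNEL_ASSERT(cond) ((void)0)
#endif

// Host-side stand-in for a kernel body: scales each element in place.
inline void scale_in_place(std::vector<double>& data, double factor) {
  for (std::size_t i = 0; i < data.size(); ++i) {
    KERNEL_ASSERT(i < data.size());  // always true; a no-op unless the flag is defined
    data[i] *= factor;
  }
}
```

Building with `-DKERNEL_DEBUG_CHECKS` turns the checks back on for debugging runs. Whether this recovers the registers in a real HIP kernel would need to be measured on the actual code.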
@oksanaguba, I'm not in the project CLI115, so I can't read `$myfol`. Could you copy the directory into `/gpfs/alpine/cli115/world-shared/onguba`? For the runs on your plot, does `ne10` mean `NP=10`? And...
@oksanaguba, I was able to reproduce the 31s run on Spock. Thanks! I want to test some code optimizations. In preparation for testing, I tried inserting a bug (commented out...