Cannot run CUDA-enabled version
**Issue:** Running `kripke.exe` results in an illegal memory access.
Tagged release 1.2.4 does not exhibit this behavior. I did not bisect to find the culprit, but I suspect an issue somewhere in RAJA.
Build environment:
- GCC 8.4
- CUDA Toolkit 11.4
- AMD CPU (Threadripper 3960X)
- NVIDIA A6000 GPU (compute capability of 8.6)
- No warnings when building
Host-config file:
```cmake
set(CMAKE_BUILD_TYPE "Release" CACHE STRING "")
set(CMAKE_CXX_FLAGS "" CACHE STRING "")
set(CMAKE_CXX_FLAGS_RELEASE "-O3 -ffast-math" CACHE STRING "")
set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "-O3 -g -ffast-math" CACHE STRING "")
set(CMAKE_CXX_FLAGS_DEBUG "-O0 -g" CACHE STRING "")
set(ENABLE_CHAI On CACHE BOOL "")
set(ENABLE_CUDA On CACHE BOOL "")
set(CUDA_ARCH "sm_86" CACHE STRING "")
set(ENABLE_OPENMP Off CACHE BOOL "")
set(ENABLE_MPI Off CACHE BOOL "")
set(ENABLE_MPI_WRAPPER Off CACHE BOOL "")
set(CMAKE_CUDA_FLAGS "-restrict -gencode=arch=compute_86,code=sm_86" CACHE STRING "")
set(CMAKE_CUDA_FLAGS_RELEASE "-O3 --expt-extended-lambda" CACHE STRING "")
set(CMAKE_CUDA_FLAGS_RELWITHDEBINFO "-O3 -lineinfo --expt-extended-lambda" CACHE STRING "")
set(CMAKE_CUDA_FLAGS_DEBUG "-O0 -g -G --expt-extended-lambda" CACHE STRING "")
set(CMAKE_CUDA_HOST_COMPILER "${CMAKE_CXX_COMPILER}" CACHE STRING "")
```
Output:
```
~/kripke$ ./build/bin/kripke.exe
   _  __       _         _
  | |/ /      (_)       | |
  | ' /  _ __  _  _ __  | | __ ___
  |  <  | '__|| || '_ \ | |/ // _ \
  | . \ | |   | || |_) ||   <|  __/
  |_|\_\|_|   |_|| .__/ |_|\_\\___|
                 | |
                 |_|        Version 1.2.5-dev

LLNL-CODE-775068

Copyright (c) 2014-2019, Lawrence Livermore National Security, LLC

Kripke is released under the BSD 3-Clause License, please see the
LICENSE file for the full license

This work was produced under the auspices of the U.S. Department of
Energy by Lawrence Livermore National Laboratory under Contract
DE-AC52-07NA27344.

Author: Adam J. Kunen

Compilation Options:
  Architecture: CUDA
  Compiler: /usr/bin/c++
  Compiler Flags: " -Wall -Wextra "
  Linker Flags: " "
  CHAI Enabled: Yes
  CUDA Enabled: Yes
  NVCC: /usr/local/cuda/bin/nvcc
  NVCC Flags: "-restrict -gencode=arch=compute_86,code=sm_86 -O3 --expt-extended-lambda"
  MPI Enabled: No
  OpenMP Enabled: No
  Caliper Enabled: No

Input Parameters
================

  Problem Size:
    Zones: 16 x 16 x 16 (4096 total)
    Groups: 32
    Legendre Order: 4
    Quadrature Set: Dummy S2 with 96 points

  Physical Properties:
    Total X-Sec: sigt=[0.100000, 0.000100, 0.100000]
    Scattering X-Sec: sigs=[0.050000, 0.000050, 0.050000]

  Solver Options:
    Number iterations: 10

  MPI Decomposition Options:
    Total MPI tasks: 1
    Spatial decomp: 1 x 1 x 1 MPI tasks
    Block solve method: Sweep

  Per-Task Options:
    DirSets/Directions: 8 sets, 12 directions/set
    GroupSet/Groups: 2 sets, 16 groups/set
    Zone Sets: 1 x 1 x 1
    Architecture: CUDA
    Data Layout: DGZ

Generating Problem
==================

  Decomposition Space:   Procs:      Subdomains (local/global):
  ---------------------  ----------  --------------------------
  (P) Energy:            1           2 / 2
  (Q) Direction:         1           8 / 8
  (R) Space:             1           1 / 1
  (Rx,Ry,Rz) R in XYZ:   1x1x1       1x1x1 / 1x1x1
  (PQR) TOTAL:           1           16 / 16

  Material Volumes=[8.789062e+03, 1.177734e+05, 2.753438e+06]

  Memory breakdown of Field variables:
  Field Variable          Num Elements    Megabytes
  --------------          ------------    ---------
  data/sigs                      15360        0.117
  dx                                16        0.000
  dy                                16        0.000
  dz                                16        0.000
  ell                             2400        0.018
  ell_plus                        2400        0.018
  i_plane                       786432        6.000
  j_plane                       786432        6.000
  k_plane                       786432        6.000
  mixelem_to_fraction             4352        0.033
  phi                          3276800       25.000
  phi_out                      3276800       25.000
  psi                         12582912       96.000
  quadrature/w                      96        0.001
  quadrature/xcos                   96        0.001
  quadrature/ycos                   96        0.001
  quadrature/zcos                   96        0.001
  rhs                         12582912       96.000
  sigt_zonal                    131072        1.000
  volume                          4096        0.031
  --------                ------------    ---------
  TOTAL                       34238832      261.222

  Generation Complete!

Steady State Solve
==================

CUDAassert: an illegal memory access was encountered /home/williamk/kripke/tpl/raja/include/RAJA/policy/cuda/MemUtils_CUDA.hpp 183
terminate called after throwing an instance of 'std::runtime_error'
  what():  CUDAassert
Aborted (core dumped)
```
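A note on reading the error: CUDA reports faults such as an illegal memory access asynchronously, at the next API call that checks for errors, so `MemUtils_CUDA.hpp` line 183 in the `CUDAassert` message most likely marks where RAJA noticed the error rather than the kernel that caused it. One way to narrow it down, sketched below and assuming the rest of the host-config stays as posted, is to switch to the RelWithDebInfo flag set the file already defines. Its `-lineinfo` NVCC flag lets a tool such as `compute-sanitizer` (shipped with CUDA 11.x) attribute the bad access to a source line.

```cmake
# Debugging variant of the host-config (a sketch, not part of the original
# report). Change the existing CMAKE_BUILD_TYPE line to the one below rather
# than appending it: a second set(... CACHE ...) without FORCE would leave the
# value cached by the first call ("Release") in place.
#
# RelWithDebInfo picks up the per-config flags the host-config already sets:
#   C++:  -O3 -g -ffast-math
#   CUDA: -O3 -lineinfo --expt-extended-lambda (plus CMAKE_CUDA_FLAGS)
set(CMAKE_BUILD_TYPE "RelWithDebInfo" CACHE STRING "")
```

Running `compute-sanitizer ./build/bin/kripke.exe` on the rebuilt binary would then be a natural next step.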
Hey @willkill07, sorry for the late reply. I'm not seeing this issue on our LLNL machines; I ran it with both gcc/8.3.1 and gcc/8.4.0, using cuda/11.4.1. Could you try it with the latest kripke/develop and the following two additional CMake lines?
```cmake
set(CHAI_ENABLE_RAJA_PLUGIN On CACHE BOOL "")
set(ENABLE_RAJA_PLUGIN On CACHE BOOL "")
```
This enhancement should also help: https://github.com/LLNL/Kripke/pull/38. With it, you'd only need to specify `ENABLE_CHAI=On`, without any of the `X_ENABLE_RAJA_PLUGIN` variables.
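To summarize the two configurations described in this reply, here is a sketch of the CHAI-related lines only. The rest of the reporter's host-config is assumed to stay as posted, and neither variant has been verified here to fix the crash. As far as I understand CHAI's RAJA plugin, it hooks into RAJA kernel launches so that CHAI-managed arrays captured by a CUDA lambda are moved to the device first. Without that data movement, a kernel could plausibly end up dereferencing host memory, which would be consistent with the illegal access reported above.

```cmake
# CHAI-related options only; every other line of the host-config is unchanged.

# kripke/develop before PR #38: enable CHAI plus both RAJA-plugin switches.
set(ENABLE_CHAI On CACHE BOOL "")
set(CHAI_ENABLE_RAJA_PLUGIN On CACHE BOOL "")
set(ENABLE_RAJA_PLUGIN On CACHE BOOL "")

# With PR #38 (per the comment above; not verified here), ENABLE_CHAI alone
# should be enough, i.e. the two plugin lines can simply be deleted:
# set(ENABLE_CHAI On CACHE BOOL "")
```

Because `set(... CACHE ...)` without `FORCE` does not overwrite an entry already stored in `CMakeCache.txt`, it is safest to configure into a fresh build directory after editing the host-config; otherwise values cached by an earlier configure may silently win.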