Segfault with test files run against AMD GPUs using 2025.1
While attempting to update our internal copy of CP2K to 'v2025.1', I found that our AMD GPU build segfaults on a few of the test files, specifically on GFX90A and GFX942 models. The same tests pass with the 'v2024.3' version, so this appears to be a regression. In this instance, I am using ROCm 6.1.0.
The test files that trigger the segfault are:
- tests/SE/regtest-2-1/md.inp
- tests/QS/regtest-ot-1/H2O-print.inp
- tests/QS/regtest-gpw-3/NO2-mulliken.inp
- tests/QS/regtest-gpw-4/H2O-debug-2.inp
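For reference, the failing cases can be reproduced one at a time; this sketch just prints the launch commands we use (the binary path and 'srun' flags reflect our Slurm/Cray setup and are assumptions):

```shell
# Print the per-test launch commands (illustrative: the binary path and
# launcher flags are specific to our Slurm/Cray setup).
for inp in tests/SE/regtest-2-1/md.inp \
           tests/QS/regtest-ot-1/H2O-print.inp \
           tests/QS/regtest-gpw-3/NO2-mulliken.inp \
           tests/QS/regtest-gpw-4/H2O-debug-2.inp; do
    echo "srun -n 2 ./exe/amd_gfx/cp2k.psmp ${inp}"
done
```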
Here are the contents of the 'arch' file used for AMD GPU builds:
# This expects the 'rocm' module to be loaded for the '${ROCM_PATH}'
# variable to be set as expected.
OFFLOAD_CC = hipcc
CC = cc
CXX = CC
FC = ftn
LD = ftn
AR = ar -r
GPUVER = Mi250
HIPARCH = <GPU_MODEL>
DFLAGS = -D__OFFLOAD_HIP -D__GRID_HIP -D__HIP_PLATFORM_AMD__ \
-D__FFTW3 -D__parallel -D__SCALAPACK -D__ACC
#
WFLAGS =
#
FCDEBFLAGS =
CFLAGS = -fno-omit-frame-pointer -fopenmp -g -mtune=native \
$(PROFOPT) -std=c11 -Wall -Wextra -Werror \
-Wno-vla-parameter -Wno-deprecated-declarations \
-Wno-implicit-function-declaration \
$(DFLAGS) -I${ROCM_PATH}/include
CXXFLAGS = --std=c++14 $(DFLAGS) -Wno-deprecated-declarations \
-fopenmp -Wall -Wextra -Werror -I${ROCM_PATH}/include
FCFLAGS = -fno-omit-frame-pointer -fopenmp -funroll-loops \
-ftree-vectorize -ffree-form -ffree-line-length-512 \
-g -mtune=native $(PROFOPT) $(FCDEBFLAGS) $(WFLAGS) \
$(DFLAGS) ${allow_arg_mismatch_flag}
LDFLAGS = $(FCFLAGS) -Wl,--enable-new-dtags -L'${ROCM_PATH}/lib' \
-Wl,-rpath,'${ROCM_PATH}/lib'
OFFLOAD_FLAGS = -g --offload-arch=${HIPARCH} -munsafe-fp-atomics \
-fopenmp -m64 -pthread \
-fPIC -O3 --std=c++11 -Wall -Wextra -Werror \
$(DFLAGS) -I${ROCM_PATH}/include
OFFLOAD_TARGET = hip
LIBS = -lfftw3 -lfftw3_threads -lhipfft
LIBS += -lstdc++ -lamdhip64
To build the code, I run the following using a Cray Programming Environment:
- For the purposes of these instructions, I'll assume the arch file is called 'amd_gfx.psmp', has the contents shared above, and is in the 'arch/' directory.
- Run: 'module load cray-fftw'
- Also run either 'module load craype-accel-amd-gfx90a' or 'module load craype-accel-amd-gfx942', depending on the GPU model.
- Make sure 'PrgEnv-gnu' is loaded.
- Either run 'module load rocm' or make sure 'ROCM_PATH' is set appropriately.
- Set: 'export allow_arg_mismatch_flag="-fallow-argument-mismatch"'
- Set 'HIP_MODEL=gfx90a' or 'HIP_MODEL=gfx942', depending on the GPU model.
- Run: 'sed -ie -e "s/<GPU_MODEL>/${HIP_MODEL}/" arch/amd_gfx.psmp'
- Proceed to build it: 'CC=cc CXX=CC make -j 4 ARCH=amd_gfx VERSION=psmp'
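As a sanity check on the placeholder substitution step above, here is a minimal sketch run against a throwaway copy of the relevant line (the '/tmp' path is illustrative; plain '-i' is used instead of '-ie' to avoid the backup file):

```shell
# Reproduce the <GPU_MODEL> substitution on a throwaway file
printf 'HIPARCH = <GPU_MODEL>\n' > /tmp/amd_gfx_check.psmp
HIP_MODEL=gfx90a
sed -i -e "s/<GPU_MODEL>/${HIP_MODEL}/" /tmp/amd_gfx_check.psmp
grep '^HIPARCH' /tmp/amd_gfx_check.psmp   # prints: HIPARCH = gfx90a
```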
I will also state that for the 'v2025.1' build, I am using the 'v2.8.0' tagged version of 'dbcsr'; with the 'v2024.3' build, we use the 'v2.7.0' tagged version of 'dbcsr'.
When I run 'tests/SE/regtest-2-1/md.inp' with 2 MPI tasks, this is the tail of the output; it is the same whether one node or two are used:
...
PW_GRID| Information for grid number 2
PW_GRID| Grid distributed over 2 processors
PW_GRID| Real space group dimensions 2 1
PW_GRID| the grid is blocked: NO
PW_GRID| Cutoff [a.u.] 1.5
PW_GRID| spherical cutoff: YES
PW_GRID| Grid points within cutoff 4547
PW_GRID| Bounds 1 -13 13 Points: 27
PW_GRID| Bounds 2 -13 13 Points: 27
PW_GRID| Bounds 3 -13 13 Points: 27
PW_GRID| Volume element (a.u.^3) 5.357 Volume (a.u.^3) 105442.7279
PW_GRID| Grid span HALFSPACE
PW_GRID| Distribution Average Max Min
PW_GRID| G-Vectors 2273.5 2284 2263
PW_GRID| G-Rays 258.5 259 258
PW_GRID| Real Space Points 9841.5 10206 9477
RS_GRID| Information for grid number 2
RS_GRID| Bounds 1 -13 13 Points: 27
RS_GRID| Bounds 2 -13 13 Points: 27
RS_GRID| Bounds 3 -13 13 Points: 27
RS_GRID| Real space fully replicated
RS_GRID| Group size 1
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error (both MPI tasks print the same trace; one copy shown):
#0 0x7fc108b732e2 in ???
#1 0x7fc108b72475 in ???
#2 0x7fc10883297f in ???
#3 0x15c3f77 in __eeq_method_MOD_eeq_solver
at /cp2k/TMP/BUILD/cray-test-cp2k-2025.1.0/src/eeq_method.F:807
#4 0x15cd1ef in __eeq_method_MOD_eeq_charges
at /cp2k/TMP/BUILD/cray-test-cp2k-2025.1.0/src/eeq_method.F:296
#5 0x15cdac7 in __eeq_method_MOD_eeq_print
at /cp2k/TMP/BUILD/cray-test-cp2k-2025.1.0/src/eeq_method.F:143
#6 0x1b6a134 in qs_scf_post_charges
at /cp2k/TMP/BUILD/cray-test-cp2k-2025.1.0/src/qs_scf_post_se.F:569
#7 0x1b6f5c5 in __qs_scf_post_se_MOD_scf_post_calculation_se
at /cp2k/TMP/BUILD/cray-test-cp2k-2025.1.0/src/qs_scf_post_se.F:129
#8 0x1ae7b57 in __qs_scf_post_scf_MOD_qs_scf_compute_properties
at /cp2k/TMP/BUILD/cray-test-cp2k-2025.1.0/src/qs_scf_post_scf.F:57
#9 0x23bab86 in __qs_scf_MOD_scf
at /cp2k/TMP/BUILD/cray-test-cp2k-2025.1.0/src/qs_scf.F:299
#10 0x15e81d3 in __qs_energy_MOD_qs_energies
at /cp2k/TMP/BUILD/cray-test-cp2k-2025.1.0/src/qs_energy.F:126
#11 0x128b0e8 in qs_forces
at /cp2k/TMP/BUILD/cray-test-cp2k-2025.1.0/src/qs_force.F:200
#12 0x128efd2 in __qs_force_MOD_qs_calc_energy_force
at /cp2k/TMP/BUILD/cray-test-cp2k-2025.1.0/src/qs_force.F:112
#13 0x91b6ef in __force_env_methods_MOD_force_env_calc_energy_force
at /cp2k/TMP/BUILD/cray-test-cp2k-2025.1.0/src/force_env_methods.F:260
#14 0x413a8b in cp2k_run
at /cp2k/TMP/BUILD/cray-test-cp2k-2025.1.0/src/start/cp2k_runs.F:329
#15 0x40e5c4 in __cp2k_runs_MOD_run_input
at /cp2k/TMP/BUILD/cray-test-cp2k-2025.1.0/src/start/cp2k_runs.F:935
#16 0x40e140 in cp2k
at /cp2k/TMP/BUILD/cray-test-cp2k-2025.1.0/src/start/cp2k.F:379
#17 0x40e282 in main
at /cp2k/TMP/BUILD/cray-test-cp2k-2025.1.0/src/start/cp2k.F:44
srun: error: nid001044: task 0: Segmentation fault
srun: Terminating StepId=1511673.0
slurmstepd: error: *** STEP 1511673.0 ON nid001044 CANCELLED AT 2025-01-16T14:09:17 ***
srun: error: nid001044: task 1: Segmentation fault (core dumped)
Here is a backtrace from the 'core' file using 'gdb':
(gdb) bt
#0 0x000000000149030b in eeq_method::eeq_solver (charges=..., lambda=0, eeq_energy=0,
particle_set=..., kind_of=..., cell=0x6b3e280, chia=..., gam=..., gab=...,
para_env=0x68a7a10, blacs_env=0x6ba3570, dft_control=0x6ba5420, eeq_sparam=...,
totalcharge=0, ewald=.TRUE., ewald_env=0x9446c10, ewald_pw=0x6b7b390, iounit=1)
at /lus/cflus03/cminear/src/cp2k/src/eeq_method.F:807
#1 0x00000000014994f6 in eeq_method::eeq_charges (qs_env=0x6b9d7f0, charges=...,
eeq_sparam=..., eeq_model=2, enshift_type=1)
at /lus/cflus03/cminear/src/cp2k/src/eeq_method.F:296
#2 0x0000000001499db4 in eeq_method::eeq_print (qs_env=0x6b9d7f0, iounit=-1, print_level=1)
at /lus/cflus03/cminear/src/cp2k/src/eeq_method.F:143
#3 0x000000000243f2da in qs_scf_post_se::qs_scf_post_charges (input=0x6880b60,
logger=0x68a82d0, qs_env=0x6b9d7f0, rho=0x48d9790, para_env=0x68a7a10)
at /lus/cflus03/cminear/src/cp2k/src/qs_scf_post_se.F:569
#4 0x0000000002444403 in qs_scf_post_se::scf_post_calculation_se (qs_env=0x6b9d7f0)
at /lus/cflus03/cminear/src/cp2k/src/qs_scf_post_se.F:129
#5 0x0000000002361aea in qs_scf_post_scf::qs_scf_compute_properties (qs_env=0x6b9d7f0,
wf_type=<error reading variable: Cannot access memory at address 0x0>,
do_mp2=<error reading variable: Cannot access memory at address 0x0>, _wf_type=0)
at /lus/cflus03/cminear/src/cp2k/src/qs_scf_post_scf.F:57
#6 0x0000000001d19205 in qs_scf::scf (qs_env=0x6b9d7f0,
has_converged=<error reading variable: Cannot access memory at address 0x0>,
total_scf_steps=<error reading variable: Cannot access memory at address 0x0>)
at /lus/cflus03/cminear/src/cp2k/src/qs_scf.F:299
#7 0x0000000000a606ab in qs_energy::qs_energies (qs_env=0x6b9d7f0,
consistent_energies=<error reading variable: Cannot access memory at address 0x0>,
calc_forces=.TRUE.) at /lus/cflus03/cminear/src/cp2k/src/qs_energy.F:126
#8 0x00000000019efe2f in qs_force::qs_forces (qs_env=0x6b9d7f0)
at /lus/cflus03/cminear/src/cp2k/src/qs_force.F:200
#9 0x00000000019f3d07 in qs_force::qs_calc_energy_force (qs_env=0x6b9d7f0,
calc_force=.TRUE., consistent_energies=.FALSE., linres=.FALSE.)
at /lus/cflus03/cminear/src/cp2k/src/qs_force.F:112
#10 0x00000000013c72c3 in force_env_methods::force_env_calc_energy_force (
force_env=0x6d26440, calc_force=.TRUE.,
consistent_energies=<error reading variable: Cannot access memory at address 0x0>,
skip_external_control=<error reading variable: Cannot access memory at address 0x0>,
eval_energy_forces=<error reading variable: Cannot access memory at address 0x0>,
require_consistent_energy_force=<error reading variable: Cannot access memory at address 0x0>, linres=<error reading variable: Cannot access memory at address 0x0>,
calc_stress_tensor=<error reading variable: Cannot access memory at address 0x0>)
at /lus/cflus03/cminear/src/cp2k/src/force_env_methods.F:260
#11 0x0000000000436c63 in cp2k_runs::cp2k_run (input_declaration=0x4aafa40,
input_file_name=..., output_unit=-1, mpi_comm=..., initial_variables=...,
_input_file_name=1024, _initial_variables=1024)
at /lus/cflus03/cminear/src/cp2k/src/start/cp2k_runs.F:329
#12 0x00000000004317de in cp2k_runs::run_input (input_declaration=0x4aafa40,
input_file_path=..., output_file_path=..., initial_variables=...,
mpi_comm=<error reading variable: Cannot access memory at address 0x0>,
_input_file_path=1024, _output_file_path=1024, _initial_variables=1024)
at /lus/cflus03/cminear/src/cp2k/src/start/cp2k_runs.F:935
#13 0x000000000040e0fd in cp2k () at /lus/cflus03/cminear/src/cp2k/src/start/cp2k.F:379
#14 0x000000000040e241 in main (argc=2, argv=0x7ffe3d02d31f)
at /lus/cflus03/cminear/src/cp2k/src/start/cp2k.F:44
#15 0x00007fb465e03eec in __libc_start_call_main () from /lib64/libc.so.6
#16 0x00007fb465e03fb5 in __libc_start_main_impl () from /lib64/libc.so.6
#17 0x000000000040b9d1 in _start () at ../sysdeps/x86_64/start.S:115
To reiterate, if the same test is run using the 'v2024.3' version, the test works as expected.
Out of curiosity, what happens if you do a "CPU-only" run?
From the output, it seems the error comes from the PW (FFTW) part, so maybe you can try disabling the PW offload (-D__NO_OFFLOAD_PW).
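In the arch file above, that would just mean appending the flag; a sketch:

```make
# Disable plane-wave (FFT) offload to the GPU; the grid code can still use HIP
DFLAGS += -D__NO_OFFLOAD_PW
```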
Is there any reason why you are not using DBCSR acceleration, i.e. -D__DBCSR_ACC? Note that -D__ACC is not used anymore, as far as I know...
BTW, are you running on an HPE Cray system? If so, I would suggest not using -mtune=native...
The CPU-only versions of the 4 tests identified work with both the 'v2024.3' and 'v2025.1' versions. I will also note that our Nvidia GPU tests also work with both versions.
I can try adding -D__NO_OFFLOAD_PW, but the original case would still seem to be a regression, as it worked in 'v2025.1's predecessor 'v2024.3' without this option (unless the option is new).
I am working with an HPE Cray system. I will investigate dropping the -mtune=native option. (I'm surprised I missed that, since I try to avoid using it in other builds.) I will also investigate switching to -D__DBCSR_ACC. I assume that when I first set up the CP2K build for AMD GPUs, I found that the one option worked. Admittedly, I tend not to check whether new build options are present or preferred if a new version builds with the existing 'arch' file.
The same tests still fail in the same way with -D__NO_OFFLOAD_PW in place. I also removed -mtune=native and replaced -D__ACC with -D__DBCSR_ACC; I had to add -lhipblas to the LIBS += ... line to get the compilation to complete successfully.
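For reference, after those changes the relevant arch-file lines look roughly like this (a sketch of our current state, not a recommendation):

```make
# -D__ACC replaced by -D__DBCSR_ACC; -D__NO_OFFLOAD_PW added; -mtune=native dropped
DFLAGS = -D__OFFLOAD_HIP -D__GRID_HIP -D__HIP_PLATFORM_AMD__ \
         -D__FFTW3 -D__parallel -D__SCALAPACK -D__DBCSR_ACC \
         -D__NO_OFFLOAD_PW
LIBS = -lfftw3 -lfftw3_threads -lhipfft
LIBS += -lstdc++ -lamdhip64 -lhipblas
```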