Grid
Cuda error invalid device ordinal
Describe the issue:
When running on more than one GPU (4 in this example), the following error is printed once for each additional GPU:
Cuda error invalid device ordinal /home/Grid/lattice/Lattice_base.h Line 149
Cuda error invalid device ordinal /home/Grid/lattice/Lattice_base.h Line 149
Cuda error invalid device ordinal /home/Grid/lattice/Lattice_base.h Line 149
Code example:
mpirun -np 4 ./wrapper.sh Benchmark_ITT --mpi 1.1.1.4
$ cat wrapper.sh
#!/bin/bash
lrank=$OMPI_COMM_WORLD_LOCAL_RANK
export OMP_NUM_THREADS=1
case ${lrank} in
[0])
GPU=0
CPUBIND="0-19"
;;
[1])
GPU=1
CPUBIND="20-39"
;;
[2])
GPU=2
CPUBIND="40-59"
;;
[3])
GPU=3
CPUBIND="60-79"
;;
esac
CMD="env CUDA_VISIBLE_DEVICES=${GPU} numactl --physcpubind=${CPUBIND}"
echo "$CMD $*"
$CMD "$@"
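One possible reading of the output, offered as a sketch rather than a confirmed diagnosis: CUDA renumbers the devices listed in CUDA_VISIBLE_DEVICES starting from 0, so with the wrapper above each rank sees exactly one device, at ordinal 0. If the application then selects its device by node-local rank (an assumption, not verified against the Grid source here), ranks 1 to 3 would request ordinals that do not exist. The helper names `visible_count` and `ordinal_ok` below are hypothetical and only model that arithmetic; they do not call CUDA.

```bash
#!/bin/bash
# Model how many devices a process sees for a given CUDA_VISIBLE_DEVICES value.
# Only handles a set value (comma-separated ids); an unset variable, which
# exposes all devices, is not modelled here.
visible_count() {
  local list=$1
  if [ -z "$list" ]; then
    echo 0            # an empty mask hides every device
    return
  fi
  local IFS=,
  local ids=($list)   # split the mask on commas
  echo "${#ids[@]}"
}

# Succeeds if the requested ordinal exists under the mask.
# Visible devices are always renumbered 0..N-1.
ordinal_ok() {
  local ordinal=$1 list=$2
  local n
  n=$(visible_count "$list")
  [ "$ordinal" -lt "$n" ]
}

# Replay the wrapper's mapping: rank N gets CUDA_VISIBLE_DEVICES=N.
# Assumption: the application asks for device ordinal == local rank.
for lrank in 0 1 2 3; do
  mask=$lrank
  if ordinal_ok "$lrank" "$mask"; then
    echo "rank $lrank: mask=$mask ordinal $lrank ok"
  else
    echo "rank $lrank: mask=$mask ordinal $lrank invalid"
  fi
done
```

Under that assumption the loop flags ranks 1, 2 and 3 and accepts rank 0, which would match the three error lines reported for a 4-GPU run. If this is the cause, either the wrapper should leave all four GPUs visible or the application should always select ordinal 0.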
Target platform:
Intel (40 cores/node) + 4xA100
Configure options:
../configure --enable-comms=mpi \
--enable-simd=GPU \
--enable-accelerator=cuda \
--prefix $prefix \
CXX=nvcc \
LDFLAGS=-L$prefix/lib/ \
CXXFLAGS="-ccbin mpicxx -gencode arch=compute_80,code=sm_80 -I$prefix/include/ -std=c++14"