Grid

Cuda error invalid device ordinal

Open · lcebaman opened this issue on Jul 13 '23 · 0 comments

Describe the issue:

When running on more than one GPU (four in the example below), the following error is printed once for each additional GPU, i.e. three times with four GPUs:

Cuda error invalid device ordinal /home/Grid/lattice/Lattice_base.h Line 149

Cuda error invalid device ordinal /home/Grid/lattice/Lattice_base.h Line 149

Cuda error invalid device ordinal /home/Grid/lattice/Lattice_base.h Line 149
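The string "invalid device ordinal" is how the CUDA runtime describes `cudaErrorInvalidDevice`, which is returned when a process selects a device index it cannot see. When `CUDA_VISIBLE_DEVICES` lists a single GPU, as the wrapper script under Code example does for every rank, that GPU is renumbered as device 0 inside the process. One hypothesis that would match three messages for four GPUs, not confirmed in this issue, is that rank r still requests device r. The sketch below models only that numbering rule; `visible_device_count` and `check_ordinal` are hypothetical helpers, not part of Grid or CUDA.

```shell
#!/bin/bash
# Hypothetical helpers (not part of Grid or CUDA) that mimic how the CUDA
# runtime numbers devices when CUDA_VISIBLE_DEVICES is set: the listed GPUs
# are renumbered from 0, so a process restricted to one GPU only has ordinal 0.
# Simplified: every comma-separated entry is treated as a valid device;
# UUID entries and an unset variable (all GPUs visible) are not modelled.

# Count the comma-separated entries in a CUDA_VISIBLE_DEVICES value.
visible_device_count() {
  local list=$1
  local IFS=,
  local -a ids
  read -r -a ids <<< "$list"
  echo "${#ids[@]}"
}

# Print "ok" if the requested ordinal is addressable, otherwise the message
# the runtime associates with cudaErrorInvalidDevice.
check_ordinal() {
  local list=$1 ordinal=$2
  local n
  n=$(visible_device_count "$list")
  if (( ordinal >= 0 && ordinal < n )); then
    echo "ok"
  else
    echo "invalid device ordinal"
  fi
}

# Assumed scenario: the wrapper exposes only GPU r to rank r, and the
# process then asks for device ordinal r.
for rank in 0 1 2 3; do
  echo "rank ${rank}: $(check_ordinal "${rank}" "${rank}")"
done
```

Under that assumption only rank 0 succeeds, and ranks 1 to 3 each produce one "invalid device ordinal" message.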

Code example:

mpirun -np 4 ./wrapper.sh Benchmark_ITT --mpi 1.1.1.4
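$ cat wrapper.sh
#!/bin/bash
# Pin each node-local MPI rank to one GPU and a disjoint range of 20 CPUs.
lrank=$OMPI_COMM_WORLD_LOCAL_RANK

export OMP_NUM_THREADS=1
case ${lrank} in
    0)
        GPU=0
        CPUBIND="0-19"
        ;;
    1)
        GPU=1
        CPUBIND="20-39"
        ;;
    2)
        GPU=2
        CPUBIND="40-59"
        ;;
    3)
        GPU=3
        CPUBIND="60-79"
        ;;
    *)
        echo "unexpected local rank: ${lrank}" >&2
        exit 1
        ;;
esac

CMD="env CUDA_VISIBLE_DEVICES=${GPU} numactl --physcpubind=${CPUBIND}"
echo "$CMD $*"

# $CMD is left unquoted on purpose so that it splits into separate words.
$CMD "$@"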
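The intended pattern in the case statement appears to be GPU = local rank and 20 logical CPUs per rank, so the mapping can also be computed arithmetically. A minimal sketch, assuming Open MPI's `OMPI_COMM_WORLD_LOCAL_RANK`, four ranks per node and 20 CPUs per rank; `binding_for_rank` and `CPUS_PER_RANK` are hypothetical names:

```shell
#!/bin/bash
# Hypothetical variant of wrapper.sh: derives the GPU and CPU range from the
# node-local rank instead of a case statement. Assumes GPU index == local
# rank and 20 contiguous logical CPUs per rank.
set -euo pipefail

CPUS_PER_RANK=20

# Print "GPU CPUBIND" for a given local rank.
binding_for_rank() {
  local lrank=$1
  local first=$(( lrank * CPUS_PER_RANK ))
  local last=$(( first + CPUS_PER_RANK - 1 ))
  echo "${lrank} ${first}-${last}"
}

# Only launch when a command is given, so the function can be tested alone.
if (( $# > 0 )); then
  lrank=${OMPI_COMM_WORLD_LOCAL_RANK:?not launched by Open MPI}
  read -r GPU CPUBIND <<< "$(binding_for_rank "${lrank}")"
  export OMP_NUM_THREADS=1
  echo "CUDA_VISIBLE_DEVICES=${GPU} numactl --physcpubind=${CPUBIND} $*"
  exec env CUDA_VISIBLE_DEVICES="${GPU}" numactl --physcpubind="${CPUBIND}" "$@"
fi
```

Launched as `mpirun -np 4 ./wrapper.sh Benchmark_ITT --mpi 1.1.1.4`, local rank 3 would get GPU 3 and CPUs 60-79. Using `exec` replaces the wrapper shell with the benchmark, so the MPI launcher tracks the benchmark process directly.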

Target platform:

Intel (40 cores/node) + 4xA100
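The wrapper binds ranks to logical CPUs up to 79, while the platform is listed as 40 cores per node. That would fit a node with two hardware threads per core, but this is an assumption; `lscpu` on the node shows how logical CPU IDs map to cores and sockets. A hedged sketch of a check that the per-rank ranges fit the node and do not overlap; `check_ranges` is a hypothetical helper:

```shell
#!/bin/bash
# Hypothetical check (not part of Grid): verifies that per-rank "first-last"
# CPU ranges lie within 0..MAX_CPU and do not overlap one another.

# Usage: check_ranges MAX_CPU RANGE...
check_ranges() {
  local max_cpu=$1; shift
  local -a owner=()   # owner[cpu] = index of the range that claimed it
  local i=0 range first last cpu
  for range in "$@"; do
    first=${range%-*}
    last=${range#*-}
    if (( first > last || last > max_cpu )); then
      echo "range ${range} is outside 0-${max_cpu}"
      return 1
    fi
    for (( cpu = first; cpu <= last; cpu++ )); do
      if [[ -n ${owner[cpu]:-} ]]; then
        echo "CPU ${cpu} is bound by ranges ${owner[cpu]} and ${i}"
        return 1
      fi
      owner[cpu]=$i
    done
    i=$(( i + 1 ))
  done
  echo "ok"
}

# Assumes 80 logical CPUs (IDs 0-79); confirm with `nproc` or `lscpu`.
check_ranges 79 0-19 20-39 40-59 60-79          # disjoint ranges
check_ranges 79 0-19 20-39 40-59 50-79 || true  # reports the first shared CPU
```

Ranges such as 40-59 and 50-79 share CPUs 50-59, so two ranks would compete for the same CPUs.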

Configure options:

../configure --enable-comms=mpi          \
             --enable-simd=GPU           \
             --enable-accelerator=cuda   \
             --prefix $prefix            \
             CXX=nvcc                    \
             LDFLAGS=-L$prefix/lib/      \
             CXXFLAGS="-ccbin mpicxx -gencode arch=compute_80,code=sm_80 -I$prefix/include/ -std=c++14"

lcebaman, Jul 13 '23 13:07