Build failure with CUDA 12.6: Blackwell support was added in CUDA 12.8, not 12.6
The script get_cuda_gencode.sh assumes that CUDA 12.6 added Blackwell support:
https://github.com/NVIDIA/gdrcopy/blob/master/scripts/get_cuda_gencode.sh#L54
In fact, Blackwell support first appeared in CUDA 12.8. Building gdrcopy with CUDA 12.6 therefore fails, because that toolkit does not support sm_100 (note that sm_100 is missing from the nvcc -code-ls output below).
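For illustration, here is a minimal sketch of the kind of version gate that would avoid this. The helper names `cuda_version_ge` and `blackwell_gencode` are hypothetical and are not taken from get_cuda_gencode.sh; the only assumption carried over from this report is that sm_100 requires CUDA 12.8 or newer.

```shell
# Hypothetical helper: succeeds when version "$1" (MAJOR.MINOR) is at least "$2".
cuda_version_ge() {
    have_major="${1%%.*}"; have_minor="${1#*.}"; have_minor="${have_minor%%.*}"
    need_major="${2%%.*}"; need_minor="${2#*.}"; need_minor="${need_minor%%.*}"
    [ "$have_major" -gt "$need_major" ] ||
        { [ "$have_major" -eq "$need_major" ] && [ "$have_minor" -ge "$need_minor" ]; }
}

# Hypothetical helper: emit the Blackwell flag only from CUDA 12.8 onwards.
blackwell_gencode() {
    if cuda_version_ge "$1" 12.8; then
        printf '%s\n' "-gencode arch=compute_100,code=sm_100"
    fi
}
```

With this gate, `blackwell_gencode 12.6` prints nothing, while `blackwell_gencode 12.8` prints the sm_100 flag.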
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Aug_14_10:16:24_PDT_2024
Cuda compilation tools, release 12.6, V12.6.68
Build cuda_12.6.r12.6/compiler.34714021_0
$ nvcc -code-ls
sm_50
sm_52
sm_53
sm_60
sm_61
sm_62
sm_70
sm_72
sm_75
sm_80
sm_86
sm_87
sm_89
sm_90
$ sudo make
awk: cmd. line:1: warning: regexp escape sequence `\#' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence `\#' is not a known regexp operator
GDRAPI_ARCH=ARM64
cd src/gdrdrv && \
make
make[1]: Entering directory '/opt/mellanox/gdrcopy/src/gdrdrv'
Picking NVIDIA driver sources from NVIDIA_SRC_DIR=/usr/src/nvidia-535.183.01/nvidia. If that does not meet your expectation, you might have a stale driver still around and that might cause problems.
Setting NVIDIA_IS_OPENSOURCE=y
Setting HAVE_VM_FLAGS_SET=n
Setting HAVE_PROC_OPS=y
make[2]: Entering directory '/usr/src/linux-headers-5.15.0-1020-nvidia-tegra-igx'
make[2]: Leaving directory '/usr/src/linux-headers-5.15.0-1020-nvidia-tegra-igx'
make[1]: Leaving directory '/opt/mellanox/gdrcopy/src/gdrdrv'
awk: cmd. line:1: warning: regexp escape sequence `\#' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence `\#' is not a known regexp operator
cd src && \
make LIB_MAJOR_VER=2 LIB_MINOR_VER=5
make[1]: Entering directory '/opt/mellanox/gdrcopy/src'
GDRAPI_ARCH=ARM64
make[1]: Leaving directory '/opt/mellanox/gdrcopy/src'
cd tests && \
make CUDA=/usr/local/cuda
make[1]: Entering directory '/opt/mellanox/gdrcopy/tests'
/usr/local/cuda/bin/nvcc -o pplat.o -c pplat.cu -lcuda -lpthread -ldl -lgdrapi -I /usr/local/cuda/include -I ../include -I ../src -I /usr/local/cuda/include -gencode arch=compute_60,code=compute_60 -gencode arch=compute_61,code=compute_61 -gencode arch=compute_62,code=compute_62 -gencode arch=compute_70,code=compute_70 -gencode arch=compute_72,code=compute_72 -gencode arch=compute_75,code=compute_75 -gencode arch=compute_80,code=compute_80 -gencode arch=compute_86,code=compute_86 -gencode arch=compute_87,code=compute_87 -gencode arch=compute_90,code=compute_90 -gencode arch=compute_100,code=compute_100 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_87,code=sm_87 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_100,code=sm_100
nvcc fatal : Unsupported gpu architecture 'compute_100'
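As a possible alternative to hard-coding toolkit versions, the flags could be derived from what nvcc itself reports via `nvcc -code-ls` (shown in the log above). The following is only a sketch under that assumption; `gencode_from_sm_list` is a hypothetical helper, not part of gdrcopy, and the minimum of 60 simply mirrors the lowest architecture in the failing command line.

```shell
# Hypothetical helper: read "sm_XX" names (one per line) on stdin and print
# matching -gencode flags for every architecture >= the given minimum.
gencode_from_sm_list() {
    min_sm="${1:-60}"
    flags=""
    while IFS= read -r sm; do
        case "$sm" in
            sm_[0-9]*) num="${sm#sm_}" ;;
            *) continue ;;
        esac
        # Skip suffixed variants such as sm_90a: num must be all digits.
        case "$num" in
            *[!0-9]*) continue ;;
        esac
        [ "$num" -ge "$min_sm" ] || continue
        flags="$flags -gencode arch=compute_${num},code=sm_${num}"
    done
    # Drop the single leading space left by the first append.
    printf '%s\n' "${flags# }"
}

# Usage (assumes nvcc is on PATH and supports -code-ls):
#   NVCC_GENCODE=$(nvcc -code-ls | gencode_from_sm_list 60)
```

Because the list comes from the installed compiler, an architecture such as sm_100 is only ever requested when that nvcc can actually target it.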
Hi @brudfors, the fix has been merged into the R2.5 branch (https://github.com/NVIDIA/gdrcopy/tree/R2.5).
Hi @pakmarkthub, do you have any information on the timeline for the next gdrcopy release that will include this fix?
@brudfors can you please close this issue? Thanks
@jinyan-li1 FYI, 2.5.1 was released a few weeks ago.