
Build failure with CUDA 12.6: Blackwell support was added in CUDA 12.8, not 12.6

Open brudfors opened this issue 7 months ago • 2 comments

The script get_cuda_gencode.sh assumes CUDA 12.6 adds Blackwell support:

https://github.com/NVIDIA/gdrcopy/blob/master/scripts/get_cuda_gencode.sh#L54

whilst in fact it was first added in CUDA 12.8. Building gdrcopy with CUDA 12.6 therefore fails, because that toolkit does not support sm_100.

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Aug_14_10:16:24_PDT_2024
Cuda compilation tools, release 12.6, V12.6.68
Build cuda_12.6.r12.6/compiler.34714021_0

$ nvcc -code-ls
sm_50
sm_52
sm_53
sm_60
sm_61
sm_62
sm_70
sm_72
sm_75
sm_80
sm_86
sm_87
sm_89
sm_90

$ sudo make
awk: cmd. line:1: warning: regexp escape sequence `\#' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence `\#' is not a known regexp operator
GDRAPI_ARCH=ARM64
cd src/gdrdrv && \
make
make[1]: Entering directory '/opt/mellanox/gdrcopy/src/gdrdrv'
Picking NVIDIA driver sources from NVIDIA_SRC_DIR=/usr/src/nvidia-535.183.01/nvidia. If that does not meet your expectation, you might have a stale driver still around and that might cause problems.
Setting NVIDIA_IS_OPENSOURCE=y
Setting HAVE_VM_FLAGS_SET=n
Setting HAVE_PROC_OPS=y
make[2]: Entering directory '/usr/src/linux-headers-5.15.0-1020-nvidia-tegra-igx'
make[2]: Leaving directory '/usr/src/linux-headers-5.15.0-1020-nvidia-tegra-igx'
make[1]: Leaving directory '/opt/mellanox/gdrcopy/src/gdrdrv'
awk: cmd. line:1: warning: regexp escape sequence `\#' is not a known regexp operator
awk: cmd. line:1: warning: regexp escape sequence `\#' is not a known regexp operator
cd src && \
make LIB_MAJOR_VER=2 LIB_MINOR_VER=5
make[1]: Entering directory '/opt/mellanox/gdrcopy/src'
GDRAPI_ARCH=ARM64
make[1]: Leaving directory '/opt/mellanox/gdrcopy/src'
cd tests && \
make CUDA=/usr/local/cuda
make[1]: Entering directory '/opt/mellanox/gdrcopy/tests'
/usr/local/cuda/bin/nvcc -o pplat.o -c pplat.cu -lcuda -lpthread -ldl -lgdrapi -I /usr/local/cuda/include -I ../include -I ../src -I /usr/local/cuda/include  -gencode arch=compute_60,code=compute_60 -gencode arch=compute_61,code=compute_61 -gencode arch=compute_62,code=compute_62 -gencode arch=compute_70,code=compute_70 -gencode arch=compute_72,code=compute_72 -gencode arch=compute_75,code=compute_75 -gencode arch=compute_80,code=compute_80 -gencode arch=compute_86,code=compute_86 -gencode arch=compute_87,code=compute_87 -gencode arch=compute_90,code=compute_90 -gencode arch=compute_100,code=compute_100 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_87,code=sm_87 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_100,code=sm_100
nvcc fatal   : Unsupported gpu architecture 'compute_100'
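
One possible way to avoid this class of failure, sketched below, is to build the -gencode list from what the installed nvcc actually reports via `nvcc -code-ls` instead of inferring it from the CUDA version. The function name `filter_gencode` and its arguments are hypothetical and are not part of gdrcopy; this is not the fix that was merged upstream. The sketch also emits only the `code=sm_XX` flags, not the `code=compute_XX` PTX flags seen in the build line above.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not taken from gdrcopy): print -gencode
# flags only for compute capabilities that appear in a supported list.
#   $1: space-separated candidate compute capabilities, e.g. "90 100"
#   $2: whitespace-separated supported codes, e.g. the output of `nvcc -code-ls`
filter_gencode() {
    candidates=$1
    supported=$2
    flags=""
    for cc in $candidates; do
        for sm in $supported; do
            if [ "$sm" = "sm_$cc" ]; then
                flags="$flags -gencode arch=compute_$cc,code=sm_$cc"
                break   # candidate matched; move on to the next one
            fi
        done
    done
    # Strip the leading space before printing.
    printf '%s\n' "${flags# }"
}
```

With a real toolkit this could be called as `filter_gencode "60 70 80 90 100" "$(nvcc -code-ls)"`. Against the CUDA 12.6 listing above, the candidate 100 would be dropped because sm_100 is not reported.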

brudfors, May 28 '25 12:05

Hi @brudfors, the fix has been merged to the R2.5 branch (https://github.com/NVIDIA/gdrcopy/tree/R2.5).

pakmarkthub, May 30 '25 20:05

Hi @pakmarkthub, do you have any information on the timeline for the next gdrcopy release that will include this fix?

jinyan-li1, Jun 16 '25 18:06

@brudfors can you please close this issue? Thanks

drossetti, Sep 15 '25 17:09

@jinyan-li1 FYI 2.5.1 was released a few weeks ago.

drossetti, Sep 15 '25 17:09