gdrcopy
Improve cuda gencode flags
Context
- The current CUDA SASS/PTX arch list is hardcoded by hand using a CUDA-version heuristic, which is error-prone. Cases in point:
- Blackwell sm 10.1 is missing.
- Blackwell sm 10.0 is only supported from CUDA 12.8 onward, not 12.6, so the build fails on older toolkits:
make[1]: Entering directory '***/gdrcopy/tests'
/usr/local/cuda/bin/nvcc -o pplat.o -c pplat.cu -lcuda -lpthread -ldl -lgdrapi -I /usr/local/cuda/include -I ../include -I ../src -I /usr/local/cuda/include -gencode arch=compute_60,code=compute_60 -gencode arch=compute_61,code=compute_61 -gencode arch=compute_62,code=compute_62 -gencode arch=compute_70,code=compute_70 -gencode arch=compute_72,code=compute_72 -gencode arch=compute_75,code=compute_75 -gencode arch=compute_80,code=compute_80 -gencode arch=compute_86,code=compute_86 -gencode arch=compute_87,code=compute_87 -gencode arch=compute_90,code=compute_90 -gencode arch=compute_100,code=compute_100 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_87,code=sm_87 -gencode arch=compute_90,code=sm_90 -gencode arch=compute_100,code=sm_100
nvcc fatal : Unsupported gpu architecture 'compute_100'
make[1]: *** [Makefile:54: pplat.o] Error 1
- The full arch list is used for both code (SASS) and compute (PTX). For PTX, only the latest arch is needed.
Changes
- Use sm list from `nvcc --list-gpu-code` directly when available
- Fix blackwell sm list and version compatibility
- Consolidate compute & sm list in a single variable
- Only build PTX for last supported arch
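
The flag-generation logic described above can be sketched roughly as follows. This is an illustration in Python, not the Makefile code from this PR: the helper names `parse_sm_list`, `gencode_flags`, and `query_nvcc` are hypothetical, and the sketch assumes `nvcc --list-gpu-code` prints one `sm_XX` entry per line. Suffixed variants such as `sm_90a` are skipped here purely to keep the example simple.

```python
import re
import subprocess


def parse_sm_list(nvcc_output):
    """Extract plain numeric SM versions from `nvcc --list-gpu-code` style output."""
    sms = set()
    for token in nvcc_output.split():
        match = re.fullmatch(r"sm_(\d+)", token)
        if match:  # entries with a suffix (e.g. sm_90a) are ignored in this sketch
            sms.add(int(match.group(1)))
    # Sort numerically so that 100 comes after 90 (a string sort would not).
    return sorted(sms)


def gencode_flags(sms):
    """Build SASS for every listed arch, and PTX only for the newest one."""
    flags = [f"-gencode arch=compute_{sm},code=sm_{sm}" for sm in sms]
    if sms:
        last = sms[-1]
        flags.append(f"-gencode arch=compute_{last},code=compute_{last}")
    return flags


def query_nvcc(nvcc="nvcc"):
    """Ask the installed nvcc which SM targets it supports (requires nvcc on PATH)."""
    result = subprocess.run(
        [nvcc, "--list-gpu-code"], capture_output=True, text=True, check=True
    )
    return parse_sm_list(result.stdout)
```

With a toolkit that has no Blackwell support, `gencode_flags(query_nvcc())` would simply never emit `compute_100`, which avoids the `Unsupported gpu architecture` failure without any version heuristic.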
@agirault The fix has been merged to R2.5 branch (https://github.com/NVIDIA/gdrcopy/tree/R2.5)