GPU auto-detect capability for kernel builds
Fixes to CI; should work in both environments.
This is a proposal in case there is interest for kernel builds.
Usage:
Auto-detect GPU capability:
make (e.g. if your GPU's compute capability is 80, then --generate-code=arch=compute_80,code=[compute_80,sm_80] is added to CFLAGS)
Do not specify capability:
make GPU_COMPUTE_CAPABILITY= (no architecture flag is added; CFLAGS = -O3 --use_fast_math)
Override capability:
make GPU_COMPUTE_CAPABILITY=86 (e.g. even if your GPU's compute capability is 80, --generate-code=arch=compute_86,code=[compute_86,sm_86] is added to CFLAGS)
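The three usage cases above can be sketched as a small shell helper. This is only an illustration of the intended flag mapping, not the Makefile logic from this PR: the function name `nvcc_flags` is hypothetical, and the base flags are taken from the "do not specify capability" case.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): build the compiler flags for a
# given compute capability. An empty argument mirrors
# `make GPU_COMPUTE_CAPABILITY=`, where no architecture flag is added.
nvcc_flags() {
    cap="$1"
    base="-O3 --use_fast_math"
    if [ -n "$cap" ]; then
        # Emit PTX (compute_XX) and SASS (sm_XX) for the requested capability.
        printf '%s --generate-code=arch=compute_%s,code=[compute_%s,sm_%s]\n' \
            "$base" "$cap" "$cap" "$cap"
    else
        printf '%s\n' "$base"
    fi
}

nvcc_flags 80   # auto-detected value on a capability-80 GPU
nvcc_flags ""   # capability left unspecified
nvcc_flags 86   # explicit override
```

The override case is the same code path as auto-detection; only the source of the capability value differs.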
Tested on Linux Ubuntu 22.04 only.
If it's for compiling for the local architecture, why not just use -arch=native?
Also, note the system could have more than one GPU and -arch=native will compile for all GPUs present:
https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#gpu-architecture-arch
"When -arch=native is specified, nvcc detects the visible GPUs on the system and generates codes for them, no PTX program will be generated for this option. It is a warning if there are no visible supported GPU on the system, and the default architecture will be used."
I think -arch=native is a relatively new option.
The nvidia-cuda-toolkit package in Ubuntu 22.04 ships CUDA 11.5, which doesn't support this option yet.
Any reason we don't do this in the main Makefile too?
~/llm.c/dev/cuda$ make gelu_backward
/usr/bin/nvcc -O3 --use_fast_math --generate-code=arch=compute_80 ,code=[compute_80 ,sm_80 ] -lcublas -lcublasLt gelu_backward.cu -o gelu_backward
nvcc fatal : Option '--generate-code arch=compute_80', missing code
make: *** [Makefile:27: gelu_backward] Error 1
huh
Okay, the extra space after the 80 is fixed. The command-line override is fixed too. Tested all 3 cases on Ubuntu.
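For reference, here is a minimal sketch of one way to guard against the stray-space failure shown in the nvcc error above. It assumes the raw value looks like "8.0 " (for example, the output of something like `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`, which is an assumption about how detection is done, not a description of the actual fix). The helper name `normalize_cap` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reduce a raw capability string such as "8.0 " to "80".
# Only the first line is used, so multi-GPU output picks the first GPU.
normalize_cap() {
    printf '%s\n' "$1" | head -n 1 | tr -d '[:space:].'
}

raw="8.0 "                      # sample value with a trailing space
cap=$(normalize_cap "$raw")

# With the whitespace removed, the flag stays a single shell word, so nvcc
# no longer sees "arch=compute_80" split away from ",code=[...]".
echo "--generate-code=arch=compute_${cap},code=[compute_${cap},sm_${cap}]"
```

Taking only the first line is a simplification; a system with mixed GPUs would need one --generate-code entry per distinct capability.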
One strange thing: the = after --generate-code appears to be optional. Leaving it in or removing it made no difference.
So both of the lines below appear to work, with and without the = after --generate-code. What's the right syntax?
NVCC_FLAGS = -O3 -t=0 --use_fast_math --generate-code=arch=compute_80,code=[compute_80,sm_80]
NVCC_FLAGS = -O3 -t=0 --use_fast_math --generate-code arch=compute_80,code=[compute_80,sm_80]
Main Makefile GPU auto-detect change is here: https://github.com/karpathy/llm.c/pull/371
This change is in the current cuda Makefile. Closing.