llm.c GPU auto-detect capability for kernel builds

Fixes to CI - should work in both environments.

This is a proposal in case there is interest for kernel builds.

Usage:

Auto detect GPU capability:

make (e.g. if your GPU's compute capability is 80, then --generate-code=arch=compute_80,code=[compute_80,sm_80] is added to CFLAGS)

Do not specify capability:

make GPU_COMPUTE_CAPABILITY= (CFLAGS = -O3 --use_fast_math)

Override capability:

make GPU_COMPUTE_CAPABILITY=86 (e.g. even if your GPU's compute capability is 80, --generate-code=arch=compute_86,code=[compute_86,sm_86] is added to CFLAGS)

Tested on Linux Ubuntu 22.04 only.
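
The three cases above can be sketched as a small shell function. This is only an illustration of the intended behaviour, not the Makefile logic from this PR: the name `nvcc_flags_for` and the `auto` sentinel are hypothetical, and the detected capability is passed in as an argument rather than queried from the GPU.

```shell
#!/bin/sh
# Hypothetical helper: print the nvcc flags for a requested compute capability.
#   $1 = requested capability: "auto", "" (none), or a number such as "86"
#   $2 = detected capability (e.g. "80"); only used when $1 is "auto"
nvcc_flags_for() {
    requested=$1
    detected=$2
    base="-O3 --use_fast_math"

    # "auto" means: use whatever was detected on this machine.
    if [ "$requested" = "auto" ]; then
        cap=$detected
    else
        cap=$requested
    fi

    if [ -z "$cap" ]; then
        # No capability given: emit the base flags only.
        printf '%s\n' "$base"
    else
        printf '%s --generate-code=arch=compute_%s,code=[compute_%s,sm_%s]\n' \
            "$base" "$cap" "$cap" "$cap"
    fi
}

nvcc_flags_for auto 80   # auto-detect case
nvcc_flags_for "" 80     # capability left empty
nvcc_flags_for 86 80     # explicit override wins over the detected value
```

Passing the detected value in explicitly keeps the sketch runnable on a machine without a GPU.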

May 03 '24 06:05 rosslwheeler

If it's for compiling for the local architecture, why not just use -arch=native?

Also, note the system could have more than one GPU and -arch=native will compile for all GPUs present:

https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#gpu-architecture-arch

When -arch=native is specified, nvcc detects the visible GPUs on the system and generates codes for them, no PTX program will be generated for this option. It is a warning if there are no visible supported GPU on the system, and the default architecture will be used.

May 03 '24 09:05 alecco

I think -arch=native is a relatively new option. The nvidia-cuda-toolkit package in Ubuntu 22.04 ships CUDA 11.5, which doesn't support this option yet.
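
A build script could guard against this by checking the toolkit version before choosing the flag. The sketch below assumes that `-arch=native` first appeared in CUDA 11.6, which is consistent with 11.5 lacking it but should be verified against the nvcc release notes. `supports_arch_native` is a hypothetical helper, and it takes the version string as an argument instead of parsing `nvcc --version`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given CUDA version is at least the
# minimum version assumed to support -arch=native.
#   $1 = CUDA version, e.g. "11.5" or "12.4.1"
#   $2, $3 = minimum major/minor (ASSUMED default: 11.6)
supports_arch_native() {
    version=$1
    min_major=${2:-11}
    min_minor=${3:-6}

    major=${version%%.*}   # text before the first dot
    rest=${version#*.}     # text after the first dot
    minor=${rest%%.*}      # drop any patch component

    if [ "$major" -gt "$min_major" ]; then
        return 0
    fi
    if [ "$major" -eq "$min_major" ] && [ "$minor" -ge "$min_minor" ]; then
        return 0
    fi
    return 1
}

if supports_arch_native "11.5"; then
    echo "use -arch=native"
else
    echo "use explicit --generate-code"
fi
```

With the assumed threshold, the Ubuntu 22.04 toolkit version (11.5) takes the fallback branch.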

May 03 '24 16:05 ngc92

Any reason we don't do this in the main Makefile too?

May 05 '24 21:05 karpathy

~/llm.c/dev/cuda$ make gelu_backward
/usr/bin/nvcc -O3 --use_fast_math --generate-code=arch=compute_80 ,code=[compute_80 ,sm_80 ] -lcublas -lcublasLt gelu_backward.cu -o gelu_backward
nvcc fatal   : Option '--generate-code arch=compute_80', missing code
make: *** [Makefile:27: gelu_backward] Error 1

huh

May 05 '24 21:05 karpathy

Okay, the extra space after the 80 is fixed. I also fixed the command-line override. Tested all 3 cases on Ubuntu.
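
For illustration, one generic way to keep stray whitespace out of the flag is to normalize the detected value before interpolating it. This is a sketch, not the fix made in this PR: `normalize_cap` is a hypothetical name, and the raw value is assumed to come from something like `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` (the thread does not show how the value is detected).

```shell
#!/bin/sh
# Hypothetical helper: turn a raw capability string such as "8.0", "80 " or
# a multi-line, multi-GPU listing into a bare number such as "80".
normalize_cap() {
    # Keep only the first line (first GPU), then delete every non-digit,
    # which removes the dot, spaces and the trailing newline.
    printf '%s\n' "$1" | head -n 1 | tr -cd '0-9'
}

cap=$(normalize_cap "80 ")   # trailing space, as in the failing build above
echo "--generate-code=arch=compute_${cap},code=[compute_${cap},sm_${cap}]"
```

Because the value is reduced to digits only, the resulting option stays a single shell word and nvcc no longer sees a dangling `,code=` argument.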

One strange thing - it appears that the = after --generate-code is superfluous. It didn't seem to make any difference whether I left it there or removed it.

So, these two below appear to run fine even though there's an extra = in there. What's the right syntax?

NVCC_FLAGS = -O3 -t=0 --use_fast_math --generate-code=arch=compute_80,code=[compute_80,sm_80]
NVCC_FLAGS = -O3 -t=0 --use_fast_math --generate-code arch=compute_80,code=[compute_80,sm_80]

May 06 '24 00:05 rosslwheeler

Main Makefile GPU auto-detect change is here: https://github.com/karpathy/llm.c/pull/371

May 06 '24 01:05 rosslwheeler

This change is in the current cuda Makefile. Closing.

Jul 07 '24 08:07 rosslwheeler