llama.cpp
For CUDA versions < 11.7 a target CUDA architecture must be explicitly provided via CUDA_DOCKER_ARCH
I tried to build llama.cpp with cublas and got an error.
I was using CUDA 11.4, then installed 12.4 and updated the PATH so nvcc points at the new version.
Windows 11.
Logs from w64devkit:
E:/llama.cpp $ make LLAMA_CUBLAS=1
I ccache not found. Consider installing it for faster compilation.
I llama.cpp build info:
I UNAME_S: Windows_NT
I UNAME_P: unknown
I UNAME_M: x86_64
I CFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -DNDEBUG -D_WIN32_WINNT=0x602 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/usr/local/cuda/targets/x86_64-linux/include -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -Wdouble-promotion
I CXXFLAGS: -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -I. -Icommon -D_XOPEN_SOURCE=600 -DNDEBUG -D_WIN32_WINNT=0x602 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/usr/local/cuda/targets/x86_64-linux/include
I NVCCFLAGS: -std=c++11 -O3 -use_fast_math --forward-unknown-to-host-compiler -arch=native -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128
I LDFLAGS: -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/usr/lib64 -L/usr/local/cuda/targets/x86_64-linux/lib -L/usr/lib/wsl/lib
I CC: cc (GCC) 13.2.0
I CXX: g++ (GCC) 13.2.0
I NVCC: Build cuda_12.4.r12.4/compiler.33961263_0
grep: unknown option -- P
BusyBox v1.37.0.git (2023-12-05 18:36:56 UTC)
Usage: grep [-HhnlLoqvsrRiwFE] [-m N] [-A|B|C N] { PATTERN | -e PATTERN... | -f FILE... } [FILE]...
Search for PATTERN in FILEs (or stdin)
-H Add 'filename:' prefix
-h Do not add 'filename:' prefix
-n Add 'line_no:' prefix
-l Show only names of files that match
-L Show only names of files that don't match
-c Show only count of matching lines
-o Show only the matching part of line
-q Quiet. Return 0 if PATTERN is found, 1 otherwise
-v Select non-matching lines
-s Suppress open and read errors
-r Recurse
-R Recurse and dereference symlinks
-i Ignore case
-w Match whole words only
-x Match whole lines only
-F PATTERN is a literal (not regexp)
-E PATTERN is an extended regexp
-m N Match up to N times per file
-A N Print N lines of trailing context
-B N Print N lines of leading context
-C N Same as '-A N -B N'
-e PTRN Pattern to match
-f FILE Read pattern from file
Makefile:613: *** I ERROR: For CUDA versions < 11.7 a target CUDA architecture must be explicitly provided via CUDA_DOCKER_ARCH. Stop.
also from w64devkit:
E:/llama.cpp $ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:28:36_Pacific_Standard_Time_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0
This solved the issue for me:
sudo apt install nvidia-cuda-toolkit
rm -rf build; cmake -S . -B build -DLLAMA_CUBLAS=ON && cmake --build build --config Release
This solved the issue for me:
sudo apt install nvidia-cuda-toolkit
rm -rf build; cmake -S . -B build -DLLAMA_CUBLAS=ON && cmake --build build --config Release
That didn't work for me because I'm using Windows. I would try this using WSL, but I want to use the already installed CUDA for Windows.
I'm having similar issues. Please update if you figure something out.
@mqopi I never used Windows so it's just a guess, but how about installing nvidia-cuda-toolkit for Windows? Just to be clear, it worked for me for a long time without nvidia-cuda-toolkit; I had to install it after one of the recent commits because it stopped working.
I managed to get it working by installing Visual Studio rather than just the build tools alone, then installing CUDA afterwards with VS integration. It should work after that. Also, I used CMake.
I believe the issue stems from this part of the error: "grep: unknown option -- P". I assume you are, like me, using w64devkit as the README suggests. It seems w64devkit's version of grep does not support the -P flag. If you look at line 645 of the Makefile, try changing the following from
CUDA_VERSION := $(shell $(NVCC) --version | grep -oP 'release (\K[0-9]+\.[0-9])')
To:
CUDA_VERSION := $(shell $(NVCC) --version | perl -nle 'print $& if /release ([0-9]+\.[0-9])/')
This uses perl instead, which was inspired by a Stack Overflow question. On the latest version of w64devkit on my system, I get the following output when I try to parse the version of nvcc:
$ nvcc.exe --version | perl -nle 'print $& if /release ([0-9]+\.[0-9])/'
release 12.3
This seems like a valid output, and I am able to get further along in compilation (I do, however, get a new and hopefully unrelated error: "Cannot find compiler 'cl.exe' in PATH").
I had the same problem. A contributor in this thread helped:
make CUDA_DOCKER_ARCH=sm_86 (for video cards from the 3060 to the 3090 Ti)
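The right sm_ value for other cards can be derived from the GPU's compute capability (e.g. 8.6 for the 3060 through 3090 Ti gives sm_86). A small sketch, assuming a driver recent enough that nvidia-smi supports the compute_cap query; on older drivers, look the value up in NVIDIA's CUDA GPU list instead:

```shell
# Turn a compute capability like "8.6" into the -arch value the Makefile wants.
# "8.6" is hard-coded here for illustration; normally you would query it with:
#   cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader)
cap="8.6"
arch="sm_$(printf '%s' "$cap" | tr -d '.')"
echo "$arch"   # -> sm_86
```

Then pass the result on the make command line, e.g. make LLAMA_CUDA=1 CUDA_DOCKER_ARCH=sm_86.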
@ntriche >>
CUDA_VERSION := $(shell $(NVCC) --version | perl -nle 'print $& if /release ([0-9]+\.[0-9])/')
Also using Windows, also using w64devkit as the README recommended.
I did not have perl installed, so this line using the grep that ships with w64devkit worked: grep -oe 'release [0-9]*\.[0-9]*'
@Maintainers, please switch the grep params to use -oe instead of -P, if that works. I don't have access to a *nix system right now so I can't test it there, but I suspect it'll work the same.
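For anyone who wants to sanity-check that pattern without rerunning the build, here is a small sketch that feeds it a captured nvcc --version line (the sample string is taken from the log earlier in this thread):

```shell
# Verify the BusyBox-friendly extraction: -o plus a basic regex, no -P needed.
nvcc_line='Cuda compilation tools, release 12.4, V12.4.99'
ver=$(printf '%s\n' "$nvcc_line" | grep -oe 'release [0-9]*\.[0-9]*' | sed 's/release //')
echo "$ver"   # -> 12.4
```

BusyBox grep supports -o and -e, so this should behave the same under w64devkit as under GNU grep.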
Posting the error message here for future searchers, as this issue was difficult for me to find in the first place.
Error output
~/Programs/llama.cpp-master $ make -j LLAMA_CUDA=1
I ccache not found. Consider installing it for faster compilation.
I llama.cpp build info:
I UNAME_S: Windows_NT
I UNAME_P: unknown
I UNAME_M: x86_64
I CFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -DNDEBUG -D_WIN32_WINNT=0x602 -DGGML_USE_CUDA -IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.4/include -IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.4/targets/x86_64-linux/include -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -Wdouble-promotion
I CXXFLAGS: -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -I. -Icommon -D_XOPEN_SOURCE=600 -DNDEBUG -D_WIN32_WINNT=0x602 -DGGML_USE_CUDA -IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.4/include -IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.4/targets/x86_64-linux/include
I NVCCFLAGS: -std=c++11 -O3 -use_fast_math --forward-unknown-to-host-compiler -arch=native -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128
I LDFLAGS: -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -LC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.4/lib64 -L/usr/lib64 -LC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.4/targets/x86_64-linux/lib -L/usr/lib/wsl/lib
I CC: cc (GCC) 13.2.0
I CXX: g++ (GCC) 13.2.0
I NVCC: Build cuda_12.4.r12.4/compiler.33961263_0
grep: unknown option -- P
BusyBox v1.37.0.git (2023-12-05 17:59:02 UTC)
Usage: grep [-HhnlLoqvsrRiwFE] [-m N] [-A|B|C N] { PATTERN | -e PATTERN... | -f FILE... } [FILE]...
Search for PATTERN in FILEs (or stdin)
-H Add 'filename:' prefix
-h Do not add 'filename:' prefix
-n Add 'line_no:' prefix
-l Show only names of files that match
-L Show only names of files that don't match
-c Show only count of matching lines
-o Show only the matching part of line
-q Quiet. Return 0 if PATTERN is found, 1 otherwise
-v Select non-matching lines
-s Suppress open and read errors
-r Recurse
-R Recurse and dereference symlinks
-i Ignore case
-w Match whole words only
-x Match whole lines only
-F PATTERN is a literal (not regexp)
-E PATTERN is an extended regexp
-m N Match up to N times per file
-A N Print N lines of trailing context
-B N Print N lines of leading context
-C N Same as '-A N -B N'
-e PTRN Pattern to match
-f FILE Read pattern from file
Makefile:649: *** I ERROR: For CUDA versions < 11.7 a target CUDA architecture must be explicitly provided via CUDA_DOCKER_ARCH. Stop.
I now get error:
nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified
make: *** [Makefile:482: ggml-cuda.o] Error 1
make: *** Waiting for unfinished jobs....
nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified
make: *** [Makefile:479: ggml-cuda/acc.o] Error 1
nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified
make: *** [Makefile:479: ggml-cuda/alibi.o] Error 1
nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified
make: *** [Makefile:479: ggml-cuda/arange.o] Error 1
but at least the first issue is gone :). #4119 recommends using CMake; I'll investigate that later.
Windows 11 / Nvidia 4060 Ti 16 GB - same issue -> getting the GPU to work.
NOTE: The following was done prior:
- Visual Studio Community (2022) must be installed with C++; otherwise, all kinds of pain.
- Then install/update the CUDA Toolkit.
- Wheel creation: used PowerShell instead of a cmd window.
- Manually set: $env:CMAKE_ARGS="-DLLAMA_CUBLAS=on" and $env:FORCE_CMAKE=1
- Ran: pip uninstall llama-cpp-python followed by pip install llama-cpp-python --force-reinstall --no-cache-dir --verbose
POSSIBLE ADJUSTMENT: in a cmd window, use set CMAKE_ARGS=-DLLAMA_CUBLAS=on and not set CMAKE_ARGS="-DLLAMA_CUBLAS=on"
Did the following (this is the standard CPU install) in w64devkit (see the Windows install using "make"):
On Windows:
Download the latest fortran version of [w64devkit](https://github.com/skeeto/w64devkit/releases).
Extract w64devkit on your pc.
Run w64devkit.exe.
Use the cd command to reach the llama.cpp folder.
From here you can run:
make
Regardless of the steps above, I also ran in w64devkit: make LLAMA_CUDA=1
CUDA still would not work / the exe files would not "compile" with CUDA, so to speak.
Finally this worked:
STEP 1 - Getting GPU to work ( using w64devkit ):
mkdir build
cd build
cmake .. -DLLAMA_CUDA=ON
cmake --build . --config Release
Step 2: Moved the EXE files from build/bin/release to the main llama.cpp directory. These will overwrite the "old" CPU-only EXE files, and then the GPU should be used/available.
Confirmed this via imatrix calculations (7B model, 113 chunks) -> CPU 1 hr 15 min vs GPU 1.27 minutes.
Side note:
-> GPU with no offloading of layers (removed the "-ngl" flag) is now 10 minutes vs the "old CPU" build at 1 hr 15 minutes. (I kept a copy; GPU confirmed -> being used.)
-> GPU with -ngl 99: 1.27 minutes
Maybe back up the "old" exe files / or rename for other usage(s)?
Could a moderator / contributor please confirm this step? That it is OK and will not cause other issues?
ALTERNATIVE: Download "llama-b2694-bin-win-cuda-cu12.2.0-x64.zip" (or 11.7, etc.) from https://github.com/ggerganov/llama.cpp/releases/ and extract. These files can be used from a separate folder (i.e. I set up an "_exe" folder in the main llama.cpp folder).
The files could be pasted to overwrite the "old CPU" exe files, but this may have unintended issues. NOTE: A quick check showed the imatrix / GPU calc was SLOWER using these files: same imatrix run, 1.82 minutes vs 1.27 minutes. Could a moderator / contributor please confirm this step?
I had the same problem. A contributor in this thread helped:
make CUDA_DOCKER_ARCH=sm_86 (for video cards from the 3060 to the 3090 Ti)
This worked for me as well for RTX 3060:
make LLAMA_CUDA=1 CUDA_DOCKER_ARCH=sm_86
I went with changing the Makefile L653 to
CUDA_VERSION := $(shell $(NVCC) --version | perl -nle 'print $& if /release ([0-9]+\.[0-9])/')
Have VS with C++ installed, Nvidia CUDA tools installed, Win11, RTX 3080 Ti. Installed the "Strawberry" open-source version of Perl.
I got further, but now am getting this error:
C:/dev/llama.cpp $ make LLAMA_CUDA=1 CUDA_DOCKER_ARCH=sm_86
I ccache found, compilation results will be cached. Disable with LLAMA_NO_CCACHE.
I llama.cpp build info:
I UNAME_S: Windows_NT
I UNAME_P: unknown
I UNAME_M: x86_64
I CFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -DNDEBUG -D_WIN32_WINNT=0x602 -DGGML_USE_LLAMAFILE -DGGML_USE_CUDA -IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.4/include -IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.4/targets/x86_64-linux/include -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -march=native -mtune=native -Xassembler -muse-unaligned-vector-move -Wdouble-promotion
I CXXFLAGS: -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -I. -Icommon -D_XOPEN_SOURCE=600 -DNDEBUG -D_WIN32_WINNT=0x602 -DGGML_USE_LLAMAFILE -DGGML_USE_CUDA -IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.4/include -IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.4/targets/x86_64-linux/include
I NVCCFLAGS: -std=c++11 -O3 -use_fast_math --forward-unknown-to-host-compiler -Wno-deprecated-gpu-targets -arch=sm_86 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128
I LDFLAGS: -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -LC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.4/lib64 -L/usr/lib64 -LC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.4/targets/x86_64-linux/lib -L/usr/lib/wsl/lib
I CC: cc (GCC) 13.2.0
I CXX: x86_64-w64-mingw32-g++ (GCC) 13.2.0
I NVCC: Build cuda_12.4.r12.4/compiler.34097967_0
C:/Strawberry/c/bin/ccache.exe nvcc -std=c++11 -O3 -use_fast_math --forward-unknown-to-host-compiler -Wno-deprecated-gpu-targets -arch=sm_86 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -I. -Icommon -D_XOPEN_SOURCE=600 -DNDEBUG -D_WIN32_WINNT=0x602 -DGGML_USE_LLAMAFILE -DGGML_USE_CUDA -IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.4/include -IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.4/targets/x86_64-linux/include -Xcompiler "-std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -Xassembler -muse-unaligned-vector-move -Wno-array-bounds -Wno-pedantic" -c ggml-cuda.cu -o ggml-cuda.o
nvcc fatal : A single input file is required for a non-link phase when an outputfile is specified
make: *** [Makefile:487: ggml-cuda.o] Error 1
Some fatal nvcc error about a single output file. ~~Any suggestions?~~
EDIT:
Worked with CMake instead of Make :)
This issue was closed because it has been inactive for 14 days since being marked as stale.
rm -rf build; cmake -S . -B build -DLLAMA_CUBLAS=ON && cmake --build build --config Release
I had to use
rm -rf build; cmake -S . -B build -DGGML_CUDA=ON && cmake --build build --config Release
but it worked wonders. Thank you!