GPTQ-for-LLaMa
Nvcc fatal : Unsupported gpu architecture 'compute_86'
I get the following error when trying to run setup.py from the GPTQ install. I have an RTX 3090 and followed the instructions from this GitHub gist:
```
FAILED: D:/AI/text-generation-webui/repositories/GPTQ-for-LLaMa/build/temp.win-amd64-cpython-310/Release/quant_cuda_kernel.obj
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin\nvcc --generate-dependencies-with-compile --dependency-output D:\AI\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\quant_cuda_kernel.obj.d --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\cruge\miniconda3\envs\textgen\lib\site-packages\torch\include -IC:\Users\cruge\miniconda3\envs\textgen\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\cruge\miniconda3\envs\textgen\lib\site-packages\torch\include\TH -IC:\Users\cruge\miniconda3\envs\textgen\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include" -IC:\Users\cruge\miniconda3\envs\textgen\include -IC:\Users\cruge\miniconda3\envs\textgen\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\cppwinrt" -c D:\AI\text-generation-webui\repositories\GPTQ-for-LLaMa\quant_cuda_kernel.cu -o D:\AI\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\quant_cuda_kernel.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86
nvcc fatal : Unsupported gpu architecture 'compute_86'
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "C:\Users\cruge\miniconda3\envs\textgen\lib\site-packages\torch\utils\cpp_extension.py", line 1808, in _run_ninja_build
    subprocess.run(
  File "C:\Users\cruge\miniconda3\envs\textgen\lib\subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
```
Check the CUDA version with `nvcc -V`.
Ran that and got `unknown option --V`, but running `nvcc --version` gave me:
```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:59:34_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
```
The failing command above calls nvcc from `CUDA\v11.0\bin`, so the build is currently using CUDA 11.0 even though the nvcc on your PATH reports 11.7. Install CUDA 11.6.
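For reference, `sm_86` (the RTX 3090's Ampere architecture) is only supported starting with CUDA 11.1, which is why the 11.0 toolchain rejects `compute_86`. A minimal sketch for checking which toolkit the build will pick up versus the one PyTorch was built against:

```python
# Compare the nvcc on PATH with the CUDA version PyTorch was built against,
# and print the compute capability the extension build will target.
import shutil
import subprocess
import torch

print("nvcc on PATH:", shutil.which("nvcc"))
print("PyTorch built with CUDA:", torch.version.cuda)
print("GPU compute capability:", torch.cuda.get_device_capability(0))  # (8, 6) on an RTX 3090
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)
```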
I now get the following error after installing CUDA 11.6:
```
FAILED: D:/ai/text-generation-webui/repositories/GPTQ-for-LLaMa/build/temp.win-amd64-cpython-310/Release/quant_cuda_kernel.obj
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin\nvcc --generate-dependencies-with-compile --dependency-output D:\ai\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\quant_cuda_kernel.obj.d --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\Users\cruge\miniconda3\envs\textgen\lib\site-packages\torch\include -IC:\Users\cruge\miniconda3\envs\textgen\lib\site-packages\torch\include\torch\csrc\api\include -IC:\Users\cruge\miniconda3\envs\textgen\lib\site-packages\torch\include\TH -IC:\Users\cruge\miniconda3\envs\textgen\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" -IC:\Users\cruge\miniconda3\envs\textgen\include -IC:\Users\cruge\miniconda3\envs\textgen\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\include" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\cppwinrt" -c D:\ai\text-generation-webui\repositories\GPTQ-for-LLaMa\quant_cuda_kernel.cu -o D:\ai\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\quant_cuda_kernel.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86
C:/Users/cruge/miniconda3/envs/textgen/lib/site-packages/torch/include\c10/macros/Macros.h(143): warning C4067: unexpected tokens following preprocessor directive - expected a newline
C:/Users/cruge/miniconda3/envs/textgen/lib/site-packages/torch/include\c10/macros/Macros.h(143): warning C4067: unexpected tokens following preprocessor directive - expected a newline
C:/Users/cruge/miniconda3/envs/textgen/lib/site-packages/torch/include\c10/core/SymInt.h(84): warning #68-D: integer conversion resulted in a change of sign
C:\Users\cruge\miniconda3\envs\textgen\lib\site-packages\torch\include\pybind11\cast.h(1429): error: too few arguments for template template parameter "Tuple"
          detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(1507): here
C:\Users\cruge\miniconda3\envs\textgen\lib\site-packages\torch\include\pybind11\cast.h(1503): error: too few arguments for template template parameter "Tuple"
          detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(1507): here
```
Okay, I was able to load the model by following the instructions here:
> Finally I managed to get it running. (I still can't compile it; thank you @Brawlence for providing the Windows wheel.) Here is the guide:
1. Install the latest version of text-generation-webui
2. Create the directory `text-generation-webui\repositories` and clone GPTQ-for-LLaMa there
3. Stay in the same conda env and install [this wheel](https://github.com/oobabooga/text-generation-webui/files/10947842/quant_cuda-0.0.0-cp310-cp310-win_amd64.whl.zip) with the CUDA module: `pip install quant_cuda-0.0.0-cp310-cp310-win_amd64.whl` (a quick import check is sketched after this guide)
4. Copy the 4bit model to the `models` folder and make sure its name follows this format (example: `llama-30b-4bit.pt`). You still must have the directory with the 8bit model in HFv2 format.
5. Start the webui: `python .\server.py --model llama-30b --load-in-4bit --no-stream --listen`

Tested on Windows 11 with a 30B model and an RTX 4090.
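If the wheel installed correctly, the prebuilt extension should import without compiling anything locally. A minimal sanity check, assuming the module name `quant_cuda` from the wheel filename (it matches `-DTORCH_EXTENSION_NAME=quant_cuda` in the build logs above):

```python
# Verify the prebuilt quant_cuda wheel and the GPU are both usable,
# without building anything from source.
import torch
import quant_cuda  # provided by quant_cuda-0.0.0-cp310-cp310-win_amd64.whl

print("torch", torch.__version__, "built with CUDA", torch.version.cuda)
print("GPU available:", torch.cuda.is_available())
print("quant_cuda loaded from:", quant_cuda.__file__)
```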
If you have CUDA errors, do the following (a sketch of the patched code follows this list):

- Download this and this DLLs
- Copy them to `%USERPROFILE%\miniconda3\envs\textgen\lib\site-packages\bitsandbytes`
- Edit `%USERPROFILE%\miniconda3\envs\textgen\lib\site-packages\bitsandbytes\cuda_setup\main.py`
- Change `ct.cdll.LoadLibrary(binary_path)` to `ct.cdll.LoadLibrary(str(binary_path))` (two times)
- Replace `if not torch.cuda.is_available(): return 'libsbitsandbytes_cpu.so', None, None, None, None` with `if torch.cuda.is_available(): return 'libbitsandbytes_cuda116.dll', None, None, None, None`
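Put together, the two code edits amount to something like this. It is only a sketch: `evaluate_cuda_setup` is the function in that era's bitsandbytes `cuda_setup\main.py` (it may differ in other versions), and the `load_binary` helper is just an illustrative stand-in for the two call sites being patched:

```python
import ctypes as ct
import torch

def evaluate_cuda_setup():
    # Patched branch: when a GPU is present, return the bundled Windows
    # CUDA 11.6 DLL instead of falling back to the CPU-only library.
    if torch.cuda.is_available():
        return 'libbitsandbytes_cuda116.dll', None, None, None, None
    return 'libbitsandbytes_cpu.so', None, None, None, None

def load_binary(binary_path):
    # Patched call (made in two places in the real file): LoadLibrary
    # needs a str, not a pathlib.Path.
    return ct.cdll.LoadLibrary(str(binary_path))
```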
Originally posted by @Zerogoki00 in https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/11#issuecomment-1464961225
Can't compile, but I can run, so I'm fine with the outcome. Will keep trying to compile if people have solutions, though.
Since we have changed to use Triton now, we no longer have this issue.
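For anyone landing here later: with the Triton kernels there is no custom `quant_cuda` extension to compile, so a working Triton install is all the quantized path needs. A minimal check, assuming a platform where Triton is available (at the time, Linux only):

```python
# Confirm Triton is importable; the Triton backend removes the need to
# build the quant_cuda CUDA extension.
import triton
print(triton.__version__)
```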