GPTQ-for-LLaMa
GPTQ-for-LLaMa copied to clipboard
Issues with cuda setup
I attempt to install using python setup_cuda.py install and get the error trace below. Checking my nvidia cuda install, I get the following trace: nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2022 NVIDIA Corporation Built on Tue_May__3_19:00:59_Pacific_Daylight_Time_2022 Cuda compilation tools, release 11.7, V11.7.64 Build cuda_11.7.r11.7/compiler.31294372_0
(Full error trace) (llama4bit) E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa>python setup_cuda.py install running install C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools. warnings.warn( C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\setuptools\command\easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools. warnings.warn( running bdist_egg running egg_info writing quant_cuda.egg-info\PKG-INFO writing dependency_links to quant_cuda.egg-info\dependency_links.txt writing top-level names to quant_cuda.egg-info\top_level.txt reading manifest file 'quant_cuda.egg-info\SOURCES.txt' writing manifest file 'quant_cuda.egg-info\SOURCES.txt' installing library code to build\bdist.win-amd64\egg running install_lib running build_ext C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\utils\cpp_extension.py:388: UserWarning: The detected CUDA version (11.4) has a minor version mismatch with the version that was used to compile PyTorch (11.7). Most likely this shouldn't be a problem. warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda)) building 'quant_cuda' extension Emitting ninja build file E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\build.ninja... Compiling objects... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [1/1] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\bin\nvcc --generate-dependencies-with-compile --dependency-output E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\quant_cuda_kernel.obj.d --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\torch\csrc\api\include -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\TH -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\include" -IC:\ProgramData\miniconda3\envs\llama4bit\include -IC:\ProgramData\miniconda3\envs\llama4bit\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" -c E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\quant_cuda_kernel.cu -o E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\quant_cuda_kernel.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 FAILED: E:/llmRunner/text-generation-webui/repositories/GPTQ-for-LLaMa/build/temp.win-amd64-cpython-310/Release/quant_cuda_kernel.obj C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\bin\nvcc --generate-dependencies-with-compile --dependency-output E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\quant_cuda_kernel.obj.d --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\torch\csrc\api\include -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\TH -IC:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.4\include" -IC:\ProgramData\miniconda3\envs\llama4bit\include -IC:\ProgramData\miniconda3\envs\llama4bit\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt" -c E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\quant_cuda_kernel.cu -o E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\build\temp.win-amd64-cpython-310\Release\quant_cuda_kernel.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 C:/ProgramData/miniconda3/envs/llama4bit/lib/site-packages/torch/include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline C:/ProgramData/miniconda3/envs/llama4bit/lib/site-packages/torch/include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\pybind11\cast.h(624): error: too few arguments for template template parameter "Tuple" detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]" (721): here
C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\include\pybind11\cast.h(717): error: too few arguments for template template parameter "Tuple" detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]" (721): here
C:/ProgramData/miniconda3/envs/llama4bit/lib/site-packages/torch/include\c10/util/irange.h(54): warning: pointless comparison of unsigned integer with zero
detected during:
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided,
C:/ProgramData/miniconda3/envs/llama4bit/lib/site-packages/torch/include\c10/util/irange.h(54): warning: pointless comparison of unsigned integer with zero
detected during:
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided,
2 errors detected in the compilation of "e:/llmrunner/text-generation-webui/repositories/gptq-for-llama/quant_cuda_kernel.cu". quant_cuda_kernel.cu ninja: build stopped: subcommand failed. Traceback (most recent call last): File "C:\ProgramData\miniconda3\envs\llama4bit\lib\site-packages\torch\utils\cpp_extension.py", line 1893, in _run_ninja_build subprocess.run( File "C:\ProgramData\miniconda3\envs\llama4bit\lib\subprocess.py", line 526, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "E:\llmRunner\text-generation-webui\repositories\GPTQ-for-LLaMa\setup_cuda.py", line 4, in