text-generation-webui
No module named 'llama_inference_offload' on Arch Linux
Describe the bug
Running server.py as follows: python server.py --wbits 4 --groupsize 128 fails with the error No module named 'llama_inference_offload'. I tried this fix: https://github.com/oobabooga/text-generation-webui/issues/400#issuecomment-1474876859 but it did not help.
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
Run the following command: python server.py --wbits 4 --groupsize 128
Screenshot
No response
Logs
$ python server.py --wbits 4 --groupsize 128
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /home/x/miniconda3/envs/textgen/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /home/x/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
Loading vicuna-13b-GPTQ-4bit-128g...
Traceback (most recent call last):
File "/home/x/text-generation-webui/server.py", line 308, in <module>
shared.model, shared.tokenizer = load_model(shared.model_name)
File "/home/x/text-generation-webui/modules/models.py", line 100, in load_model
from modules.GPTQ_loader import load_quantized
File "/home/x/text-generation-webui/modules/GPTQ_loader.py", line 14, in <module>
import llama_inference_offload
ModuleNotFoundError: No module named 'llama_inference_offload'
System Info
x@archlinux
OS: Arch Linux x86_64
Host: X670 GAMING X AX -CF
Kernel: 6.2.9-zen1-1-zen
Uptime: 1 hour, 51 mins
Packages: 1623 (pacman), 20 (flatpak), 7 (snap)
Shell: zsh 5.9
Resolution: 2560x1440
DE: Plasma 5.27.4
WM: kwin
WM Theme: Endless
Theme: [Plasma], Breeze [GTK3]
Icons: [Plasma], Relax-Dark-Icons [GTK2/3]
Terminal: terminator
CPU: AMD Ryzen 9 7900X (24) @ 4.700GHz
GPU: AMD ATI 16:00.0 Raphael
GPU: NVIDIA GeForce RTX 2080 Ti Rev. A
Memory: 12939MiB / 31231MiB
Follow the steps here: https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#installation
Specifically, the llama_inference_offload module is only available in the triton branch of GPTQ-for-LLaMa.
Also, I had better luck with vicuna-13b-4bit-128g on the cuda branch. You will probably need to specify --model_type llama as well. There's a lot of trial and error in the comments here.
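For example, a launch along these lines (assuming the model folder under models/ is named as in the log above; adjust to your own directory name):
python server.py --model vicuna-13b-GPTQ-4bit-128g --model_type llama --wbits 4 --groupsize 128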
Follow the steps here: https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#installation
This failed with the following error :( RuntimeError: The current installed version of g++ (12.2.1) is greater than the maximum required version by CUDA 11.7. Please make sure to use an adequate version of g++ (>=6.0.0, <12.0).
Ah, I'm guessing Arch has a newer g++ available than Ubuntu does, for example. I would install g++ manually, staying on 11.x, then try again.
https://github.com/oobabooga/text-generation-webui/issues/850 looks relevant to that
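One way to do that on Arch without downgrading the system compiler is to install a versioned gcc package and point the build at it. This is only a sketch; the gcc11 package and g++-11 binary names, and NVCC_PREPEND_FLAGS support in your CUDA version, are assumptions to verify on your system:
sudo pacman -S gcc11
export CC=/usr/bin/gcc-11 CXX=/usr/bin/g++-11          # compilers the torch build check will look at
export NVCC_PREPEND_FLAGS='-ccbin /usr/bin/g++-11'     # host compiler for nvcc itself
python setup_cuda.py install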
Follow the steps here: https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#installation
This failed with the following error :( RuntimeError: The current installed version of g++ (12.2.1) is greater than the maximum required version by CUDA 11.7. Please make sure to use an adequate version of g++ (>=6.0.0, <12.0).
I had the same issue on Fedora 37. To fix it I did the following:
conda install -c conda-forge gxx
If that doesn't work, try:
conda install gcc_linux-64==11.2.0
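After installing, reactivate the environment and check which compiler the build will actually pick up. A rough check, assuming the conda compiler packages set $CXX in their activation scripts (which they normally do) and that the env is named textgen as in the logs above:
conda deactivate && conda activate textgen   # rerun activation so the compiler package exports take effect
echo $CXX && $CXX --version                  # should now point at the environment's g++ 11.x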
Same error on Windows 11.
Got this working on Arch. Here are the steps:
- git clone https://github.com/oobabooga/text-generation-webui.git
- sudo pacman -S rocm-hip-sdk python-tqdm
- cd text-generation-webui
- python -m venv --system-site-packages venv
- export PATH=/opt/rocm/bin:$PATH
- export HSA_OVERRIDE_GFX_VERSION=10.3.0 HCC_AMDGPU_TARGET=gfx1030
- source venv/bin/activate
- pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
- mkdir repositories && cd repositories
- git clone https://github.com/agrocylo/bitsandbytes-rocm
- cd bitsandbytes-rocm
- make hip
- python setup.py install
- cd ..
- git clone https://github.com/WapaMario63/GPTQ-for-LLaMa-ROCm GPTQ-for-LLaMa
- cd GPTQ-for-LLaMa
- python setup_rocm.py install
- cd ../..
- python download-model.py anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g
- rm models/anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g/gpt-x-alpaca-13b-native-4bit-128g.pt (needed or it will just spam random numbers)
- pip install -r requirements.txt
- python server.py --wbits 4 --groupsize 128
That should do it. I just did this with a fresh install, so it should not be missing any steps. If you are using an Nvidia card the steps are mostly the same, but you will install the CUDA toolkit instead of rocm-hip-sdk and use the standard GPTQ-for-LLaMa and bitsandbytes repos; instructions for those are in this repo's wiki.
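For reference, the Nvidia-equivalent GPTQ step would look roughly like this, following the wiki linked earlier in the thread (it uses the cuda branch of oobabooga's GPTQ-for-LLaMa fork; verify against the wiki before running):
mkdir -p repositories && cd repositories
git clone https://github.com/oobabooga/GPTQ-for-LLaMa -b cuda
cd GPTQ-for-LLaMa
python setup_cuda.py install
cd ../..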
Running python3 setup_cuda.py install failed with the error: command '/usr/bin/nvcc' failed with exit code 1
EDIT: This seems to be a problem with mismatched CUDA and nvcc versions (a quick check for that is sketched after the log below). Fixed by reinstalling Linux and installing the CUDA toolkit with nvcc using this script: https://gist.github.com/X-TRON404/e9cab789041ef03bcba13da1d5176e28
(You probably don't need to reinstall Linux; I just did it out of frustration and found the script afterwards. Running that script should work, as it will delete all previously installed drivers for you.)
Full output:
running install
/home/ass/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
/home/ass/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running bdist_egg
running egg_info
writing quant_cuda.egg-info/PKG-INFO
writing dependency_links to quant_cuda.egg-info/dependency_links.txt
writing top-level names to quant_cuda.egg-info/top_level.txt
/home/ass/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'quant_cuda.egg-info/SOURCES.txt'
writing manifest file 'quant_cuda.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
/home/ass/.local/lib/python3.10/site-packages/torch/utils/cpp_extension.py:388: UserWarning: The detected CUDA version (11.5) has a minor version mismatch with the version that was used to compile PyTorch (11.7). Most likely this shouldn't be a problem.
warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
building 'quant_cuda' extension
gcc -pthread -B /home/ass/miniconda3/envs/textgen/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/ass/miniconda3/envs/textgen/include -fPIC -O2 -isystem /home/ass/miniconda3/envs/textgen/include -fPIC -I/home/ass/.local/lib/python3.10/site-packages/torch/include -I/home/ass/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/ass/.local/lib/python3.10/site-packages/torch/include/TH -I/home/ass/.local/lib/python3.10/site-packages/torch/include/THC -I/home/ass/miniconda3/envs/textgen/include/python3.10 -c quant_cuda.cpp -o build/temp.linux-x86_64-cpython-310/quant_cuda.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
/usr/bin/nvcc -I/home/ass/.local/lib/python3.10/site-packages/torch/include -I/home/ass/.local/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/ass/.local/lib/python3.10/site-packages/torch/include/TH -I/home/ass/.local/lib/python3.10/site-packages/torch/include/THC -I/home/ass/miniconda3/envs/textgen/include/python3.10 -c quant_cuda_kernel.cu -o build/temp.linux-x86_64-cpython-310/quant_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=quant_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++17
/home/ass/.local/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
detected during:
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
(61): here
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=size_t, one_sided=false, <unnamed>=0]"
/home/ass/.local/lib/python3.10/site-packages/torch/include/c10/core/TensorImpl.h(77): here
/home/ass/.local/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
detected during:
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator==(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]"
(61): here
instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <unnamed>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <unnamed>> &) const [with I=std::size_t, one_sided=true, <unnamed>=0]"
/home/ass/.local/lib/python3.10/site-packages/torch/include/ATen/core/qualified_name.h(73): here
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
435 | function(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:435:145: note: ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
530 | operator=(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:530:146: note: ‘_ArgTypes’
error: command '/usr/bin/nvcc' failed with exit code 1
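A quick way to check for the CUDA/nvcc mismatch mentioned in the edit above (a sketch; compare the two reported versions, as the warning in the log does):
nvcc --version                                        # CUDA version of the system toolkit
python -c "import torch; print(torch.version.cuda)"   # CUDA version PyTorch was built against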
When facing the original problem, I somehow missed that the GPTQ-for-LLaMa directory needs to be inside the repositories dir, and had GPTQ-for-LLaMa placed in the root of text-generation-webui, which caused the problem. Make sure the hierarchy of directories goes like this: text-generation-webui/repositories/GPTQ-for-LLaMa, and not like this: text-generation-webui/GPTQ-for-LLaMa.
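A quick sanity check for the layout, run from the text-generation-webui root (a sketch; llama_inference_offload.py is the file the failing import is looking for):
ls repositories/GPTQ-for-LLaMa/llama_inference_offload.py
If that path doesn't exist, GPTQ_loader.py will fail with exactly this ModuleNotFoundError.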
Relevant source line is here.
Hope this helps!
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.