text-generation-webui
RuntimeError when loading gpt4-x-alpaca or vicuna | 13b
Describe the bug
Getting the following runtime error when trying to use either of the models listed below.
If I run the server with python server.py --auto-devices --chat and choose the decapoda-research_llama-7b-hf model, it works just fine.
I used the Windows installer to install everything (and have tried reinstalling).
It only seems to affect the 4-bit models I have downloaded so far. Is it a GPU compatibility issue? Not enough VRAM?
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
Run the server with: python server.py --auto-devices --chat --wbits 4 --groupsize 128
choose either: gpt4-x-alpaca-13b-native-4bit-128g (cuda | https://huggingface.co/anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g) or vicuna-13b-GPTQ-4bit-128g (https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g)
Screenshot
No response
Logs
Starting the web UI...
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: G:\oobabooga-windows\installer_files\env\bin\cudart64_110.dll
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 117
G:\oobabooga-windows\installer_files\env\lib\site-packages\bitsandbytes\cuda_setup\main.py:141: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
CUDA SETUP: Loading binary G:\oobabooga-windows\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117_nocublaslt.dll...
The following models are available:
1. decapoda-research_llama-7b-hf
2. gpt4-x-alpaca-13b-native-4bit-128g
3. vicuna-13b-GPTQ-4bit-128g
Which one do you want to load? 1-3
2
Loading gpt4-x-alpaca-13b-native-4bit-128g...
Loading model ...
Done.
Traceback (most recent call last):
File "G:\oobabooga-windows\text-generation-webui\server.py", line 302, in <module>
shared.model, shared.tokenizer = load_model(shared.model_name)
File "G:\oobabooga-windows\text-generation-webui\modules\models.py", line 176, in load_model
tokenizer = LlamaTokenizer.from_pretrained(Path(f"{shared.args.model_dir}/{shared.model_name}/"), clean_up_tokenization_spaces=True)
File "G:\oobabooga-windows\installer_files\env\lib\site-packages\transformers\tokenization_utils_base.py", line 1811, in from_pretrained
return cls._from_pretrained(
File "G:\oobabooga-windows\installer_files\env\lib\site-packages\transformers\tokenization_utils_base.py", line 1965, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "G:\oobabooga-windows\installer_files\env\lib\site-packages\transformers\models\llama\tokenization_llama.py", line 96, in __init__
self.sp_model.Load(vocab_file)
File "G:\oobabooga-windows\installer_files\env\lib\site-packages\sentencepiece\__init__.py", line 905, in Load
return self.LoadFromFile(model_file)
File "G:\oobabooga-windows\installer_files\env\lib\site-packages\sentencepiece\__init__.py", line 310, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: D:\a\sentencepiece\sentencepiece\src\sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
Press any key to continue . . .
System Info
GPU: GTX 1080 (8 GB)
CPU: i7 6850k
RAM: 48GB
OS: Windows 10
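If anyone wants to isolate this outside the webui, here is a minimal sketch (the model path is just an example; point it at your own folder under models/) that reproduces only the tokenizer load from the traceback above. If it raises the same sentencepiece RuntimeError, the problem is the tokenizer files rather than the GPTQ weights:

```python
# Standalone reproduction of the tokenizer load that fails in the traceback.
# The model directory below is an example; adjust it to your local setup.
from pathlib import Path
from transformers import LlamaTokenizer

model_dir = Path("models/gpt4-x-alpaca-13b-native-4bit-128g")
tokenizer = LlamaTokenizer.from_pretrained(model_dir, clean_up_tokenization_spaces=True)
print(tokenizer.tokenize("hello world"))
```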
I get the exact same error
I get the exact same error
what are your system specs?
I'm using an AMD GPU (RX 5500 XT) using Rocm and Triton on Linux ~~, but I might have a different problem? I can load the default models, but trying to generate gives me a segmentation fault error :thinking:~~ ignore that driver issue, fixed it
> I'm using an AMD GPU (RX 5500 XT) using Rocm and Triton on Linux, but I might have a different problem? I can load the default models, but trying to generate gives me a segmentation fault error 🤔

I just tried a non-quantized gpt4-x-alpaca model and it worked fine: https://huggingface.co/chavinlo/gpt4-x-alpaca
I am fairly sure the issue has to do with the quantization not being supported, since mine prints the warning "Only slow 8-bit matmul is supported for your GPU!", though I'm not sure whether that's related.
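In case it's the compute-capability angle, here is a quick check (assuming a CUDA build of PyTorch is installed) of the value that the bitsandbytes warning is based on. As far as I understand, that warning only concerns 8-bit matmul speed, so it may be unrelated to the 4-bit load failure:

```python
# Print the GPU compute capability that the bitsandbytes warning refers to.
# Assumes a CUDA build of PyTorch is installed.
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"Compute capability: {major}.{minor}")
    # bitsandbytes only enables its fast int8 matmul on capability >= 7.5;
    # a GTX 1080 reports 6.1 and falls back to the slow path.
    print("Fast int8 matmul available:", (major, minor) >= (7, 5))
else:
    print("No CUDA device detected")
```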
I got the same error and I've fixed it. Or rather, I have successfully launched the webui and I can chat, but frankly I still don't know what exactly went wrong. To clear things up: this oobabooga webui was designed to run on Linux, not Windows. There is a workaround build that runs natively on Windows, though, and all of our problems come from running it on Windows.
Why did we see this error? This is only a guess, and I could be wrong: we shouldn't follow the instructions in the readme.md or any instructions in the GPTQ folder, such as pip install -r requirements.txt. Those are for Linux users, or for people who installed WSL on Windows. WSL is a Windows subsystem for running Linux commands; it requires Hyper-V, which messes with my VMware installation, so I can't install WSL...
Here are the steps I took (written down as a memo for myself):

1. Install conda, create an env, and activate it.
2. Clone the Windows one-click installer repo: git clone https://github.com/oobabooga/one-click-installers
3. Go there and run install.bat in a command prompt. It takes a while (5-10 minutes) and installs a Windows port of GPTQ, which is officially not supported on Windows; bitsandbytes is not officially supported on Windows either, but a patched build gets installed too.
4. After installation you should see a text-generation-webui folder. Go there and use the provided downloader to fetch the configs for the Vicuna 4-bit model:

   cd text-generation-webui
   python download-model.py --text-only anon8231489123/vicuna-13b-GPTQ-4bit-128g

   The --text-only flag tells the downloader to fetch only the small text configs, not the weights. Download the model file itself from https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g/resolve/main/vicuna-13b-4bit-128g.safetensors, move it into the oobabooga-windows\text-generation-webui\models\anon8231489123_vicuna-13b-GPTQ-4bit-128g folder, and you're done (a scripted alternative is sketched after this list).
5. Run the model:

   python server.py --model anon8231489123_vicuna-13b-GPTQ-4bit-128g --model_type llama --chat --wbits 4 --groupsize 128
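If you would rather script the weight download than grab the .safetensors file in a browser and move it by hand (step 4 above), something like this should also work. It assumes the huggingface_hub package is available (the webui pulls it in as a dependency) and that your models folder matches the layout above:

```python
# Sketch: download the quantized weights via huggingface_hub and copy them
# into the webui models folder, instead of a manual browser download.
import shutil
from pathlib import Path
from huggingface_hub import hf_hub_download

target_dir = Path("models/anon8231489123_vicuna-13b-GPTQ-4bit-128g")
target_dir.mkdir(parents=True, exist_ok=True)

# Downloads into the Hugging Face cache and returns the local path.
cached = hf_hub_download(
    repo_id="anon8231489123/vicuna-13b-GPTQ-4bit-128g",
    filename="vicuna-13b-4bit-128g.safetensors",
)
shutil.copy(cached, target_dir / "vicuna-13b-4bit-128g.safetensors")
print("Model file placed in", target_dir)
```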
> I just tried a non-quantized gpt4-x-alpaca model and it worked fine: https://huggingface.co/chavinlo/gpt4-x-alpaca
> I am fairly sure the issue has to do with the quantization not being supported, since mine prints the warning "Only slow 8-bit matmul is supported for your GPU!", though I'm not sure whether that's related.

Try removing --groupsize 128 from the command. Also, could you load a 4-bit 13B model on 8 GB of VRAM without a --pre_layers offload?
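For a rough sense of why 8 GB is tight for a 13B model even at 4 bits, here is a back-of-the-envelope estimate (weights only; activations, the KV cache, and CUDA overhead come on top):

```python
# Back-of-the-envelope VRAM estimate for the 4-bit weights of a 13B model.
# Real usage is higher: activations, KV cache, and CUDA overhead are ignored.
params = 13e9          # roughly 13 billion parameters
bits_per_param = 4     # GPTQ 4-bit quantization
weight_bytes = params * bits_per_param / 8
print(f"Weights alone: {weight_bytes / 2**30:.1f} GiB")  # about 6.1 GiB
```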
Getting this error too, but I'm on Linux; normal models work completely fine.
@jan-tennert
I haven't tested it on Linux, although I might set up a Linux box soon... There's a YouTube video showing how to install it on Linux:
https://www.youtube.com/watch?v=F_pFH-AngoE
This is all guesswork, but if you have followed the installation instructions to the letter, maybe your problem could be fixed by:
a) Making sure the Python version and environment are the ones specified and that all the requirements are installed correctly; create a new environment and pip install from scratch. b) Installing this on a rented virtual machine (one with less VRAM and one with sufficient VRAM, to compare) and seeing whether it installs and runs there. If it does, consider fixing your Linux box first: reinstall conda, reinstall the NVIDIA driver, etc.
A lot of problems come from Python and its "abysmal" package management. Personally, I've reinstalled everything from scratch hundreds of times: version conflicts, NVIDIA driver problems, CUDA version and Python wheel compatibility problems... Python has been around a long time, so Google returns a lot of outdated information; try filtering search results by date.
I hope it helps
I get this error, and I'm running on Linux
I've been wrestling with this problem for the last couple of days and finally managed to get it to work on Linux. Check out my guide here
check if you have the full tokenizer.model file (about 500kb)
> check if you have the full tokenizer.model file (about 500kb)

That fixed my issue. Anyone coming here: when using git clone, make sure you pay attention to the other files that are still LFS pointers, not just the model itself.
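For anyone who wants to check this without eyeballing file sizes: a leftover Git LFS pointer is a short text stub that starts with "version https://git-lfs.github.com/spec/v1". Here is a small sketch (the path is an example) that tells a pointer stub apart from a real SentencePiece model:

```python
# Check whether tokenizer.model is a real SentencePiece model or a leftover
# Git LFS pointer stub. The model directory below is an example; adjust it.
from pathlib import Path
import sentencepiece as spm

path = Path("models/anon8231489123_vicuna-13b-GPTQ-4bit-128g/tokenizer.model")
head = path.read_bytes()[:64]

if head.startswith(b"version https://git-lfs.github.com/spec/v1"):
    print("This is a Git LFS pointer stub, not the real file; re-download it.")
else:
    print(f"Size: {path.stat().st_size / 1024:.0f} KB")
    sp = spm.SentencePieceProcessor()
    sp.Load(str(path))  # raises the same RuntimeError if the file is corrupt
    print("tokenizer.model is valid; vocab size:", sp.GetPieceSize())
```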
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.