text-generation-webui
Answer problem with the Vicuna 13B 4-bit model
Describe the bug
Hello everyone, have you ever had this problem when using the Vicuna 13B 4-bit model? (see screenshot) Everything is fine with other models.
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
Installation on Windows with the automatic one-click method.
Screenshot
Logs
-
System Info
Windows 10 - NVIDIA M40 24GB - CUDA 11.7
I have that issue with gpt4-x-alpaca 4-bit on a cloud computer with a Quadro P5000, but it works fine on my RTX 3080. I just copy/pasted the same conda directory. It works fine with 8-bit models but not 4-bit.
I have had this issue when setting num_beams to anything other than 1.
Same issue on gpt4-x-alpaca 4-bit with an AMD 5700 XT.
Did you all update your tokenizers to the new transformers? I can also make this happen with really high temperatures.
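If it helps to rule that out: assuming a pip-based environment (for example the console the installer opens), you can make sure the transformers package itself is current and check which version is actually loaded. These two commands are just a quick sanity check, not a guaranteed fix:

pip install --upgrade transformers
python -c "import transformers; print(transformers.__version__)"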
I've installed all the required packages and I've left the default settings.
The issue seems to be related to #931
Do you happen to have more than one 4-bit model?
If so, revert commit 8c6155251ae9852bbae1fd4df40934988c86a0b1 and report back.
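For anyone unsure how to do that: one way, assuming your text-generation-webui folder is a git checkout, is to revert that commit locally from inside the folder and restart the UI. If git reports conflicts, run git revert --abort and wait for further instructions instead:

git revert 8c6155251ae9852bbae1fd4df40934988c86a0b1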
I have two 4-bit models, but reverting the commit didn't fix the gibberish responses.
This output looks like you might be trying to run a GPTQ model quantized with --true-sequential and --act-order on Windows CUDA. Sadly, there's a pretty unlabeled split happening between new model quants and the options used to build them.
If the 13B version you're using is TheBloke's Vicuna model, try the gozfarb version of the same model. If you still get bad outputs, it's not the model, but I think it might be.
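For reference, those options come from the GPTQ-for-LLaMa quantization script, so a quant built with them would have been produced by a command roughly like the one below. The flag names are from memory of that repo's llama.py and the paths/filenames are placeholders, so treat this as approximate rather than exact:

# illustrative only: model path, dataset choice, and output name are placeholders
python llama.py /path/to/llama-13b c4 --wbits 4 --true-sequential --act-order --save_safetensors vicuna-13b-4bit.safetensors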
Got the same thing; it is caused by the model using a different GPTQ-for-LLaMa. Fix: from the text-generation-webui/repositories folder, run the following (assuming you are on Windows, using a GPU, and did not set up WSL):
Note: the last line must be run from the conda env.
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
cd GPTQ-for-LLaMa
python setup_cuda.py install
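If the build finishes without errors, you can sanity-check that the compiled extension is importable from that same conda env. quant_cuda is the module name the setup script builds, if I remember right, so adjust if yours differs:

python -c "import quant_cuda; print('quant_cuda loaded')"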
I thought act-order + true-sequential worked on Windows CUDA without group size.
> nvidia M40-24G

How much VRAM is needed for 8-bit, and is it running the full model or just the 4-bit version?
Gibberish and the missing CUDA error can be fixed with these instructions (for Windows, NVIDIA): install the newest oobabooga one-click installer, then do this:
- open cmd_windows.bat
- pip uninstall quant-cuda
- cd text-generation-webui\repositories
- rm -f -d -r GPTQ-for-LLaMa (if rm is not recognized, see the note after these steps)
- git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
- cd GPTQ-for-LLaMa
- python setup_cuda.py install
- close the cmd and run the start_windows.bat like normal
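One note on the rm step above: rm may not be available in the prompt that cmd_windows.bat opens. In that case, the built-in Windows equivalent should do the same thing:

rmdir /s /q GPTQ-for-LLaMa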
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.