
Answer problem with VICUNA 13B 4-bit model

Florian0077 opened this issue 1 year ago • 13 comments

Describe the bug

Hello everyone, have you ever had this problem when using the VICUNA 13B 4-bit model? (see screenshot) Everything is OK with other models.

Is there an existing issue for this?

  • [X] I have searched the existing issues

Reproduction

Installation on Windows with the automatic method.

Screenshot

[screenshot]

Logs

-

System Info

Windows 10 - NVIDIA M40 24GB - CUDA 11.7

Florian0077 • Apr 11 '23 11:04

I have that issue with gpt4-x-alpaca 4-bit on a cloud machine with a Quadro P5000, but it works fine on my RTX 3080, even though I just copy/pasted the same conda directory. It works fine with 8-bit models but not 4-bit ones.

qbyss • Apr 11 '23 14:04

I have had this issue when setting num_beams to anything other than 1.

p1nkl0bst3r • Apr 11 '23 15:04

Same issue on gpt4-x-alpaca 4-bit with an AMD 5700XT.

daniandtheweb • Apr 11 '23 16:04

Did you all update your tokenizers for the new transformers version? I can also make this happen with really high temperatures.
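
If you're not sure how, a minimal way to do the update (assuming a pip-based install; run this from inside the conda env, e.g. via cmd_windows.bat):

  # Upgrade transformers so the newer tokenizer handling is picked up
  pip install --upgrade transformers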

Ph0rk0z • Apr 11 '23 17:04

I've installed all the required packages and I've left the default settings.

daniandtheweb • Apr 11 '23 17:04

The issue seems to be related to #931.

daniandtheweb • Apr 11 '23 18:04

Do you happen to have more than one 4-bit model?

If so, revert 8c6155251ae9852bbae1fd4df40934988c86a0b1 and report back.
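
A sketch of that revert, assuming a clean git checkout of text-generation-webui:

  # From the text-generation-webui folder; creates a new commit undoing the change
  git revert --no-edit 8c6155251ae9852bbae1fd4df40934988c86a0b1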

USBhost • Apr 11 '23 19:04

I have two 4-bit models, but reverting that commit didn't fix the gibberish responses.

daniandtheweb • Apr 11 '23 19:04

This output looks like you might be trying to run a GPTQ model quantized with --true-sequential and --act-order on Windows CUDA. Sadly, there's a largely unlabeled split happening between new model quants, depending on the options used to build them.
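
For context, those options are applied at quantization time; roughly what such an invocation looks like with GPTQ-for-LLaMa's llama.py (model path and output name here are placeholders, and exact flags can differ between branches):

  # Illustrative only: quantize to 4-bit with act-order + true-sequential (no group size)
  python llama.py models/vicuna-13b c4 --wbits 4 --true-sequential --act-order --save vicuna-13b-4bit.pt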

If the 13B version you're using is TheBloke's Vicuna model, try the gozfarb version of the same model; if you still get bad outputs, it's not the model, but I think it might be.

digiwombat • Apr 12 '23 01:04

Got the same thing; it is caused by the model having been quantized with a different GPTQ-for-LLaMa. Fix: from the text-generation-webui/repositories folder, run the following (assuming you are on Windows, using a GPU, and did not set up WSL):

Note: the last line must be run from the conda env.

  git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
  cd GPTQ-for-LLaMa
  python setup_cuda.py install
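
If the build worked, the compiled extension should import without errors (quant_cuda is the extension name that setup_cuda.py builds):

  # Quick sanity check, still inside the conda env
  python -c "import quant_cuda; print('quant_cuda loaded')"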

franklin050187 • Apr 12 '23 08:04

I thought act-order + true-sequential worked on Windows CUDA without group size.

Ph0rk0z • Apr 12 '23 14:04

nvidia M40-24G

How much VRAM is needed for 8-bit, and is it running the full model or just the 4-bit version?

Tom-Neverwinter • Apr 26 '23 06:04

The gibberish and the missing-CUDA error can be fixed with these instructions (for Windows, NVIDIA): install the newest oobabooga 1-click installer, then do this (a quick verification follows the steps):

  1. open cmd_windows.bat
  2. pip uninstall quant-cuda
  3. cd text-generation-webui\repositories
  4. rm -f -d -r GPTQ-for-LLaMa
  5. git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
  6. cd GPTQ-for-LLaMa
  7. python setup_cuda.py install
  8. close the cmd and run the start_windows.bat like normal
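
To confirm the rebuild took effect, a quick check (still inside cmd_windows.bat; quant-cuda is the package name from step 2):

  # Should show the freshly built package, and the import should succeed
  pip show quant-cuda
  python -c "import quant_cuda"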

madwurmz • May 01 '23 05:05

This issue has been closed due to six weeks of inactivity. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

github-actions[bot] • Sep 28 '23 23:09