
Answer problem with VICUNA 13B 4-bit model

Florian0077 opened this issue 1 year ago • 13 comments

Describe the bug

Hello everyone, have you ever had this problem when using the VICUNA 13B 4-bit model? (see screenshot) Everything is OK with other models.

Is there an existing issue for this?

  • [X] I have searched the existing issues

Reproduction

Installation on Windows with the automatic method.

Screenshot

[screenshot]

Logs

-

System Info

Windows 10 - NVIDIA M40 24GB - CUDA 11.7

Florian0077 • Apr 11 '23 11:04

I have that issue with gpt4-x-alpaca 4-bit on a cloud machine with a Quadro P5000, but it works fine on my RTX 3080, even though I just copy/pasted the same conda directory. It works fine with 8-bit models but not 4-bit ones.

qbyss • Apr 11 '23 14:04

I have had this issue when setting num_beams to anything other than 1.

p1nkl0bst3r • Apr 11 '23 15:04

Same issue on gpt4-x-alpaca 4-bit with an AMD 5700XT.

daniandtheweb • Apr 11 '23 16:04

Did you all update your tokenizers for the new transformers version? I can also make this happen with really high temperatures.
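
If you're not sure how, a minimal way to do the update (assuming a pip-based install; run this from inside the conda env, e.g. via cmd_windows.bat):

  # Upgrade transformers so the newer tokenizer handling is picked up
  pip install --upgrade transformers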

Ph0rk0z • Apr 11 '23 17:04

I've installed all the required packages and I've left the default settings.

daniandtheweb • Apr 11 '23 17:04

The issue seems to be related to #931.

daniandtheweb • Apr 11 '23 18:04

Do you happen to have more than one 4-bit model?

If so, revert 8c6155251ae9852bbae1fd4df40934988c86a0b1 and report back.
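
A sketch of that revert, assuming a clean git checkout of text-generation-webui:

  # From the text-generation-webui folder; creates a new commit undoing the change
  git revert --no-edit 8c6155251ae9852bbae1fd4df40934988c86a0b1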

USBhost • Apr 11 '23 19:04

I have two 4-bit models, but reverting that commit didn't fix the gibberish responses.

daniandtheweb • Apr 11 '23 19:04

This output looks like you might be trying to run a GPTQ model quantized with --true-sequential and --act-order on Windows CUDA. Sadly, there's a largely unlabeled split happening between new model quants, depending on the options used to build them.
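
For context, those options are applied at quantization time; roughly what such an invocation looks like with GPTQ-for-LLaMa's llama.py (model path and output name here are placeholders, and exact flags can differ between branches):

  # Illustrative only: quantize to 4-bit with act-order + true-sequential (no group size)
  python llama.py models/vicuna-13b c4 --wbits 4 --true-sequential --act-order --save vicuna-13b-4bit.pt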

If the 13B version you're using is TheBloke's Vicuna model, try the gozfarb version of the same model; if you still get bad outputs, it's not the model, but I think it might be.

digiwombat • Apr 12 '23 01:04

Got the same thing; it is caused by the model having been quantized with a different GPTQ-for-LLaMa. Fix: from the text-generation-webui/repositories folder, run the following (assuming you are on Windows, using a GPU, and did not set up WSL):

Note: the last line must be run from the conda env.

  git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
  cd GPTQ-for-LLaMa
  python setup_cuda.py install
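
If the build worked, the compiled extension should import without errors (quant_cuda is the extension name that setup_cuda.py builds):

  # Quick sanity check, still inside the conda env
  python -c "import quant_cuda; print('quant_cuda loaded')"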

franklin050187 • Apr 12 '23 08:04

I thought act-order + true-sequential worked on Windows CUDA without group size.

Ph0rk0z • Apr 12 '23 14:04

nvidia M40-24G

How much VRAM is needed for 8-bit, and is it running the full model or just the 4-bit version?

Tom-Neverwinter • Apr 26 '23 06:04

The gibberish and the missing-CUDA error can be fixed with these instructions (for Windows, NVIDIA): install the newest oobabooga 1-click installer, then do this (a quick verification follows the steps):

  1. open cmd_windows.bat
  2. pip uninstall quant-cuda
  3. cd text-generation-webui\repositories
  4. rm -f -d -r GPTQ-for-LLaMa
  5. git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda
  6. cd GPTQ-for-LLaMa
  7. python setup_cuda.py install
  8. close the cmd and run the start_windows.bat like normal
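
To confirm the rebuild took effect, a quick check (still inside cmd_windows.bat; quant-cuda is the package name from step 2):

  # Should show the freshly built package, and the import should succeed
  pip show quant-cuda
  python -c "import quant_cuda"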

madwurmz • May 01 '23 05:05

This issue has been closed due to six weeks of inactivity. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

github-actions[bot] • Sep 28 '23 23:09