
text-generation-webui: LLaMA generating random nonsense

floppaselfbot opened this issue 1 year ago · 9 comments

Describe the bug

When running LLaMA 7B (4-bit, groupsize 32) on text-generation-webui, I get completely nonsensical responses. For example: This is a conversation with your Assistant. The Assistant is very helpful and is eager to chat with you and answer your questions. You: hi Assistant: ekdia →dra defectRT”ÄRTRTRTRTRTRTRTRTRTRTRT You: why are you typing up random stuff Assistant: ädâDOCC Assistant:
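The repeated two-character runs in the output above are a recognizable pattern of degenerate generation. As a purely illustrative sketch (not part of the webui, and not any fix), a small heuristic can flag this kind of output automatically:

```python
def looks_degenerate(text: str, max_run: int = 8) -> bool:
    """Heuristic: flag output that repeats one short chunk many times,
    like the 'RTRTRT...' strings in this issue. Illustrative only."""
    # Check runs of repeated 1-, 2-, and 3-character chunks.
    for size in (1, 2, 3):
        run = 1
        for i in range(size, len(text) - size + 1, size):
            if text[i:i + size] == text[i - size:i]:
                run += 1
                if run >= max_run:
                    return True
            else:
                run = 1
    return False

print(looks_degenerate("RT" * 12))       # True: one chunk repeated many times
print(looks_degenerate("hello world"))   # False: ordinary text
```

The thresholds (`max_run`, chunk sizes) are arbitrary choices for this sketch, not tuned values.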

Is there an existing issue for this?

  • [X] I have searched the existing issues

Reproduction

Run LLaMA 7B in any configuration on text-generation-webui and try chatting with it.

Screenshot

No response

Logs

(/home/floppa/oobagooba/installer_files/env) floppa@flop-PC:~/oobagooba/text-generation-webui$ python server.py --cai-chat --verbose --cpu-memory 4GB --wbits 4 --groupsize 32 --auto-device --gpu-memory 16 --listen --listen-port 7861 --extensions llama_prompts api long_term_memory --model llama_7b
Gradio HTTP request redirected to localhost :)
Warning: --cai-chat is deprecated. Use --chat instead.

bin /home/floppa/oobagooba/installer_files/env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
Loading llama_7b...
Found the following quantized model: models/llama_7b/llama-7b-4bit-32g.safetensors
Loading model ...
Done.
Using the following device map for the quantized model: {'': 0}
Loaded the model in 2.82 seconds.
Loading the extension "api"... Ok.
Loading the extension "gallery"... Starting KoboldAI compatible api at http://0.0.0.0:5000/api
Ok.
Running on local URL:  http://0.0.0.0:7861

To create a public link, set `share=True` in `launch()`.


This is a conversation with your Assistant. The Assistant is very helpful and is eager to chat with you and answer your questions.
You: hi
Assistant:
--------------------

Output generated in 13.18 seconds (15.09 tokens/s, 199 tokens, context 36, seed 1654316909)


This is a conversation with your Assistant. The Assistant is very helpful and is eager to chat with you and answer your questions.
You: hi
Assistant: ekdia →dra defectRT”ÄRTRTRTRTRTRTRTRTRTRTRT
You: why are you typing up random stuff
Assistant:
--------------------

Output generated in 12.85 seconds (15.48 tokens/s, 199 tokens, context 70, seed 441487768)


This is a conversation with your Assistant. The Assistant is very helpful and is eager to chat with you and answer your questions.
You: hi
Assistant: ekdia →dra defectRT”ÄRTRTRTRTRTRTRTRTRTRTRT
You: why are you typing up random stuff
Assistant: ädâDOCCÂ
Assistant:
--------------------

Output generated in 12.94 seconds (15.38 tokens/s, 199 tokens, context 80, seed 69881342)
^CTraceback (most recent call last):
  File "/home/floppa/oobagooba/text-generation-webui/server.py", line 923, in <module>
    time.sleep(0.5)
KeyboardInterrupt

System Info

Windows 11 with WSL Ubuntu
NVIDIA RTX 3090
Intel Core i9-12900K

floppaselfbot avatar Apr 20 '23 16:04 floppaselfbot

Same here.

Andyholm avatar Apr 20 '23 18:04 Andyholm

I had a similar problem; make sure you are running the correct model for your backend, like CUDA vs. Triton or something like that. Use the CUDA one, delete the other.
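A minimal sketch of checking which kernel backend is importable, under the assumption that the CUDA branch of GPTQ-for-LLaMa builds a `quant_cuda` extension while the Triton branch relies on the `triton` package (module names taken from those projects at the time; verify against your own install):

```python
import importlib.util

def gptq_backend() -> str:
    """Best-effort guess at which GPTQ kernel backend is installed.

    Assumes `quant_cuda` is the extension built by the CUDA branch of
    GPTQ-for-LLaMa; returns "none" if neither candidate is importable.
    """
    if importlib.util.find_spec("quant_cuda") is not None:
        return "cuda"
    if importlib.util.find_spec("triton") is not None:
        return "triton"
    return "none"

print(gptq_backend())
```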

CristianPi avatar Apr 20 '23 22:04 CristianPi

Delete what, and where?

Andyholm avatar Apr 21 '23 07:04 Andyholm

I had a similar problem; make sure you are running the correct model for your backend, like CUDA vs. Triton or something like that. Use the CUDA one, delete the other.

I'm running the LLaMA 7B models from the Hugging Face link on the docs page for using LLaMA with this webui. I tried both of them and got the same result.

floppaselfbot avatar Apr 21 '23 22:04 floppaselfbot

Same issue, so this issue isn't four days old, it's five. https://github.com/oobabooga/text-generation-webui/issues/1554 — this may affect more than just AMD/Windows builds; however, they all have Windows in common, and all NVIDIA cards.

A torch 2.0.0 issue with CUDA?

Tom-Neverwinter avatar Apr 26 '23 06:04 Tom-Neverwinter

Same issue with Neko-Institute-of-Science_LLaMA-13B-4bit-128g on Ubuntu and NVIDIA. Regardless of the settings, the model produces repetitive random noise.

C00reNUT avatar Apr 27 '23 09:04 C00reNUT

https://github.com/jllllll/one-click-installers

Made a pull request and updated the oobabooga web UI; either update, or download and replace the files: https://github.com/oobabooga/text-generation-webui/commits/main

Solved with the updated installer and CUDA installed: 11.8 (Tesla M40) and 12.1 (modern cards).
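The version pairing above amounts to a simple compatibility check. A sketch in pure Python (with the assumption that a wheel built against CUDA X runs on any driver whose supported CUDA version is at least X):

```python
def cuda_compatible(built_for: str, driver_supports: str) -> bool:
    """True if a wheel built against CUDA `built_for` should run on a
    driver supporting CUDA up to `driver_supports`.

    Assumption: NVIDIA drivers are backward compatible with wheels
    built for an equal or older CUDA toolkit version.
    """
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(built_for) <= as_tuple(driver_supports)

print(cuda_compatible("11.8", "12.1"))  # True: older toolkit, newer driver
print(cuda_compatible("12.1", "11.8"))  # False: wheel newer than driver
```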

Tom-Neverwinter avatar Apr 30 '23 17:04 Tom-Neverwinter

I've spent an entire night on it now. Initially the one-click installer fixed it for me, but now I am lost...

veritaism avatar May 06 '23 18:05 veritaism

I've spent an entire night on it now. Initially the one-click installer fixed it for me, but now I am lost...

I have not used the webui for two weeks due to this error and moved to llama.cpp, which works great. I think the problem with Python projects, including this repo, is that the developers cannot control the quality of the program. It really needs test-driven development.

yunghoy avatar May 06 '23 18:05 yunghoy

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

github-actions[bot] avatar Aug 29 '23 23:08 github-actions[bot]