
Message token length issue with llama-4bit (7B)

Open 1aienthusiast opened this issue 1 year ago • 2 comments

Describe the bug

Almost every message is reported as 200 tokens, regardless of whether it is a single word or multiple words/sentences.

Is there an existing issue for this?

  • [X] I have searched the existing issues

Reproduction

conda activate textgen
python3.10 server.py --gptq-bits 4 --model llama-7b-hf --cai-chat --no-stream

Screenshot

Screenshot_2023-03-25_15-17-27, Screenshot_2023-03-25_15-16-03

Logs

Loading llama-7b-hf...
Found models/llama-7b-4bit.pt
Loading model ...
Done.
Loaded the model in 2.42 seconds.
Loading the extension "gallery"... Ok.
/home/username/.local/lib/python3.10/site-packages/gradio/deprecation.py:40: UserWarning: The 'type' parameter has been deprecated. Use the Number component instead.
  warnings.warn(value)
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Output generated in 5.06 seconds (39.52 tokens/s, 200 tokens)
Output generated in 4.73 seconds (42.28 tokens/s, 200 tokens)
Output generated in 5.79 seconds (34.55 tokens/s, 200 tokens)
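
The pattern in the logs above (every generation reporting exactly 200 tokens, the default max_new_tokens cap) is what you would expect if generation never stops at the end-of-sequence token and always runs to the cap. A minimal illustrative sketch (not webui code; token ids and the helper are hypothetical):

```python
# Illustrative sketch: if the EOS token never triggers the stopping
# check, every generation runs to the max_new_tokens cap, so each
# reply is reported as exactly 200 tokens regardless of its length.
EOS_ID = 2            # assumed end-of-sequence token id
MAX_NEW_TOKENS = 200  # default cap visible in the logs above

def generate(next_token_fn, eos_stops=True):
    """Simulate a sampling loop that may or may not honor EOS."""
    tokens = []
    for _ in range(MAX_NEW_TOKENS):
        tok = next_token_fn(len(tokens))
        tokens.append(tok)
        if eos_stops and tok == EOS_ID:
            break
    return tokens

# A short answer: the model emits EOS after 5 tokens.
short_answer = lambda i: EOS_ID if i == 4 else 100 + i

print(len(generate(short_answer, eos_stops=True)))   # stops early: 5
print(len(generate(short_answer, eos_stops=False)))  # runs to the cap: 200
```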

System Info

operating system: Linux
GPU brand: Nvidia
GPU model: GeForce RTX 3060 12GB

1aienthusiast avatar Mar 25 '23 14:03 1aienthusiast

Thank you, I missed that. This should have solved it https://github.com/oobabooga/text-generation-webui/commit/8c8e8b44508972a37fd15d760f9e4214e5105306

oobabooga avatar Mar 25 '23 15:03 oobabooga

> Thank you, I missed that. This should have solved it 8c8e8b4

Doesn't seem like it's fixed (screenshots attached).

1aienthusiast avatar Mar 25 '23 17:03 1aienthusiast

I noticed the same behavior with today's release (commit 49c10c5), which seems to be model-dependent: I get a huge speed increase and correct token sizes only when using the ozcur/alpaca-native-4bit model from Hugging Face. With llama-7b-4bit (without group size) and llama-7b-4bit-128g (with group size 128) from the Torrents, it still tends to reach the token limit even with short answers and takes longer to generate the output.

StefanDanielSchwarz avatar Mar 26 '23 20:03 StefanDanielSchwarz

> I noticed the same behavior with today's release (commit 49c10c5), which seems to be model-dependent: I get a huge speed increase and correct token sizes only when using the ozcur/alpaca-native-4bit model from Hugging Face. With llama-7b-4bit (without group size) and llama-7b-4bit-128g (with group size 128) from the Torrents, it still tends to reach the token limit even with short answers and takes longer to generate the output.

I have this error too, but I'm noticing that it is only present when I use the --no-stream arg. Have you tried without it?

mvenezia00 avatar Apr 02 '23 01:04 mvenezia00

This is an upstream issue in the transformers library

https://github.com/huggingface/transformers/issues/22436

oobabooga avatar Apr 02 '23 04:04 oobabooga

> > I noticed the same behavior with today's release (commit 49c10c5), which seems to be model-dependent: I get a huge speed increase and correct token sizes only when using the ozcur/alpaca-native-4bit model from Hugging Face. With llama-7b-4bit (without group size) and llama-7b-4bit-128g (with group size 128) from the Torrents, it still tends to reach the token limit even with short answers and takes longer to generate the output.
>
> I have this error too, but I'm noticing that it is only present when I use the --no-stream arg. Have you tried without it?

Can confirm that it only happens with the --no-stream arg for me as well.

1aienthusiast avatar Apr 03 '23 19:04 1aienthusiast

When using the openai extension, look in completions.py and fix the hardcoded value if needed: generate_params['max_new_tokens'] = 200

It would probably be better to create a separate issue about reading the value from settings.yaml instead.

donjaron777 avatar Nov 24 '23 08:11 donjaron777
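
The suggestion above amounts to replacing the hardcoded cap with a value read from configuration. A minimal sketch of that idea (hypothetical names; `settings` stands in for the dict parsed from settings.yaml, which the real code would load elsewhere):

```python
# Hypothetical sketch of the suggested fix: take max_new_tokens from
# the parsed settings rather than hardcoding 200 in completions.py.
# `settings` here is a stand-in for the dict loaded from settings.yaml.
settings = {'max_new_tokens': 512}  # example value; the real file may differ

generate_params = {}
# Fall back to the old hardcoded default when the key is absent.
generate_params['max_new_tokens'] = settings.get('max_new_tokens', 200)

print(generate_params['max_new_tokens'])  # 512 here; 200 without the key
```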

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

github-actions[bot] avatar Jan 05 '24 23:01 github-actions[bot]