
Using the API yields completely different answers than the WebUI

Open BrunoKreiner opened this issue 2 years ago • 3 comments

Describe the bug

I'm running the script `python api-example-stream.py` and the generated text is always very weird. This doesn't happen at all with the WebUI. For example:

~/Documents/text-generation-webui$ python api-example-stream.py
Can you tell me an english proverb? 

Please write in English language.
### Assistant: Sure, here's an English proverb for you: "Actions speak louder than words." This means that a person's actions are more important than their words or promises because it is through their actions that they truly show their intentions and character. Do you have any other questions or requests?

[garbled continuation: corrupted Unicode bytes followed by partially mojibaked Korean that roughly restates the proverb's meaning]

While the chat looks like this:

[screenshot of the WebUI chat answering normally]

I launch my server using `python server.py --model anon8231489123_vicuna-13b-GPTQ-4bit-128g --auto-devices --chat --api`.

Is there an existing issue for this?

  • [X] I have searched the existing issues

Reproduction

One console:

python server.py --model anon8231489123_vicuna-13b-GPTQ-4bit-128g --auto-devices --chat --api

Another console:

python api-example-stream.py

URI used for the API: `URI = f'ws://127.0.0.1:5005/api/v1/stream'`

Logs

python server.py --model anon8231489123_vicuna-13b-GPTQ-4bit-128g --auto-devices --chat --api
Gradio HTTP request redirected to localhost :)
bin /home/bruno/anaconda3/envs/vicuna-matata/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
Loading anon8231489123_vicuna-13b-GPTQ-4bit-128g...
Found the following quantized model: models/anon8231489123_vicuna-13b-GPTQ-4bit-128g/vicuna-13b-4bit-128g.safetensors
Loading model ...
Done.
Loaded the model in 17.69 seconds.
Starting streaming server at ws://127.0.0.1:5005/api/v1/stream
Loading the extension "gallery"... Starting API at http://127.0.0.1:5000/api
Ok.
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.

System Info

NAME="Pop!_OS"
VERSION="22.04 LTS"
ID=pop
ID_LIKE="ubuntu debian"
PRETTY_NAME="Pop!_OS 22.04 LTS"
VERSION_ID="22.04"
HOME_URL="https://pop.system76.com"
SUPPORT_URL="https://support.system76.com"
BUG_REPORT_URL="https://github.com/pop-os/pop/issues"
PRIVACY_POLICY_URL="https://system76.com/privacy"
VERSION_CODENAME=jammy
UBUNTU_CODENAME=jammy
LOGO=distributor-logo-pop-os

RTX 3090

BrunoKreiner avatar May 01 '23 23:05 BrunoKreiner

I think the issue is simply that the example does not use the instruct format specific to Vicuna; you possibly have different sampling parameters, and the stopping criteria are not set correctly for the model, so it generates garbage text at the end. Your settings in the webui do not really matter, because you are setting them in the API call, and the instruct formatting is not applied for you the way it is in the webui.
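
For illustration, here is a minimal sketch of a corrected request with a Vicuna-style prompt and stopping strings. The payload fields and event names mirror the api-example-stream.py of this era (e.g. `stopping_strings`, `text_stream`, `stream_end`); treat them as assumptions and verify against your copy of the script:

```python
import asyncio
import json

import websockets

URI = 'ws://127.0.0.1:5005/api/v1/stream'

# Vicuna expects its own instruct format; a bare question just gets
# "completed" as arbitrary text, which is where the garbage comes from.
PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant.\n"
    "### Human: Can you tell me an English proverb?\n"
    "### Assistant:"
)

async def run():
    request = {
        'prompt': PROMPT,
        'max_new_tokens': 250,
        'do_sample': True,
        'temperature': 0.7,   # match whatever you use in the webui
        'top_p': 0.9,
        'repetition_penalty': 1.1,
        # Stop before the model starts writing the human's next turn itself.
        'stopping_strings': ['### Human:', '### Assistant:'],
    }
    async with websockets.connect(URI) as websocket:
        await websocket.send(json.dumps(request))
        while True:
            incoming = json.loads(await websocket.recv())
            if incoming['event'] == 'text_stream':
                print(incoming['text'], end='', flush=True)
            elif incoming['event'] == 'stream_end':
                break

asyncio.run(run())
```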

LaaZa avatar May 02 '23 01:05 LaaZa

I got better results by putting `### Instruction:` in front of my prompt and copying the parameters I see in the browser's Parameters tab straight into the configuration payload. It still produces garbage, though: it keeps repeating things at the end and hallucinates "User:" turns. So why is the web chat so clean? It does hallucinate sometimes, but 90% of the times I tried it, it's really good. Are there any parameters that can't be seen in the Parameters tab in the browser? Do I have to format my prompt to match some kind of guideline?

BrunoKreiner avatar May 02 '23 01:05 BrunoKreiner

The prompt should match the instruction template as you advance through question rounds. End the prompt with the assistant's turn, like `### Assistant:`, so it knows to answer as itself: these LMs complete text, and the expected completion at that point is something the assistant says. Keep updating the prompt with the conversation history, but remember that it will get cut off at max context from the top. You should always keep the initial prompt there so the bot does not forget the instructions for the chat, like in what way it should answer and all that; in the webui, the context box of the character is always kept in the context.
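
A rough sketch of that bookkeeping, assuming Vicuna-style `### Human:`/`### Assistant:` turns (names are illustrative, and the character-based truncation is a crude stand-in for real token counting):

```python
# Keep the initial instructions pinned at the top and drop the oldest
# turns first when the conversation outgrows the context window.
CONTEXT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers.\n"
)
history = []  # list of (user_text, assistant_text) pairs

def build_prompt(user_message, max_chars=6000):
    turns = [f"### Human: {u}\n### Assistant: {a}\n" for u, a in history]
    # End with the assistant's turn so the model answers as itself.
    turns.append(f"### Human: {user_message}\n### Assistant:")
    # Crude truncation: drop the oldest turns, never the pinned CONTEXT.
    while len(CONTEXT) + sum(map(len, turns)) > max_chars and len(turns) > 1:
        turns.pop(0)
    return CONTEXT + ''.join(turns)

# After each reply, record the pair so the next prompt includes it:
# history.append((user_message, assistant_reply))
```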

LaaZa avatar May 02 '23 04:05 LaaZa

Why don't I see the message "Starting streaming server at ws://127.0.0.1:5005/api/v1/stream" in my cmd prompt? I would like to ask how to activate the stream API mode.

yanchunchun avatar May 06 '23 08:05 yanchunchun

The prompt should match the instruction template as you advance through question rounds. End the prompt with the assistant's turn, like `### Assistant:` […]

Thanks! When launching the server without the `--chat` option, I see the templates automatically and can finally copy them for the API.

BrunoKreiner avatar May 06 '23 09:05 BrunoKreiner

Why don't I see the message "Starting streaming server at ws://127.0.0.1:5005/api/v1/stream" in my cmd prompt? I would like to ask how to activate the stream API mode.

I just used `--api` and it popped up.

BrunoKreiner avatar May 06 '23 09:05 BrunoKreiner

Why doesn't my launch command have the `--api` flag? Am I using the wrong version?

yanchunchun avatar May 08 '23 15:05 yanchunchun