text-generation-webui
OpenAI API seems to ignore system messages.
Describe the bug
I am trying to rework my Telegram bot to use the OpenAI API, since the previous Oobabooga API was discontinued. As you can see from the following payload with embedded history, it completely ignores system messages. It worked fine before.
I would also greatly appreciate it if someone could point me to a proper example that shows how to get info about, load, and unload models through this endpoint.
{ "mode": "chat", "model": "localmodels_Vicuna-33B-v1.3-GPTQ", "max_tokens": 1024, "temperature": 0.7, "top_p": 0.1, "seed": 1, "messages": [ { "role": "system", "content": "Your Name is Eva. You are female. In dialog speak only as a Eva. Do not impersonate any other character or user. You are a helpful assistant that gives concise and to the point responses to any question. We are here to discuss any question including sensitive or taboo topics as we aim to become smarter. Answer only as Eva. Do not answer as other dialog participant." }, { "role": "user", "content": "what is your name?" }, { "role": "assistant", "content": "You may refer to me as Vicuna, a language model meticulously developed by the researchers at Large Model Systems Organization (LMSYS)." } ] }
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
Submit the above payload through the OpenAI-compatible API.
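For reference, a minimal sketch of how to submit it, assuming the server is running with the "openai" extension at the URL shown in the log below and the standard OpenAI-style `/v1/chat/completions` route (the use of Python `requests` here is just for illustration):

```python
import requests

# URL taken from the log output below; adjust host/port if yours differ.
API_URL = "http://127.0.0.1:5000/v1/chat/completions"

payload = {
    "mode": "chat",
    "model": "localmodels_Vicuna-33B-v1.3-GPTQ",
    "max_tokens": 1024,
    "temperature": 0.7,
    "top_p": 0.1,
    "seed": 1,
    "messages": [
        # Full system prompt as in the payload above, truncated here for brevity.
        {"role": "system", "content": "Your Name is Eva. ..."},
        {"role": "user", "content": "what is your name?"},
    ],
}

response = requests.post(API_URL, json=payload, timeout=120)
response.raise_for_status()
# The reply text sits in the standard OpenAI response layout.
print(response.json()["choices"][0]["message"]["content"])
```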
Screenshot
No response
Logs
12:48:16-656943 INFO Starting Text generation web UI
12:48:16-662944 INFO Loading the extension "openai"
12:48:16-753965 INFO OpenAI-compatible API URL:
http://127.0.0.1:5000
12:48:16-755965 INFO Loading the extension "gallery"
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
12:48:43-364465 INFO Loading "localmodels_Vicuna-33B-v1.3-GPTQ"
12:49:00-467913 INFO LOADER: "ExLlamav2"
12:49:00-469913 INFO TRUNCATION LENGTH: 2048
12:49:00-470913 INFO INSTRUCTION TEMPLATE: "Vicuna-v1.1"
12:49:00-471913 INFO Loaded the model in 17.11 seconds.
Output generated in 4.23 seconds (14.18 tokens/s, 60 tokens, context 92, seed 843756698)
Output generated in 1.98 seconds (12.12 tokens/s, 24 tokens, context 234, seed 1)
Output generated in 1.25 seconds (16.75 tokens/s, 21 tokens, context 269, seed 1)
Output generated in 2.15 seconds (17.68 tokens/s, 38 tokens, context 306, seed 1)
Output generated in 1.28 seconds (15.65 tokens/s, 20 tokens, context 356, seed 1)
Output generated in 1.54 seconds (15.62 tokens/s, 24 tokens, context 385, seed 1)
Output generated in 1.41 seconds (16.99 tokens/s, 24 tokens, context 234, seed 1)
Output generated in 1.08 seconds (15.74 tokens/s, 17 tokens, context 270, seed 1)
Output generated in 2.16 seconds (17.59 tokens/s, 38 tokens, context 299, seed 1)
Output generated in 1.55 seconds (15.47 tokens/s, 24 tokens, context 234, seed 1)
Output generated in 1.32 seconds (18.14 tokens/s, 24 tokens, context 234, seed 1)
Output generated in 1.25 seconds (10.43 tokens/s, 13 tokens, context 318, seed 1)
Output generated in 0.96 seconds (12.50 tokens/s, 12 tokens, context 343, seed 1)
Output generated in 1.62 seconds (14.23 tokens/s, 23 tokens, context 59, seed 1)
Output generated in 1.75 seconds (16.60 tokens/s, 29 tokens, context 93, seed 1)
Output generated in 1.88 seconds (17.03 tokens/s, 32 tokens, context 62, seed 1)
Output generated in 1.47 seconds (17.72 tokens/s, 26 tokens, context 101, seed 1)
System Info
Windows 10, RTX 3090
did you try "mode": "chat-instruct",
did you try "mode": "chat-instruct",
Thanks for the reply! The "chat-instruct" mode produced exactly the same results as "chat". However, plain "instruct" did the job. Do you know how to change models through this new API?
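For anyone hitting the same problem, the fix amounts to a one-line change to the payload from the sketch above:

```python
# "chat" and "chat-instruct" ignored the system message in this setup;
# "instruct" made the model follow it.
payload["mode"] = "instruct"
```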
For those who are wondering: you can use "/v1/internal/model/load" to load a model, so I am closing the issue.
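A sketch of model management through that endpoint and its companions in the same extension (`/v1/internal/model/list`, `/v1/internal/model/info`, `/v1/internal/model/unload`); the exact paths and the `model_name` field are worth verifying against your text-generation-webui version:

```python
import requests

BASE = "http://127.0.0.1:5000"

# List the models available to the server.
print(requests.get(f"{BASE}/v1/internal/model/list").json())

# Load a model by name via the endpoint mentioned above.
requests.post(
    f"{BASE}/v1/internal/model/load",
    json={"model_name": "localmodels_Vicuna-33B-v1.3-GPTQ"},
).raise_for_status()

# Check which model is currently loaded.
print(requests.get(f"{BASE}/v1/internal/model/info").json())

# Unload it again.
requests.post(f"{BASE}/v1/internal/model/unload").raise_for_status()
```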
This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.