
initial, kinda working openai compatible (ish) api

Open matatonic opened this issue 2 years ago • 23 comments

This is a first cut of a working API. It supports text completion (with and without streaming) and listing models. Chat completions are partly working, but stopping strings are not working yet (I'm not sure why).

I've tested compatibility using the Python openai client, and it seems to work fully for Completion.create() and models.list(). I've outlined the current working/not-working bits in the README.md.
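For reference, here's a minimal sketch of that kind of compatibility test with the legacy (0.x) openai Python client; the local port and the dummy key are assumptions for illustration, not part of the original setup:

```python
# Point the legacy openai client at the local extension instead of api.openai.com.
import openai

openai.api_key = "dummy"                      # any non-empty string; not checked locally (assumption)
openai.api_base = "http://127.0.0.1:5001/v1"  # the extension's OpenAI-style endpoint (assumed port)

# models.list() -> GET /v1/models
models = openai.Model.list()
print([m["id"] for m in models["data"]])

# Completion.create() -> POST /v1/completions
completion = openai.Completion.create(
    model="text-davinci-003",  # passed through; the local server uses whatever model is loaded
    prompt="Hello, my name is",
    max_tokens=16,
)
print(completion["choices"][0]["text"])
```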

What do you think? Is this something useful?

matatonic avatar Apr 22 '23 06:04 matatonic

I think this is very useful, happy to see this moving forward!

GitHub1712 avatar Apr 22 '23 07:04 GitHub1712

Yeah, this looks awesome! I'll check this out today!

CyberTimon avatar Apr 22 '23 08:04 CyberTimon

Thank you very much for this API! It works flawlessly! I hope this gets merged soon.

CyberTimon avatar Apr 22 '23 09:04 CyberTimon

@matatonic this is definitely useful, as I assume it allows the web UI to work with many existing clients for the OpenAI API.

A new API for the web UI has just been merged: https://github.com/oobabooga/text-generation-webui/pull/990

Can you check if any of the code in it is relevant for this new extension? If not, let me know and I'll merge this.

oobabooga avatar Apr 23 '23 19:04 oobabooga

I reviewed #990, but I don't think it applies to this; the websocket streaming code is very different. I did manage to fix up the stopping strings, though, so that's much better now. I think it's good to merge.

matatonic avatar Apr 23 '23 22:04 matatonic

I have integrated Home Assistant's conversation component with your new OpenAI-compatible API, and I ran into some issues with the stopping strings. I will provide more info tomorrow when I have more time to debug the issue.

The expected result should be just the answer, not the whole prompt with the system data; https://github.com/keldenl/gpt-llama.cpp works as intended.

drndos avatar Apr 23 '23 23:04 drndos

@drndos I looked over their code and changed a couple of parts: I added some extra standard stopping strings, and I think I fixed why it was always replying as system (they also prompt with assistant:, so I do all of that now too). Results from the chat endpoint look much better.
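As a hypothetical sketch of the kind of stop-string handling described here (the exact strings and this helper are illustrative assumptions, not the extension's actual code):

```python
# Stop generation when the model starts speaking for another chat role.
# These strings and this helper are illustrative assumptions only.
STOPPING_STRINGS = ["\nsystem:", "\nuser:", "\nassistant:"]

def truncate_at_stop(reply: str) -> str:
    """Cut the generated text at the earliest stop string, if any."""
    for stop in STOPPING_STRINGS:
        idx = reply.find(stop)
        if idx != -1:
            reply = reply[:idx]
    return reply

print(truncate_at_stop("Paris is the capital.\nuser: what else?"))  # -> "Paris is the capital."
```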

matatonic avatar Apr 24 '23 03:04 matatonic

@oobabooga I think this is a good start; it seems to work well enough for a number of use cases.

matatonic avatar Apr 25 '23 15:04 matatonic

@oobabooga I'm not sure how you feel about this extension, but just as a heads up, gpt4free (a similar API to OpenAI's) just got a takedown notice from openai.com. If I understand the project well enough (and I probably don't), it seems it somehow lets you access the real ChatGPT without paying, so maybe it's a completely different problem.

matatonic avatar Apr 28 '23 16:04 matatonic

Updated to include the embeddings API. I used SentenceTransformer with the all-MiniLM-L6-v2 model, and it seems to work well. @oobabooga
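A minimal sketch of what producing embeddings with that model looks like standalone (not the extension's actual code path):

```python
# Generate embeddings locally with sentence-transformers; the model name
# matches the comment above, everything else is illustrative.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(["first sentence", "second sentence"])
print(embeddings.shape)  # (2, 384) -- this model outputs 384-dimensional vectors
```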

matatonic avatar Apr 29 '23 19:04 matatonic

gpt4free

It looks like the gpt4free project is explicitly hijacking private API endpoints of third-party services. Definitely not a concern with regard to this extension, as the users will own their own API keys to OpenAI services.

h3xcat avatar Apr 30 '23 01:04 h3xcat

Thank you very much for your contribution. I have tested the interface and found that it performs almost identically to OpenAI's API. However, I'm not sure why, when I tried it with a web project, it seemed not to be supported. I checked the log files and confirmed that the project successfully made the API request, but the response indicated a timeout. Please refer to the following link for more information.

https://github.com/Chanzhaoyu/chatgpt-web

The web interface experience mentioned above is excellent. It would be a great improvement if it could be made compatible with this extension. I have reviewed the code, and it appears to reference a package from npm, shown below: https://www.npmjs.com/package/chatgpt

B1gM8c avatar May 01 '23 08:05 B1gM8c

So cool. This just works with Auto-GPT out of the box:

1. Run the matatonic/text-generation-webui server with --openai
2. Run Gdev91/Auto-GPT with OPENAI_API_BASE_URL=http://127.0.0.1:5001/ in .env

I used EMBED_DIM=5120 in .env with eachadea_vicuna-13b-1.1 so far. So we are fully in control of a local model chat group now, cheers ;)

GitHub1712 avatar May 01 '23 14:05 GitHub1712

> Thank you very much for your contribution. I have tested the interface and found that it performs almost identically to OpenAI's API. However, I'm not sure why, when I tried it with a web project, it seemed not to be supported. I checked the log files and confirmed that the project successfully made the API request, but the response indicated a timeout. Please refer to the following link for more information.
>
> https://github.com/Chanzhaoyu/chatgpt-web
>
> The web interface experience mentioned above is excellent. It would be a great improvement if it could be made compatible with this extension. I have reviewed the code, and it appears to reference a package from npm, shown below: https://www.npmjs.com/package/chatgpt

@B1gM8c Firstly, thanks a lot for your feedback; there are so many projects that it's too much for me to test with all of them! Node.js projects seem to be more difficult to redirect: the openai node package doesn't use the OPENAI_API_BASE environment variable by default (yet), so some code changes may be required to redirect the API URL to text-gen. Check the README.md for the openai extension for more info on what I've found with Node.js projects. Meanwhile, I will check out both of those links and see what needs to be done.

Thanks!

matatonic avatar May 01 '23 17:05 matatonic

> So cool. This just works with Auto-GPT out of the box:
>
> 1. Run the matatonic/text-generation-webui server with --openai
> 2. Run Gdev91/Auto-GPT with OPENAI_API_BASE_URL=http://127.0.0.1:5001/ in .env
>
> I used EMBED_DIM=5120 in .env with eachadea_vicuna-13b-1.1 so far. So we are fully in control of a local model chat group now, cheers ;)

How have you done this? I did this, but I still get this error every time:

```
openai.error.AuthenticationError: Incorrect API key provided: dummylol. You can find your API key at https://platform.openai.com/account/api-keys.
Press any key to continue...
```

CyberTimon avatar May 01 '23 17:05 CyberTimon

This is the .env:

```
################################################################################
### LLM PROVIDER
################################################################################

### OPENAI
## OPENAI_API_KEY - OpenAI API Key (Example: my-openai-api-key)
## TEMPERATURE - Sets temperature in OpenAI (Default: 0)
## USE_AZURE - Use Azure OpenAI or not (Default: False)
OPENAI_API_BASE_URL=http://127.0.0.1:5001/
OPENAI_API_KEY=dummylol
EMBED_DIM=5120
```

CyberTimon avatar May 01 '23 17:05 CyberTimon

> This is the .env:
>
> ```
> ################################################################################
> ### LLM PROVIDER
> ################################################################################
>
> ### OPENAI
> ## OPENAI_API_KEY - OpenAI API Key (Example: my-openai-api-key)
> ## TEMPERATURE - Sets temperature in OpenAI (Default: 0)
> ## USE_AZURE - Use Azure OpenAI or not (Default: False)
> OPENAI_API_BASE_URL=http://127.0.0.1:5001/
> OPENAI_API_KEY=dummylol
> EMBED_DIM=5120
> ```

Try OPENAI_API_BASE=http://127.0.0.1:5001/v1, not OPENAI_API_BASE_URL.
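If it helps, a quick way to sanity-check that the extension is answering on the /v1 path (the URL is an assumption based on this thread; adjust host/port as needed):

```python
# Hit the models endpoint directly; a JSON model list means the base URL is right.
import requests

r = requests.get("http://127.0.0.1:5001/v1/models")
print(r.status_code, r.json())
```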

matatonic avatar May 01 '23 18:05 matatonic

Thank you, that fixed the issue, but there's a new one now:

```
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 1536 and the array at index 1 has size 768
Press any key to continue...
```

I think this is because of the embeddings model size. I still want to know how @GitHub1712 got this to work.

CyberTimon avatar May 01 '23 19:05 CyberTimon

1536 is the standard embedding size from OpenAI's text-embedding-ada-002 model, the current default. The embeddings from that model are not compatible with the embeddings from this extension; i.e., you cannot mix embedding types. The only way we could ever generate compatible embeddings would be to run their exact model (as far as I understand it). If you have old embeddings from other runs of Auto-GPT, you will probably have to purge them and start over.
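To illustrate the mismatch with a hypothetical example (not Auto-GPT's actual code), stacking cached 1536-dim ada-002 vectors next to 768-dim local vectors fails exactly like the error above:

```python
import numpy as np

cached = np.zeros((1, 1536))  # old OpenAI text-embedding-ada-002 embeddings
local = np.zeros((1, 768))    # embeddings from a local 768-dim model

try:
    np.concatenate([cached, local], axis=0)  # stacked rows must share the same width
except ValueError as e:
    print(e)  # "...along dimension 1, the array at index 0 has size 1536..."
```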

matatonic avatar May 01 '23 20:05 matatonic

Do I have to install the whole matatonic webui like @GitHub1712 did, or can I install just the matatonic openai extension?

GiusTex avatar May 01 '23 21:05 GiusTex

Seems to work with just matatonic's openai extension on the oobabooga master branch. In .env I use EMBED_DIM=5120 with the model anon8231489123_gpt4-x-alpaca-13b-native-4bit-128g or eachadea_vicuna-13b-1.1.

Right now I'm trying to make the models understand how to use the Auto-GPT commands. Training a LoRA works, but it ends in output like this from Auto-GPT:

```
response: {'command': 'browse', 'website': 'https://example.com'}; NEXT ACTION: COMMAND = None ARGUMENTS = None
```

GitHub1712 avatar May 01 '23 22:05 GitHub1712

> So cool. This just works with Auto-GPT out of the box:
>
> 1. Run the matatonic/text-generation-webui server with --openai
> 2. Run Gdev91/Auto-GPT with OPENAI_API_BASE_URL=http://127.0.0.1:5001/ in .env
>
> I used EMBED_DIM=5120 in .env with eachadea_vicuna-13b-1.1 so far. So we are fully in control of a local model chat group now, cheers ;)

It's DGdev91 :) I'm glad my fork is useful to someone!

Well, this extension looks really cool, good job @matatonic!

DGdev91 avatar May 01 '23 22:05 DGdev91

I have tried using this project: https://github.com/xtekky/chatgpt-clone. However, I found that the length of the responses seems to be limited each time. I am not sure if I need to modify the script.py file in the extension you provided, or the backend.py file in the web project mentioned above.

B1gM8c avatar May 02 '23 10:05 B1gM8c

> I have tried using this project: https://github.com/xtekky/chatgpt-clone. However, I found that the length of the responses seems to be limited each time. I am not sure if I need to modify the script.py file in the extension you provided, or the backend.py file in the web project mentioned above.

I know where the problem lies. I tried modifying the system_message in backend.py; when I left it empty, the replies were mostly complete. I'm not sure if it's because the system_message was consuming too many tokens, which caused the previous replies to be incomplete.

https://github.com/xtekky/chatgpt-clone/blob/main/server/backend.py#L34

```python
system_message = f''
```

It seems that this approach can ensure that the answers are almost identical to what is displayed in the web UI interface.
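A plausible explanation (an assumption, not verified against either codebase): the context window is shared between the prompt and the completion, so a long system_message shrinks the room left for the reply:

```python
# Illustrative arithmetic only; all numbers here are assumptions.
context_window = 2048   # typical LLaMA-family context size
prompt_tokens = 1200    # long system_message + chat history
max_reply = context_window - prompt_tokens
print(max_reply)        # 848 tokens left for the reply; an empty system_message leaves more
```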

B1gM8c avatar May 03 '23 01:05 B1gM8c

Since many people are saying that this works, I'll merge. @matatonic please submit a new PR if you can think of further improvements.

oobabooga avatar May 03 '23 01:05 oobabooga

I am trying to use it to replace the LangChain OpenAI integration, but without success.

```
Found the following quantized model: models/TheBloke_stable-vicuna-13B-GPTQ/stable-vicuna-13B-GPTQ-4bit.compat.no-act-order.safetensors
Loading model ...
Done.
Loaded the model in 2.15 seconds.
Starting streaming server at ws://127.0.0.1:5005/api/v1/stream
Loading the extension "gallery"... Ok.
Starting API at http://127.0.0.1:5000/api
Running on local URL:  http://127.0.0.1:7860
```
```python
import os
os.environ["OPENAI_API_KEY"] = "dummylol"
os.environ["OPENAI_API_BASE"] = "http://127.0.0.1:5005/api/v1/stream/"

from langchain.chat_models import ChatOpenAI
from langchain import PromptTemplate, LLMChain
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    AIMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)

chat = ChatOpenAI(temperature=0)
chat([HumanMessage(content="Translate this sentence from English to French. I love programming.")])
```

However, I receive this output.

```
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 1.0 seconds as it raised APIError: Invalid response object from API: 'Failed to open a WebSocket connection: did not receive a valid HTTP request.\n' (HTTP response code was 400).
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 2.0 seconds as it raised APIError: Invalid response object from API: 'Failed to open a WebSocket connection: did not receive a valid HTTP request.\n' (HTTP response code was 400).
Retrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised APIError: Invalid response object from API: 'Failed to open a WebSocket connection: did not receive a valid HTTP request.\n' (HTTP response code was 400)
```

Any idea? Thank you!

BenjiKCF avatar May 04 '23 09:05 BenjiKCF

@BenjiKCF - I think you're missing --extensions openai. Once it starts, you should see a message about the OpenAI extension starting. After that, the OPENAI_API_BASE setting should look like:

```python
os.environ["OPENAI_API_BASE"] = "http://127.0.0.1:5001/v1"
```

matatonic avatar May 04 '23 23:05 matatonic

> @BenjiKCF - I think you're missing --extensions openai. Once it starts, you should see a message about the OpenAI extension starting. After that, the OPENAI_API_BASE setting should look like:
>
> ```python
> os.environ["OPENAI_API_BASE"] = "http://127.0.0.1:5001/v1"
> ```

Thank you, it's working beautifully. I thought the api and openai extensions were the same.

BenjiKCF avatar May 05 '23 00:05 BenjiKCF

I'm trying to make this work in Colab, with Auto-GPT running on my PC (no GPU to run models). I'm running with --trust-remote-code --lora samwit_alpaca7B-lora --share --extensions openai --auto-devices --gpu-memory 15. On port 5001, I got:

```
openai.error.APIConnectionError: Error communicating with OpenAI: HTTPSConnectionPool(host='with-yes-ips-child.trycloudflare.com', port=5001): Max retries exceeded with url: /v1/chat/completions (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fb16a191060>: Failed to establish a new connection: [Errno 111] Connection refused'))
```

and on port 443 I got:

```
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```

luisinhobr avatar May 06 '23 08:05 luisinhobr

@luisinhobr I don't use Colab or the Cloudflare interface, so I'm not sure what network limits they have there... What value of OPENAI_API_BASE are you using (and can you also share the startup messages from the extension)?

matatonic avatar May 06 '23 13:05 matatonic