Improvements to the ChatGPT API to more closely match the official OpenAI endpoint and improve compatibility
Hey, as I mentioned on Discord, we are experimenting with a Mac mini Exo deployment at work and noticed that the ChatGPT API offered by Exo behaved differently from the official OpenAI endpoint in a way that made working with 3rd-party chat clients difficult.
To make the Exo API behave more similarly to OpenAI's, we have done the following:
- Remove the EOS token from the output.
  - This was the primary issue: 3rd-party chat clients did not expect the EOS token (`<|eot_id|>` for LLaMA 3.2) to be included in the response, so they were not filtering it out.
  - It did not manifest in the built-in Tinychat UI because the EOS token was only emitted with `finish_reason="stop"`, in which case Tinychat ignored the content of the delta.
- When streaming, the API now emits a `data: [DONE]` event to indicate completion before terminating the stream.
- When streaming, only include the message under the `delta` key rather than under both `delta` and `message`, mirroring the OpenAI type.
- If the streamed content is empty (as will occur for the finishing chunk now that the EOS token is stripped), set `delta: {}`.
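The streaming behaviour described above can be sketched as follows. This is a minimal illustration of the chunk format, not the actual exo implementation; the `sse_chunks` helper and the token list are assumptions for the example, while the chunk fields follow the OpenAI chat-completions streaming shape.

```python
import json

def sse_chunks(tokens, eos_token="<|eot_id|>"):
    """Yield OpenAI-style SSE lines for a token stream:
    strip the EOS token, put content under `delta` only,
    send `delta: {}` with finish_reason="stop" as the finishing
    chunk, then terminate with `data: [DONE]`."""
    for tok in tokens:
        content = tok.replace(eos_token, "")  # never surface the EOS token
        if content:
            chunk = {"choices": [{"delta": {"content": content},
                                  "finish_reason": None}]}
            yield f"data: {json.dumps(chunk)}"
    # Finishing chunk carries an empty delta, since the EOS token
    # was stripped and there is no remaining content to send.
    final = {"choices": [{"delta": {}, "finish_reason": "stop"}]}
    yield f"data: {json.dumps(final)}"
    yield "data: [DONE]"

lines = list(sse_chunks(["Hello", "!", "<|eot_id|>"]))
```

With this shape, a client that concatenates `delta.content` and stops at `[DONE]` never sees the EOS token at all.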
## Testing
To test this, you can use the `llm` command-line tool configured to talk to exo by editing `~/Library/Application Support/io.datasette.llm/extra-openai-models.yaml` on macOS to contain the following:

```yaml
- model_id: llama-3.2-1b
  model_name: llama-3.2-1b
  api_base: "http://localhost:52415/v1"
```
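The same checks can be done without a chat client. The sketch below shows what a compliant 3rd-party consumer of the new stream does; it parses a canned response rather than reading from a live exo instance, and the `raw_stream` contents and `collect_text` helper are assumptions for illustration.

```python
import json

# A canned stream in the format exo now emits: content under `delta`
# only, an empty delta on the finishing chunk, then `data: [DONE]`.
raw_stream = [
    'data: {"choices": [{"delta": {"content": "Hello!"}, "finish_reason": null}]}',
    'data: {"choices": [{"delta": {}, "finish_reason": "stop"}]}',
    "data: [DONE]",
]

def collect_text(stream):
    """Accumulate delta content, stopping at the [DONE] sentinel."""
    text = []
    for line in stream:
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # stream is complete
        delta = json.loads(payload)["choices"][0].get("delta", {})
        text.append(delta.get("content", ""))
    return "".join(text)

print(collect_text(raw_stream))
```

Before this change, a client like this would have surfaced the EOS token to the user, since it has no reason to filter model-specific special tokens out of `delta.content`.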
### Old

```
❯ llm chat -m llama-3.2-1b
Chatting with llama-3.2-1b
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Hello
Hello! How can I assist you today?<|eot_id|>
```
### New

```
❯ llm chat -m llama-3.2-1b
Chatting with llama-3.2-1b
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Hello
Hello! How can I assist you today?
```
Note the lack of `<|eot_id|>` in the output.
Awesome work. Will merge if tests pass.
Please fix `bench.py` to support this. Suggested fix:
I will resolve this tomorrow when I am at my desk. Thanks!
I've made the changes you suggested to `bench.py` and removed some additional changes that slipped in, which I am still working on for another PR.
Closing in favour of the combined PR #734