
Improvements to the ChatGPT API to more closely match the official OpenAI endpoint and improve compatibility

Open joshuacoles opened this issue 1 year ago • 4 comments

Hey, as I mentioned on Discord, we are experimenting with a Mac mini Exo deployment at work and noticed that the ChatGPT API offered by Exo behaves differently from the official OpenAI endpoint, which makes it difficult to use with 3rd party chat clients.

To make the Exo API behave more similarly to OpenAI's, we have done the following:

  1. Remove the EOS token from the output.
    • This was the primary issue, as 3rd party chat clients did not expect the EOS token (<|eot_id|> for Llama 3.2) to be included in the response, so they were not filtering it out.
    • It did not manifest in the built-in Tinychat UI as the EOS token was only emitted with finish_reason="stop", in which case Tinychat ignored the content of the delta.
  2. When streaming, the API now emits a data: [DONE] event to indicate completion before terminating the stream.
  3. When streaming, only include the message under the delta key, rather than under both the delta and message keys, mirroring the OpenAI type.
  4. If the streamed content is empty (as will occur for the finishing chunk now that the EOS token is stripped), set delta: {}.
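As a rough illustration of the four changes above, here is a minimal sketch of how the streamed events could be assembled. This is not exo's actual code: `build_chunk` and `sse_events` are hypothetical helpers, and the chunk fields shown are a reduced subset of the OpenAI streaming chunk shape.

```python
import json

EOS_TOKEN = "<|eot_id|>"  # EOS token for Llama 3.2, as mentioned above


def build_chunk(text, finished, model="llama-3.2-1b"):
    """Build one OpenAI-style streaming chunk (hypothetical helper)."""
    # Change 1: strip the EOS token from the output.
    content = text.replace(EOS_TOKEN, "")
    # Change 4: an empty piece of content becomes an empty delta.
    delta = {"content": content} if content else {}
    return {
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [
            {
                "index": 0,
                # Change 3: content lives under "delta" only, with no "message" key.
                "delta": delta,
                "finish_reason": "stop" if finished else None,
            }
        ],
    }


def sse_events(pieces):
    """Yield server-sent event strings for the generated pieces."""
    for i, piece in enumerate(pieces):
        finished = i == len(pieces) - 1
        yield f"data: {json.dumps(build_chunk(piece, finished))}\n\n"
    # Change 2: signal completion before the stream is closed.
    yield "data: [DONE]\n\n"


events = list(sse_events(["Hello", "!", "<|eot_id|>"]))
```

With this input, the final chunk carries an empty delta and finish_reason "stop", and the stream ends with the data: [DONE] event.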

Testing

To test this, you can use the llm command-line tool configured to talk to exo. On macOS, edit the file "~/Library/Application Support/io.datasette.llm/extra-openai-models.yaml" so that it contains the following:

- model_id: llama-3.2-1b
  model_name: llama-3.2-1b
  api_base: "http://localhost:52415/v1"

Old

❯ llm chat -m llama-3.2-1b
Chatting with llama-3.2-1b
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Hello
Hello! How can I assist you today?<|eot_id|>

New

❯ llm chat -m llama-3.2-1b
Chatting with llama-3.2-1b
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Hello
Hello! How can I assist you today?

Note the lack of "<|eot_id|>" in the output.
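The same check can be made without a chat client. The sketch below shows roughly how an OpenAI-compatible client might consume the stream; `collect_stream` is a hypothetical helper, and the sample lines are hand-written to match the format described above rather than captured from a real exo response.

```python
import json


def collect_stream(lines):
    """Join the delta contents of an SSE stream, stopping at [DONE]."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separator lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # completion marker, nothing further to read
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        # The finishing chunk has an empty delta, so default to "".
        parts.append(delta.get("content", ""))
    return "".join(parts)


# Hand-written sample in the new format (an assumption, not real output).
sample = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello! "}, "finish_reason": null}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "How can I assist you today?"}, "finish_reason": null}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}',
    "",
    "data: [DONE]",
]

reply = collect_stream(sample)
```

A client written this way never sees the EOS token, because the finishing chunk contributes no content.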

joshuacoles avatar Feb 19 '25 17:02 joshuacoles

Awesome work. Will merge if tests pass.

AlexCheema avatar Feb 19 '25 21:02 AlexCheema

Please fix bench.py to support this. Suggested fix:

Screenshot 2025-02-19 at 9 22 35 PM

AlexCheema avatar Feb 19 '25 21:02 AlexCheema

I will resolve this tomorrow when I am at my desk. Thanks!

joshuacoles avatar Feb 19 '25 21:02 joshuacoles

I've made the changes you suggested to bench.py and removed some additional changes that slipped in that I am still working on for another PR.

joshuacoles avatar Feb 20 '25 09:02 joshuacoles

Closing in favour of the combined PR #734

joshuacoles avatar Mar 07 '25 15:03 joshuacoles