exo Improvements to the ChatGPT API to more closely match the official OpenAI endpoint and improve compatibility

Hey, as I mentioned on Discord, we are experimenting with a Mac mini Exo deployment at work and noticed that the ChatGPT API offered by Exo behaves differently from the official OpenAI endpoint in ways that make it difficult to use with third-party chat clients.

To make the Exo API behave more similarly to OpenAI's, we have done the following:

- Remove the EOS token from the output.
  - This was the primary issue: third-party chat clients did not expect the EOS token (<|eot_id|> for LLAMA 3.2) to be included in the response, so they did not filter it out.
  - It did not manifest in the built-in Tinychat UI because the EOS token was only emitted with finish_reason="stop", in which case Tinychat ignored the content of the delta.
- When streaming, the API now emits a data: [DONE] event to indicate completion before terminating the stream.
- When streaming, include the message only under the delta key rather than under both delta and message, mirroring the OpenAI type.
- If the streamed content is empty (as will occur for the finishing chunk now that the EOS token is stripped), set delta: {}.
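For illustration, the chunk shape described above could be sketched roughly as follows. This is not the code from this PR: `make_stream_chunk` and `DONE_EVENT` are hypothetical names, and the chunk fields are a simplified guess at the OpenAI streaming shape rather than the exact payload exo emits.

```python
import json

EOS_TOKEN = "<|eot_id|>"  # EOS token for LLAMA 3.2, as named in the description


def make_stream_chunk(text, finish_reason=None, model="llama-3.2-1b"):
    """Build one SSE line for a streamed chunk (hypothetical helper, not exo's API)."""
    # Strip the EOS token so clients never see it in the content.
    content = text.replace(EOS_TOKEN, "")
    # With no content left (e.g. the finishing chunk), send an empty delta.
    delta = {"content": content} if content else {}
    chunk = {
        "object": "chat.completion.chunk",
        "model": model,
        # Only "delta" is included; there is no parallel "message" key.
        "choices": [{"index": 0, "delta": delta, "finish_reason": finish_reason}],
    }
    return f"data: {json.dumps(chunk)}\n\n"


# Sentinel event sent once, just before the stream is closed.
DONE_EVENT = "data: [DONE]\n\n"

events = [
    make_stream_chunk("Hello! How can I assist you today?"),
    make_stream_chunk(EOS_TOKEN, finish_reason="stop"),
    DONE_EVENT,
]
print("".join(events), end="")
```

The second event here carries delta: {} with finish_reason="stop", which is the case Tinychat already tolerated and third-party clients did not.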

Testing

To test this, you can use the llm command line tool configured to talk to exo. On macOS, edit the file "~/Library/Application Support/io.datasette.llm/extra-openai-models.yaml" so that it contains the following:

- model_id: llama-3.2-1b
  model_name: llama-3.2-1b
  api_base: "http://localhost:52415/v1"

Old

❯ llm chat -m llama-3.2-1b
Chatting with llama-3.2-1b
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Hello
Hello! How can I assist you today?<|eot_id|>

New

❯ llm chat -m llama-3.2-1b
Chatting with llama-3.2-1b
Type 'exit' or 'quit' to exit
Type '!multi' to enter multiple lines, then '!end' to finish
> Hello
Hello! How can I assist you today?

Note the lack of "<|eot_id|>" in the output.
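The same check can be approximated in code. The sketch below parses a list of SSE lines, joins the delta content, and reports whether the [DONE] event was seen. `collect_stream` is a hypothetical helper, and the sample lines are hand-written to match the behavior described above, not captured from a running exo instance.

```python
import json


def collect_stream(lines):
    """Join streamed delta content; return (text, saw_done). Hypothetical helper."""
    parts = []
    saw_done = False
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            saw_done = True
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # The finishing chunk may carry an empty delta ({}).
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts), saw_done


# Hand-written sample stream (an assumption, not real exo output).
sample = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello! "}, "finish_reason": null}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "How can I assist you today?"}, "finish_reason": null}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}',
    "",
    "data: [DONE]",
]

text, saw_done = collect_stream(sample)
print(text)
print("saw [DONE]:", saw_done)
```

Against a live server, the lines would come from the streamed HTTP response body instead of the `sample` list.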

Feb 19 '25 17:02 joshuacoles

Awesome work. Will merge if tests pass.

Feb 19 '25 21:02 AlexCheema

please fix bench.py to support this. suggested fix:

Feb 19 '25 21:02 AlexCheema

I will resolve this tomorrow when I am at my desk. Thanks!

Feb 19 '25 21:02 joshuacoles

I've made the changes you suggested to bench.py and removed some additional changes that slipped in that I am still working on for another PR.

Feb 20 '25 09:02 joshuacoles

Closing in favour of the combined PR #734

Mar 07 '25 15:03 joshuacoles