jan
bug: responses from /chat/completions endpoint contain a leading space in the content
Jan's API server returns response content with a leading space. This breaks downstream output (markdown tables don't render correctly) and produces illegal file names when the output is used to generate note titles, which in turn become the .md filenames.
Call:
POST http://127.0.0.1:1337/v1/chat/completions
**headers**
content-type: application/json
**body**
{
  "messages": [
    {
      "content": "You are a helpful assistant.",
      "role": "system"
    },
    {
      "content": "Hello!",
      "role": "user"
    }
  ],
  "model": "mistral-ins-7b-q5",
  "stream": true,
  "max_tokens": 4096,
  "stop": [
    "</s>"
  ],
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "temperature": 0.7,
  "top_p": 0.95
}
Response:
data: {"choices":[{"delta":{"content":" Hello"},"finish_reason":null,"index":0}],"created":1711886550,"id":"K75WwlMq7nBjqPW4FGlR","model":"_","object":"chat.completion.chunk"}
data: {"choices":[{"delta":{"content":" there"},"finish_reason":null,"index":0}],"created":1711886550,"id":"ADqnjtzqwx1zbUiWVXVm","model":"_","object":"chat.completion.chunk"}
...etc
Expected output:
data: {"choices":[{"delta":{"content":"Hello"},"finish_reason":null,"index":0}],"created":1711886550,"id":"K75WwlMq7nBjqPW4FGlR","model":"_","object":"chat.completion.chunk"}
data: {"choices":[{"delta":{"content":" there"},"finish_reason":null,"index":0}],"created":1711886550,"id":"ADqnjtzqwx1zbUiWVXVm","model":"_","object":"chat.completion.chunk"}
...etc
I also tested with "stream": false, and the same is true for un-chunked chat.completion objects.
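Until this is fixed server-side, a client could work around it by trimming leading whitespace from the first content chunk only. The following is a minimal sketch, not code from Jan or the BMO plugin: the helper name `iter_content` is hypothetical, and the `data: [DONE]` terminator is an assumption based on the OpenAI streaming convention (it is not shown in the response above).

```python
import json
from typing import Iterable, Iterator


def iter_content(sse_lines: Iterable[str]) -> Iterator[str]:
    """Yield delta content from 'data: {...}' lines.

    Leading whitespace is stripped from the start of the reply only;
    spaces inside later chunks (e.g. " there") are kept.
    """
    at_start = True
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and anything that is not a data event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed OpenAI-style end marker
            break
        choices = json.loads(payload).get("choices") or []
        if not choices:
            continue
        text = choices[0].get("delta", {}).get("content") or ""
        if at_start:
            text = text.lstrip()
            if not text:
                continue  # chunk was whitespace only, still at the start
            at_start = False
        if text:
            yield text


# Sample chunks shaped like the response in this report (extra fields trimmed).
sample = [
    'data: {"choices":[{"delta":{"content":" Hello"},"finish_reason":null,"index":0}]}',
    'data: {"choices":[{"delta":{"content":" there"},"finish_reason":null,"index":0}]}',
    "data: [DONE]",
]
reply = "".join(iter_content(sample))
```

For the non-streaming case, calling `.lstrip()` on `choices[0].message.content` would have the same effect.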
This causes https://github.com/longy2k/obsidian-bmo-chatbot/issues/66 and https://github.com/longy2k/obsidian-bmo-chatbot/issues/67
https://github.com/ggerganov/llama.cpp/issues/3664 might be related, which would mean the problem is in nitro.exe (it uses llama.cpp). See also: https://github.com/ggerganov/llama.cpp/issues/367#issuecomment-1479348872
Reading the two issues above plus https://github.com/ggerganov/llama.cpp/pull/4081, the leading space appears to be added during tokenization on purpose and is even needed for some models to work correctly. I'm still unsure how, or whether, tokenization (which I thought was applied only to the input a model processes) relates to the generation of a response. Is the same done or needed for the output? Perhaps because in a thread of messages, previous replies become context/input for the next prompt?
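One way to picture how a leading space could end up in the output: in SentencePiece-style vocabularies, a word-initial space is part of the token itself (written as "▁", U+2581), so the first generated token often already carries one. The toy sketch below only illustrates that idea. It is not llama.cpp or nitro code, the token pieces are made up for the example, and it assumes the model in question uses such a vocabulary.

```python
SPACE_MARKER = "\u2581"  # "▁", the SentencePiece stand-in for a space


def detokenize(pieces):
    """Join token pieces and turn the space marker back into real spaces."""
    return "".join(pieces).replace(SPACE_MARKER, " ")


# Hypothetical pieces a model might generate for the reply "Hello there!".
generated = ["\u2581Hello", "\u2581there", "!"]

raw = detokenize(generated)                        # first piece brings its space along
cleaned = raw[1:] if raw.startswith(" ") else raw  # drop that single leading space
```

Under this assumption, dropping one leading space after detokenizing the generated pieces is enough to match the expected output in this report.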
Without going further down the rabbit hole of how tokenization works internally and whether it applies to completions: the OpenAI API spec (and the Mistral AI API as well) gives examples of expected responses without a leading space.
Hi @Propheticus, the dev team has resolved the issue. Would you mind retrying it? Many thanks 🙏
Looks good to me @Van-QA 👍 tested on Jan v0.4.11-386 nightly.