jan bug: responses from /chat/completions endpoint contain a leading space in the content

Jan's API server returns response content that starts with a leading space. This breaks downstream output (Markdown tables don't render correctly) and produces illegal file names when the output is used to generate note titles, which are in turn used as the .md filename.

Call:

POST http://127.0.0.1:1337/v1/chat/completions
**headers**
content-type: application/json
**body**
{
  "messages": [
    {
      "content": "You are a helpful assistant.",
      "role": "system"
    },
    {
      "content": "Hello!",
      "role": "user"
    }
  ],
  "model": "mistral-ins-7b-q5",
  "stream": true,
  "max_tokens": 4096,
  "stop": [
    "</s>"
  ],
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "temperature": 0.7,
  "top_p": 0.95
}

Response:

data: {"choices":[{"delta":{"content":" Hello"},"finish_reason":null,"index":0}],"created":1711886550,"id":"K75WwlMq7nBjqPW4FGlR","model":"_","object":"chat.completion.chunk"}
data: {"choices":[{"delta":{"content":" there"},"finish_reason":null,"index":0}],"created":1711886550,"id":"ADqnjtzqwx1zbUiWVXVm","model":"_","object":"chat.completion.chunk"}
...etc

Expected output:

data: {"choices":[{"delta":{"content":"Hello"},"finish_reason":null,"index":0}],"created":1711886550,"id":"K75WwlMq7nBjqPW4FGlR","model":"_","object":"chat.completion.chunk"}
data: {"choices":[{"delta":{"content":" there"},"finish_reason":null,"index":0}],"created":1711886550,"id":"ADqnjtzqwx1zbUiWVXVm","model":"_","object":"chat.completion.chunk"}
...etc

I also tested with "stream": false, and the un-chunked chat.completion object has the same leading space.
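For anyone hitting this on the client side, here is a minimal sketch of a workaround in Python. It reads the `data: {...}` lines shown above and trims leading whitespace from the first non-empty content chunk only, so the spaces between later words survive. The function name `iter_content` and the `SAMPLE` list are hypothetical, and handling of a `[DONE]` terminator follows the OpenAI streaming convention; I have not confirmed that Jan sends it.

```python
import json


def iter_content(sse_lines):
    """Yield delta content from 'data: {...}' lines.

    Leading whitespace is trimmed from the first non-empty chunk only.
    """
    first = True
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumption: OpenAI-style end-of-stream marker
            break
        chunk = json.loads(payload)
        content = chunk["choices"][0].get("delta", {}).get("content")
        if not content:
            continue  # role-only or empty deltas carry no text
        if first:
            content = content.lstrip()
            if not content:
                continue  # chunk was whitespace only; wait for real text
            first = False
        yield content


# The two chunks from the response in this report.
SAMPLE = [
    'data: {"choices":[{"delta":{"content":" Hello"},"finish_reason":null,"index":0}],'
    '"created":1711886550,"id":"K75WwlMq7nBjqPW4FGlR","model":"_","object":"chat.completion.chunk"}',
    'data: {"choices":[{"delta":{"content":" there"},"finish_reason":null,"index":0}],'
    '"created":1711886550,"id":"ADqnjtzqwx1zbUiWVXVm","model":"_","object":"chat.completion.chunk"}',
]

print("".join(iter_content(SAMPLE)))
```

This only hides the symptom in the consumer. The server response itself would still differ from the expected output above.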

Mar 31 '24 12:03 Propheticus

causes https://github.com/longy2k/obsidian-bmo-chatbot/issues/66 and https://github.com/longy2k/obsidian-bmo-chatbot/issues/67

Mar 31 '24 12:03 Propheticus

https://github.com/ggerganov/llama.cpp/issues/3664 might be related? (That would mean the cause is in nitro.exe, which uses llama.cpp.) Also: https://github.com/ggerganov/llama.cpp/issues/367#issuecomment-1479348872

Mar 31 '24 15:03 Propheticus

Reading the two issues above plus https://github.com/ggerganov/llama.cpp/pull/4081, the leading space appears to be added during tokenization on purpose, and some models even need it to work correctly. I'm still unsure how (or whether) tokenization, which I thought was only applied to the input a model processes, relates to generating a response. Is the same done or needed for the output? Perhaps because, in a thread of messages, previous replies become context/input for the next prompt?
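To make the question concrete, here is a toy sketch of how I understand SentencePiece-style tokenizers to treat spaces: a dummy space is prepended to the input, and every space is stored as the marker "▁" attached to the following word. This is not Jan's, nitro's, or llama.cpp's actual code. `toy_encode` and `toy_decode` are hypothetical names, and the word-level splitting is a simplification of real subword tokenization.

```python
SPACE_MARKER = "\u2581"  # "▁", the whitespace marker used by SentencePiece


def toy_encode(text, add_dummy_prefix=True):
    """Toy word-level tokenizer: prepend a space, then mark spaces with ▁."""
    if add_dummy_prefix:
        text = " " + text
    marked = text.replace(" ", SPACE_MARKER)
    pieces = []
    for ch in marked:
        # Start a new piece at every marker so each piece carries its own ▁.
        if ch == SPACE_MARKER or not pieces:
            pieces.append(ch)
        else:
            pieces[-1] += ch
    return pieces


def toy_decode(pieces, strip_dummy_prefix=True):
    """Join pieces, turn ▁ back into spaces, optionally drop the first space."""
    text = "".join(pieces).replace(SPACE_MARKER, " ")
    if strip_dummy_prefix and text.startswith(" "):
        text = text[1:]
    return text


pieces = toy_encode("Hello there")
print(pieces)
print(repr(toy_decode(pieces)))
# Decoding piece by piece without stripping, as a streaming server might do:
print([toy_decode([p], strip_dummy_prefix=False) for p in pieces])
```

If generated tokens carry the same marker and are converted to text one at a time without removing the dummy prefix, the first chunk would come out as " Hello", which matches the response in this report. That is only a guess at the mechanism.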

Apr 01 '24 17:04 Propheticus

Without going further down the rabbit hole of how tokenization works internally and whether it applies to completions: the OpenAI API spec (and the Mistral AI API as well) gives examples of the expected response without a leading space.

Apr 02 '24 09:04 Propheticus

hi @Propheticus, dev team resolved the issue, would you mind retrying it? many thanks 🙏

Apr 17 '24 09:04 Van-QA

Looks good to me @Van-QA 👍 tested on Jan v0.4.11-386 nightly.

Apr 17 '24 11:04 Propheticus
