OpenAI API `/v1/completions` request fails for code autocompletion
Before submitting your bug report
- [X] I believe this is a bug. I'll try to join the Continue Discord for questions
- [X] I'm not able to find an open issue that reports the same bug
- [X] I've seen the troubleshooting guide on the Continue Docs
Relevant environment info
- OS: macOS / M3
- Continue: 0.9.76
- IDE: VSCodium / January 2024 (version 1.86)
Description
The autocompletion feature fails against OpenAI-compatible API servers: it should request /v1/chat/completions rather than the deprecated /v1/completions endpoint.
Extract from my config.json:
"tabAutocompleteModel": {
"title": "LMStudio",
"provider": "openai",
"model": "deepseek-ai_deepseek-coder-6.7b-base",
"apiBase": "http://localhost:1234/v1/"
},
"tabAutocompleteOptions": {
"useLegacyCompletionsEndpoint": false
},
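As a sanity check that the apiBase above is reachable and the model name matches, the server can be queried for its model list (assuming it exposes the standard OpenAI-style /v1/models endpoint):

# Sanity check: list the models served at the configured apiBase;
# the returned id should match the "model" value in the config above.
curl http://localhost:1234/v1/models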
To reproduce
The failing request can be reproduced like this:
curl http://localhost:1234/v1/completions -H "Content-Type: application/json" -d '{
  "model": "deepseek-ai_deepseek-coder-6.7b-instruct",
  "max_tokens": 1024,
  "temperature": 0,
  "stop": [
    "\n",
    "<|fim▁begin|>",
    "<|fim▁hole|>",
    "<|fim▁end|>",
    "//",
    "\n\n",
    "```",
    "function",
    "class",
    "module",
    "export"
  ],
  "prompt": "<|fim▁begin|>hello!<|fim▁hole|>\n<|fim▁end|>",
  "stream": true
}'
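For comparison, here is a rough sketch of the equivalent request against /v1/chat/completions that I would expect instead; the stop list is omitted for brevity, and the chat message framing is only illustrative, not necessarily how Continue would wrap the FIM prompt:

# Illustrative only: the same prompt sent to the chat endpoint instead of the legacy one.
curl http://localhost:1234/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "deepseek-ai_deepseek-coder-6.7b-instruct",
  "max_tokens": 1024,
  "temperature": 0,
  "stream": true,
  "messages": [
    {"role": "user", "content": "<|fim▁begin|>hello!<|fim▁hole|>\n<|fim▁end|>"}
  ]
}'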
Log output
Logs from LM Studio (OpenAI API compatible):
[2024-03-03 11:55:13.217] [INFO] [LM STUDIO SERVER] Processing queued request...
[2024-03-03 11:55:13.218] [INFO] Received POST request to /v1/completions with body: {
  "model": "deepseek-ai_deepseek-coder-6.7b-instruct",
  "max_tokens": 1024,
  "temperature": 0,
  "stop": [
    "\n",
    "<|fim▁begin|>",
    "<|fim▁hole|>",
    "<|fim▁end|>",
    "//",
    "\n\n",
    "```",
    "function",
    "class",
    "module",
    "export"
  ],
  "prompt": "<|fim▁begin|>hello!<|fim▁hole|>\n<|fim▁end|>",
  "stream": true
}
[2024-03-03 11:55:13.219] [INFO] Provided inference configuration: {
  "n_threads": 4,
  "n_predict": 1024,
  "top_k": 40,
  "min_p": 0.05,
  "top_p": 0.95,
  "temp": 0,
  "repeat_penalty": 1.1,
  "input_prefix": "### Instruction:\\n",
  "input_suffix": "\\n### Response:\\n",
  "antiprompt": [
    "### Instruction:",
    "\n",
    "<|fim▁begin|>",
    "<|fim▁hole|>",
    "<|fim▁end|>",
    "//",
    "\n\n",
    "```",
    "function",
    "class",
    "module",
    "export"
  ],
  "pre_prompt": "",
  "pre_prompt_suffix": "\\n",
  "pre_prompt_prefix": "",
  "seed": -1,
  "tfs_z": 1,
  "typical_p": 1,
  "repeat_last_n": 64,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "n_keep": 0,
  "logit_bias": {},
  "mirostat": 0,
  "mirostat_tau": 5,
  "mirostat_eta": 0.1,
  "memory_f16": true,
  "multiline_input": false,
  "penalize_nl": true
}
[2024-03-03 11:55:13.219] [INFO] Streaming response..
[2024-03-03 11:55:13.220] [INFO] [LM STUDIO SERVER] Processing...
[2024-03-03 11:55:13.554] [ERROR] [Server Error] {"title":"unordered_map::at: key not found"}
[2024-03-03 11:55:13.554] [INFO] [LM STUDIO SERVER] Finished streaming response
[2024-03-03 11:55:13.555] [ERROR] [Server Error] {"code":"ERR_HTTP_HEADERS_SENT"}
[2024-03-03 11:55:13.555] [ERROR] [Server Error] {"code":"ERR_HTTP_HEADERS_SENT"}
Logs from janhq/jan:
{"level":30,"time":1709469116179,"pid":46244,"hostname":"C4LM14-XRY9T.local","reqId":"req-j","req":{"method":"POST","url":"/v1/completions","hostname":"127.0.0.1:1337","remoteAddress":"127.0.0.1","remotePort":51269},"msg":"incoming request"}
{"level":30,"time":1709469116179,"pid":46244,"hostname":"C4LM14-XRY9T.local","reqId":"req-j","msg":"Route POST:/v1/completions not found"}
{"level":30,"time":1709469116179,"pid":46244,"hostname":"C4LM14-XRY9T.local","reqId":"req-j","res":{"statusCode":404},"responseTime":0.4752500057220459,"msg":"request completed"}
@clement-igonet thanks for reporting this. It turns out to be a bug on LM Studio's side, which prevents us from correctly sending the specific tokens required for the autocomplete prompt.
I'm in touch with them and will update you when there's a fix.
Given that janhq/jan also no longer supports /v1/completions, making it possible to call /v1/chat/completions rather than /v1/completions for code autocompletion would bring Continue in line with the current OpenAI API specs.
Any update on this?
@controldev @clement-igonet Yes, and thanks for the bump! It's now possible to set `"useLegacyCompletionsEndpoint": false` in your model config in config.json.
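For example, reusing the model entry from the config shared above (a sketch only; adjust to your own setup):

"tabAutocompleteModel": {
  "title": "LMStudio",
  "provider": "openai",
  "model": "deepseek-ai_deepseek-coder-6.7b-base",
  "apiBase": "http://localhost:1234/v1/",
  "useLegacyCompletionsEndpoint": false
},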
I tried using second-state/StarCoder2-15B-GGUF/starcoder2-15b-Q4_K_S.gguf with LM Studio, but it didn't generate any usable output, whereas it worked just fine with Ollama.
Is this related to the API support issue in LM Studio mentioned above?