
OpenAI API `/v1/completions` request fails for code autocompletion

Open · clement-igonet opened this issue on Mar 3, 2024 · 4 comments

Relevant environment info

- OS: macOS (Apple M3)
- Continue: 0.9.76
- IDE: VSCodium / January 2024 (version 1.86)

Description

The autocompletion feature fails against OpenAI-compatible API servers:

It should request /v1/chat/completions rather than the deprecated /v1/completions.

Extract from my config.json:

  "tabAutocompleteModel": {
    "title": "LMStudio",
    "provider": "openai",
    "model": "deepseek-ai_deepseek-coder-6.7b-base",
    "apiBase": "http://localhost:1234/v1/"
  },
  "tabAutocompleteOptions": {
    "useLegacyCompletionsEndpoint": false
  },

To reproduce

The failing request can be reproduced like this:

curl http://localhost:1234/v1/completions -H "Content-Type: application/json" -d '{
  "model": "deepseek-ai_deepseek-coder-6.7b-instruct",
  "max_tokens": 1024,
  "temperature": 0,
  "stop": [
    "\n",
    "<|fim▁begin|>",
    "<|fim▁hole|>",
    "<|fim▁end|>",
    "//",
    "\n\n",
    "```",
    "function",
    "class",
    "module",
    "export"
  ],
  "prompt": "<|fim▁begin|>hello!<|fim▁hole|>\n<|fim▁end|>",
  "stream": true
  }'
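
For comparison, an equivalent request against /v1/chat/completions (the endpoint OpenAI recommends in place of the deprecated one, and the only one Jan still exposes) might look like the following sketch; the message content is only illustrative, since the FIM prompt above does not map one-to-one onto chat messages:

curl http://localhost:1234/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "deepseek-ai_deepseek-coder-6.7b-instruct",
  "max_tokens": 1024,
  "temperature": 0,
  "stream": true,
  "messages": [
    { "role": "user", "content": "Complete the missing code: hello!" }
  ]
}'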

Log output

Logs from LM Studio (OpenAI API compatible):

[2024-03-03 11:55:13.217] [INFO] [LM STUDIO SERVER] Processing queued request...
[2024-03-03 11:55:13.218] [INFO] Received POST request to /v1/completions with body: {
  "model": "deepseek-ai_deepseek-coder-6.7b-instruct",
  "max_tokens": 1024,
  "temperature": 0,
  "stop": [
    "\n",
    "<|fim▁begin|>",
    "<|fim▁hole|>",
    "<|fim▁end|>",
    "//",
    "\n\n",
    "```",
    "function",
    "class",
    "module",
    "export"
  ],
  "prompt": "<|fim▁begin|>hello!<|fim▁hole|>\n<|fim▁end|>",
  "stream": true
}
[2024-03-03 11:55:13.219] [INFO] Provided inference configuration: {
  "n_threads": 4,
  "n_predict": 1024,
  "top_k": 40,
  "min_p": 0.05,
  "top_p": 0.95,
  "temp": 0,
  "repeat_penalty": 1.1,
  "input_prefix": "### Instruction:\\n",
  "input_suffix": "\\n### Response:\\n",
  "antiprompt": [
    "### Instruction:",
    "\n",
    "<|fim▁begin|>",
    "<|fim▁hole|>",
    "<|fim▁end|>",
    "//",
    "\n\n",
    "```",
    "function",
    "class",
    "module",
    "export"
  ],
  "pre_prompt": "",
  "pre_prompt_suffix": "\\n",
  "pre_prompt_prefix": "",
  "seed": -1,
  "tfs_z": 1,
  "typical_p": 1,
  "repeat_last_n": 64,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "n_keep": 0,
  "logit_bias": {},
  "mirostat": 0,
  "mirostat_tau": 5,
  "mirostat_eta": 0.1,
  "memory_f16": true,
  "multiline_input": false,
  "penalize_nl": true
}
[2024-03-03 11:55:13.219] [INFO] Streaming response..
[2024-03-03 11:55:13.220] [INFO] [LM STUDIO SERVER] Processing...
[2024-03-03 11:55:13.554] [ERROR] [Server Error] {"title":"unordered_map::at: key not found"}
[2024-03-03 11:55:13.554] [INFO] [LM STUDIO SERVER] Finished streaming response
[2024-03-03 11:55:13.555] [ERROR] [Server Error] {"code":"ERR_HTTP_HEADERS_SENT"}
[2024-03-03 11:55:13.555] [ERROR] [Server Error] {"code":"ERR_HTTP_HEADERS_SENT"}

Logs from janhq/jan:

{"level":30,"time":1709469116179,"pid":46244,"hostname":"C4LM14-XRY9T.local","reqId":"req-j","req":{"method":"POST","url":"/v1/completions","hostname":"127.0.0.1:1337","remoteAddress":"127.0.0.1","remotePort":51269},"msg":"incoming request"}

{"level":30,"time":1709469116179,"pid":46244,"hostname":"C4LM14-XRY9T.local","reqId":"req-j","msg":"Route POST:/v1/completions not found"}

{"level":30,"time":1709469116179,"pid":46244,"hostname":"C4LM14-XRY9T.local","reqId":"req-j","res":{"statusCode":404},"responseTime":0.4752500057220459,"msg":"request completed"}

clement-igonet · Mar 03 '24 11:03

@clement-igonet thanks for reporting this. It turns out to be a bug on LM Studio's side that prevents us from correctly sending the special tokens required for the autocomplete prompt.

I'm in touch with them and will update you when there's a fix.

sestinj · Mar 03 '24 16:03

Given that janhq/jan no longer supports /v1/completions either, making it possible to call /v1/chat/completions instead of /v1/completions for code autocompletion would bring Continue in line with the current OpenAI API specs.

clement-igonet · Mar 03 '24 18:03

Any update on this?

controldev · Apr 07 '24 17:04

@controldev @clement-igonet Yes, and thanks for the bump! It's now possible to set "useLegacyCompletionsEndpoint": false in your model config in config.json.
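
For example, adapting the config from the original report (a sketch; the model and apiBase values depend on your local setup):

  "tabAutocompleteModel": {
    "title": "LMStudio",
    "provider": "openai",
    "model": "deepseek-ai_deepseek-coder-6.7b-base",
    "apiBase": "http://localhost:1234/v1/",
    "useLegacyCompletionsEndpoint": false
  },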

sestinj · Apr 30 '24 16:04

I tried using second-state/StarCoder2-15B-GGUF/starcoder2-15b-Q4_K_S.gguf with LM Studio, but it didn't generate any usable output, whereas it worked just fine with Ollama. Is this related to the API-support issue with LM Studio mentioned above?

olee · May 21 '24 10:05