OpenAI API `/v1/completions` request fails for code autocompletion
Before submitting your bug report
- [X] I believe this is a bug. I'll try to join the Continue Discord for questions
- [X] I'm not able to find an open issue that reports the same bug
- [X] I've seen the troubleshooting guide on the Continue Docs
Relevant environment info
- OS: macOS / M3
- Continue: 0.9.76
- IDE: VSCodium / January 2024 (version 1.86)
Description
The autocompletion feature fails against OpenAI-compatible API servers: it should request /v1/chat/completions rather than the deprecated /v1/completions endpoint.
Extract from my config.json:
"tabAutocompleteModel": {
"title": "LMStudio",
"provider": "openai",
"model": "deepseek-ai_deepseek-coder-6.7b-base",
"apiBase": "http://localhost:1234/v1/"
},
"tabAutocompleteOptions": {
"useLegacyCompletionsEndpoint": false
},
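As a sanity check that the apiBase above is reachable and the model name matches, the server can be queried for its model list (assuming it exposes the standard OpenAI-style /v1/models endpoint):

# Sanity check: list the models served at the configured apiBase;
# the returned id should match the "model" value in the config above.
curl http://localhost:1234/v1/models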
To reproduce
The failing request can be reproduced like this:
curl http://localhost:1234/v1/completions -H "Content-Type: application/json" -d '{
  "model": "deepseek-ai_deepseek-coder-6.7b-instruct",
  "max_tokens": 1024,
  "temperature": 0,
  "stop": [
    "\n",
    "<|fim▁begin|>",
    "<|fim▁hole|>",
    "<|fim▁end|>",
    "//",
    "\n\n",
    "```",
    "function",
    "class",
    "module",
    "export"
  ],
  "prompt": "<|fim▁begin|>hello!<|fim▁hole|>\n<|fim▁end|>",
  "stream": true
}'
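For comparison, here is a rough sketch of the equivalent request against /v1/chat/completions that I would expect instead; the stop list is omitted for brevity, and the chat message framing is only illustrative, not necessarily how Continue would wrap the FIM prompt:

# Illustrative only: the same prompt sent to the chat endpoint instead of the legacy one.
curl http://localhost:1234/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "deepseek-ai_deepseek-coder-6.7b-instruct",
  "max_tokens": 1024,
  "temperature": 0,
  "stream": true,
  "messages": [
    {"role": "user", "content": "<|fim▁begin|>hello!<|fim▁hole|>\n<|fim▁end|>"}
  ]
}'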
Log output
Logs from LM Studio (OpenAI API compatible):
[2024-03-03 11:55:13.217] [INFO] [LM STUDIO SERVER] Processing queued request...
[2024-03-03 11:55:13.218] [INFO] Received POST request to /v1/completions with body: {
  "model": "deepseek-ai_deepseek-coder-6.7b-instruct",
  "max_tokens": 1024,
  "temperature": 0,
  "stop": [
    "\n",
    "<|fim▁begin|>",
    "<|fim▁hole|>",
    "<|fim▁end|>",
    "//",
    "\n\n",
    "```",
    "function",
    "class",
    "module",
    "export"
  ],
  "prompt": "<|fim▁begin|>hello!<|fim▁hole|>\n<|fim▁end|>",
  "stream": true
}
[2024-03-03 11:55:13.219] [INFO] Provided inference configuration: {
  "n_threads": 4,
  "n_predict": 1024,
  "top_k": 40,
  "min_p": 0.05,
  "top_p": 0.95,
  "temp": 0,
  "repeat_penalty": 1.1,
  "input_prefix": "### Instruction:\\n",
  "input_suffix": "\\n### Response:\\n",
  "antiprompt": [
    "### Instruction:",
    "\n",
    "<|fim▁begin|>",
    "<|fim▁hole|>",
    "<|fim▁end|>",
    "//",
    "\n\n",
    "```",
    "function",
    "class",
    "module",
    "export"
  ],
  "pre_prompt": "",
  "pre_prompt_suffix": "\\n",
  "pre_prompt_prefix": "",
  "seed": -1,
  "tfs_z": 1,
  "typical_p": 1,
  "repeat_last_n": 64,
  "frequency_penalty": 0,
  "presence_penalty": 0,
  "n_keep": 0,
  "logit_bias": {},
  "mirostat": 0,
  "mirostat_tau": 5,
  "mirostat_eta": 0.1,
  "memory_f16": true,
  "multiline_input": false,
  "penalize_nl": true
}
[2024-03-03 11:55:13.219] [INFO] Streaming response..
[2024-03-03 11:55:13.220] [INFO] [LM STUDIO SERVER] Processing...
[2024-03-03 11:55:13.554] [ERROR] [Server Error] {"title":"unordered_map::at: key not found"}
[2024-03-03 11:55:13.554] [INFO] [LM STUDIO SERVER] Finished streaming response
[2024-03-03 11:55:13.555] [ERROR] [Server Error] {"code":"ERR_HTTP_HEADERS_SENT"}
[2024-03-03 11:55:13.555] [ERROR] [Server Error] {"code":"ERR_HTTP_HEADERS_SENT"}
Logs from janhq/jan:
{"level":30,"time":1709469116179,"pid":46244,"hostname":"C4LM14-XRY9T.local","reqId":"req-j","req":{"method":"POST","url":"/v1/completions","hostname":"127.0.0.1:1337","remoteAddress":"127.0.0.1","remotePort":51269},"msg":"incoming request"}
{"level":30,"time":1709469116179,"pid":46244,"hostname":"C4LM14-XRY9T.local","reqId":"req-j","msg":"Route POST:/v1/completions not found"}
{"level":30,"time":1709469116179,"pid":46244,"hostname":"C4LM14-XRY9T.local","reqId":"req-j","res":{"statusCode":404},"responseTime":0.4752500057220459,"msg":"request completed"}
@clement-igonet thanks for reporting this. It turns out to be a bug on LM Studio's side, which prevents us from correctly sending the specific tokens required for the autocomplete prompt.
I'm in touch with them and will update you when there's a fix.
Given that janhq/jan also no longer supports /v1/completions, making it possible to call /v1/chat/completions rather than /v1/completions for code autocompletion would bring Continue in line with the current OpenAI API specs.
Any update on this?
@controldev @clement-igonet Yes, and thanks for the bump! It's now possible to set `"useLegacyCompletionsEndpoint": false` in your model config in config.json.
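For example, reusing the model entry from the config shared above (a sketch only; adjust to your own setup):

"tabAutocompleteModel": {
  "title": "LMStudio",
  "provider": "openai",
  "model": "deepseek-ai_deepseek-coder-6.7b-base",
  "apiBase": "http://localhost:1234/v1/",
  "useLegacyCompletionsEndpoint": false
},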
I tried using second-state/StarCoder2-15B-GGUF/starcoder2-15b-Q4_K_S.gguf with LM Studio, but it didn't generate any usable output, whereas it worked just fine with Ollama.
Is this related to the API support issue in LM Studio mentioned above?