Not able to reach the Ollama local model hosted on another machine
Issue Category
Undocumented feature or missing documentation
Affected Documentation Page URL
No response
Issue Description
I have the following config.json:
{
  "models": [
    {
      "title": "Qwen 2.5 Coder 7b",
      "model": "qwen-2.5-coder-instruct-7b",
      "provider": "ollama",
      "apiBase": "http://192.168.120.243:9000/v1/chat/completions"
    }
  ],
  "contextProviders": [
    {
      "name": "code",
      "params": {}
    },
    {
      "name": "docs",
      "params": {}
    },
    {
      "name": "diff",
      "params": {}
    },
    {
      "name": "terminal",
      "params": {}
    },
    {
      "name": "problems",
      "params": {}
    },
    {
      "name": "folder",
      "params": {}
    },
    {
      "name": "codebase",
      "params": {}
    }
  ],
  "slashCommands": [
    {
      "name": "share",
      "description": "Export the current chat session to markdown"
    },
    {
      "name": "cmd",
      "description": "Generate a shell command"
    },
    {
      "name": "commit",
      "description": "Generate a git commit message"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen 2.5b Autocomplete Model",
    "provider": "ollama",
    "model": "qwen-2.5-coder-instruct-7b",
    "apiBase": "http://192.168.120.243:9000/v1/"
  },
  "data": [],
  "docs": [
    {
      "startUrl": "https://requests.readthedocs.io",
      "title": "requests"
    }
  ]
}
I am not able to get a response (error 404). Here are the Ollama server logs:
time=2025-06-09T14:52:06.479Z level=INFO source=types.go:130 msg="inference compute" id=GPU-1f0828c4-2144-92a0-a19b-ece2a193546a library=cuda variant=v12 compute=12.0 driver=12.8 name="NVIDIA GeForce RTX 5090" total="31.4 GiB" available="2.7 GiB"
time=2025-06-09T14:52:06.479Z level=INFO source=types.go:130 msg="inference compute" id=GPU-a4326111-a43d-9b34-3414-701320dafc95 library=cuda variant=v12 compute=12.0 driver=12.8 name="NVIDIA GeForce RTX 5090" total="31.4 GiB" available="9.6 GiB"
time=2025-06-09T14:52:06.479Z level=INFO source=types.go:130 msg="inference compute" id=GPU-5d59eeb5-c7c3-5bd7-6917-a1e7bac7a9fe library=cuda variant=v12 compute=12.0 driver=12.8 name="NVIDIA GeForce RTX 5090" total="31.4 GiB" available="10.1 GiB"
time=2025-06-09T14:52:06.479Z level=INFO source=types.go:130 msg="inference compute" id=GPU-1ce4c642-299f-18b4-78b1-497671a6852b library=cuda variant=v12 compute=12.0 driver=12.8 name="NVIDIA GeForce RTX 5090" total="31.4 GiB" available="9.2 GiB"
[GIN] 2025/06/09 - 15:01:40 | 404 | 808.436µs | 192.168.120.28 | POST "/api/show"
[GIN] 2025/06/09 - 15:01:40 | 404 | 340.388µs | 192.168.120.28 | POST "/api/show"
[GIN] 2025/06/09 - 15:01:45 | 404 | 985.237µs | 192.168.120.28 | POST "/api/chat"
[GIN] 2025/06/09 - 15:01:59 | 404 | 8.957µs | 192.168.120.28 | POST "/v1/api/show"
[GIN] 2025/06/09 - 15:02:00 | 404 | 4.077µs | 192.168.120.28 | POST "/v1/api/show"
[GIN] 2025/06/09 - 15:02:05 | 404 | 6.563µs | 192.168.120.28 | POST "/v1/api/chat"
[GIN] 2025/06/09 - 15:02:35 | 404 | 11.732µs | 192.168.120.28 | POST "/v1/api/show"
[GIN] 2025/06/09 - 15:02:36 | 404 | 4.057µs | 192.168.120.28 | POST "/v1/api/show"
[GIN] 2025/06/09 - 15:02:42 | 404 | 7.965µs | 192.168.120.28 | POST "/v1/api/show"
[GIN] 2025/06/09 - 15:02:43 | 404 | 7.394µs | 192.168.120.28 | POST "/v1/api/show"
[GIN] 2025/06/09 - 15:02:51 | 404 | 7.253µs | 192.168.120.28 | POST "/v1/api/chat"
[GIN] 2025/06/09 - 15:03:23 | 404 | 8.096µs | 192.168.120.28 | POST "/v1/api/chat"
[GIN] 2025/06/09 - 15:03:44 | 404 | 7.905µs | 192.168.120.28 | POST "/v1/chat/api/show"
[GIN] 2025/06/09 - 15:03:44 | 404 | 4.258µs | 192.168.120.28 | POST "/v1/chat/api/show"
[GIN] 2025/06/09 - 15:03:47 | 404 | 8.836µs | 192.168.120.28 | POST "/v1/chat/api/chat"
[GIN] 2025/06/09 - 15:03:55 | 404 | 5.08µs | 192.168.120.28 | POST "/v1/chat/completions/api/show"
[GIN] 2025/06/09 - 15:03:56 | 404 | 4.328µs | 192.168.120.28 | POST "/v1/chat/completions/api/show"
[GIN] 2025/06/09 - 15:03:56 | 404 | 3.457µs | 192.168.120.28 | POST "/v1/chat/completions/api/chat"
...
I tried using /v1, /v1/, and /v1/chat/completions as the apiBase, but none of them worked. As the log shows, the extension appends the /api/chat and /api/show endpoints to whatever base I configure. There is nothing in the documentation explaining how to handle this.
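My best guess, and it is only a guess since this is exactly the undocumented part, is that for the ollama provider apiBase should be the bare host and port with no path at all, so that the appended /api/chat and /api/show land on the native Ollama routes:

"apiBase": "http://192.168.120.243:9000"

That said, the first requests in the log (15:01:40-15:01:45) already went to the bare /api/show and /api/chat paths and still returned 404, so the path may not be the only problem.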
Expected Content
Please document precisely how to specify the URL of a locally hosted model, including which endpoint paths the extension appends to apiBase. An end-to-end config.json or config.yaml example would be even better.
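For example, a minimal end-to-end config.json like the one below would remove the guesswork. The apiBase format here (bare host and port) is my assumption about what the extension expects, not something I found in any documentation:

{
  "models": [
    {
      "title": "Qwen 2.5 Coder 7b",
      "model": "qwen-2.5-coder-instruct-7b",
      "provider": "ollama",
      "apiBase": "http://192.168.120.243:9000"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen 2.5 Coder 7b Autocomplete",
    "provider": "ollama",
    "model": "qwen-2.5-coder-instruct-7b",
    "apiBase": "http://192.168.120.243:9000"
  }
}

It would also help to state whether the model field has to match a tag from ollama list exactly (e.g. qwen2.5-coder:7b); the 404 responses on the bare /api/show requests make me suspect the model name lookup may be failing as well. Thanks!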