Not able to reach the Ollama local model hosted on another machine
Issue Category
Undocumented feature or missing documentation
Affected Documentation Page URL
No response
Issue Description
I have the following config.json:
{
  "models": [
    {
      "title": "Qwen 2.5 Coder 7b",
      "model": "qwen-2.5-coder-instruct-7b",
      "provider": "ollama",
      "apiBase": "http://192.168.120.243:9000/v1/chat/completions"
    }
  ],
  "contextProviders": [
    {
      "name": "code",
      "params": {}
    },
    {
      "name": "docs",
      "params": {}
    },
    {
      "name": "diff",
      "params": {}
    },
    {
      "name": "terminal",
      "params": {}
    },
    {
      "name": "problems",
      "params": {}
    },
    {
      "name": "folder",
      "params": {}
    },
    {
      "name": "codebase",
      "params": {}
    }
  ],
  "slashCommands": [
    {
      "name": "share",
      "description": "Export the current chat session to markdown"
    },
    {
      "name": "cmd",
      "description": "Generate a shell command"
    },
    {
      "name": "commit",
      "description": "Generate a git commit message"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen 2.5b Autocomplete Model",
    "provider": "ollama",
    "model": "qwen-2.5-coder-instruct-7b",
    "apiBase": "http://192.168.120.243:9000/v1/"
  },
  "data": [],
  "docs": [
    {
      "startUrl": "https://requests.readthedocs.io",
      "title": "requests"
    }
  ]
}
I am not able to get a response (error 404). Here are the Ollama server logs:
time=2025-06-09T14:52:06.479Z level=INFO source=types.go:130 msg="inference compute" id=GPU-1f0828c4-2144-92a0-a19b-ece2a193546a library=cuda variant=v12 compute=12.0 driver=12.8 name="NVIDIA GeForce RTX 5090" total="31.4 GiB" available="2.7 GiB"
time=2025-06-09T14:52:06.479Z level=INFO source=types.go:130 msg="inference compute" id=GPU-a4326111-a43d-9b34-3414-701320dafc95 library=cuda variant=v12 compute=12.0 driver=12.8 name="NVIDIA GeForce RTX 5090" total="31.4 GiB" available="9.6 GiB"
time=2025-06-09T14:52:06.479Z level=INFO source=types.go:130 msg="inference compute" id=GPU-5d59eeb5-c7c3-5bd7-6917-a1e7bac7a9fe library=cuda variant=v12 compute=12.0 driver=12.8 name="NVIDIA GeForce RTX 5090" total="31.4 GiB" available="10.1 GiB"
time=2025-06-09T14:52:06.479Z level=INFO source=types.go:130 msg="inference compute" id=GPU-1ce4c642-299f-18b4-78b1-497671a6852b library=cuda variant=v12 compute=12.0 driver=12.8 name="NVIDIA GeForce RTX 5090" total="31.4 GiB" available="9.2 GiB"
[GIN] 2025/06/09 - 15:01:40 | 404 | 808.436µs | 192.168.120.28 | POST "/api/show"
[GIN] 2025/06/09 - 15:01:40 | 404 | 340.388µs | 192.168.120.28 | POST "/api/show"
[GIN] 2025/06/09 - 15:01:45 | 404 | 985.237µs | 192.168.120.28 | POST "/api/chat"
[GIN] 2025/06/09 - 15:01:59 | 404 | 8.957µs | 192.168.120.28 | POST "/v1/api/show"
[GIN] 2025/06/09 - 15:02:00 | 404 | 4.077µs | 192.168.120.28 | POST "/v1/api/show"
[GIN] 2025/06/09 - 15:02:05 | 404 | 6.563µs | 192.168.120.28 | POST "/v1/api/chat"
[GIN] 2025/06/09 - 15:02:35 | 404 | 11.732µs | 192.168.120.28 | POST "/v1/api/show"
[GIN] 2025/06/09 - 15:02:36 | 404 | 4.057µs | 192.168.120.28 | POST "/v1/api/show"
[GIN] 2025/06/09 - 15:02:42 | 404 | 7.965µs | 192.168.120.28 | POST "/v1/api/show"
[GIN] 2025/06/09 - 15:02:43 | 404 | 7.394µs | 192.168.120.28 | POST "/v1/api/show"
[GIN] 2025/06/09 - 15:02:51 | 404 | 7.253µs | 192.168.120.28 | POST "/v1/api/chat"
[GIN] 2025/06/09 - 15:03:23 | 404 | 8.096µs | 192.168.120.28 | POST "/v1/api/chat"
[GIN] 2025/06/09 - 15:03:44 | 404 | 7.905µs | 192.168.120.28 | POST "/v1/chat/api/show"
[GIN] 2025/06/09 - 15:03:44 | 404 | 4.258µs | 192.168.120.28 | POST "/v1/chat/api/show"
[GIN] 2025/06/09 - 15:03:47 | 404 | 8.836µs | 192.168.120.28 | POST "/v1/chat/api/chat"
[GIN] 2025/06/09 - 15:03:55 | 404 | 5.08µs | 192.168.120.28 | POST "/v1/chat/completions/api/show"
[GIN] 2025/06/09 - 15:03:56 | 404 | 4.328µs | 192.168.120.28 | POST "/v1/chat/completions/api/show"
[GIN] 2025/06/09 - 15:03:56 | 404 | 3.457µs | 192.168.120.28 | POST "/v1/chat/completions/api/chat"
...
I tried using /v1, /v1/, and /v1/chat/completions as the apiBase, but none of them worked. As the log shows, the extension appends the /api/chat and /api/show endpoints to whatever base I configure. There is nothing in the documentation explaining how to handle this.
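My best guess, and it is only a guess since this is exactly the undocumented part, is that for the ollama provider apiBase should be the bare host and port with no path at all, so that the appended /api/chat and /api/show land on the native Ollama routes:

"apiBase": "http://192.168.120.243:9000"

That said, the first requests in the log (15:01:40-15:01:45) already went to the bare /api/show and /api/chat paths and still returned 404, so the path may not be the only problem.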
Expected Content
Please document precisely how to specify the URL of a locally hosted model, including which endpoint paths the extension appends to apiBase. An end-to-end config.json or config.yaml example would be even better.
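For example, a minimal end-to-end config.json like the one below would remove the guesswork. The apiBase format here (bare host and port) is my assumption about what the extension expects, not something I found in any documentation:

{
  "models": [
    {
      "title": "Qwen 2.5 Coder 7b",
      "model": "qwen-2.5-coder-instruct-7b",
      "provider": "ollama",
      "apiBase": "http://192.168.120.243:9000"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen 2.5 Coder 7b Autocomplete",
    "provider": "ollama",
    "model": "qwen-2.5-coder-instruct-7b",
    "apiBase": "http://192.168.120.243:9000"
  }
}

It would also help to state whether the model field has to match a tag from ollama list exactly (e.g. qwen2.5-coder:7b); the 404 responses on the bare /api/show requests make me suspect the model name lookup may be failing as well. Thanks!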