
Not able to reach the Ollama local model hosted on another machine

msharara1998 opened this issue 4 months ago • 5 comments

Issue Category

Undocumented feature or missing documentation

Affected Documentation Page URL

No response

Issue Description

I have the following config.json:

{
  "models": [
    {
      "title": "Qwen 2.5 Coder 7b",
      "model": "qwen-2.5-coder-instruct-7b",
      "provider": "ollama",
      "apiBase": "http://192.168.120.243:9000/v1/chat/completions"
    }
  ],
  "contextProviders": [
    {
      "name": "code",
      "params": {}
    },
    {
      "name": "docs",
      "params": {}
    },
    {
      "name": "diff",
      "params": {}
    },
    {
      "name": "terminal",
      "params": {}
    },
    {
      "name": "problems",
      "params": {}
    },
    {
      "name": "folder",
      "params": {}
    },
    {
      "name": "codebase",
      "params": {}
    }
  ],
  "slashCommands": [
    {
      "name": "share",
      "description": "Export the current chat session to markdown"
    },
    {
      "name": "cmd",
      "description": "Generate a shell command"
    },
    {
      "name": "commit",
      "description": "Generate a git commit message"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen 2.5b Autocomplete Model",
    "provider": "ollama",
    "model": "qwen-2.5-coder-instruct-7b",
    "apiBase": "http://192.168.120.243:9000/v1/"
  },
  "data": [],
  "docs": [
    {
      "startUrl": "https://requests.readthedocs.io",
      "title": "requests"
    }
  ]
}

I am not able to receive a response (error 404). Here are the Ollama server logs:

time=2025-06-09T14:52:06.479Z level=INFO source=types.go:130 msg="inference compute" id=GPU-1f0828c4-2144-92a0-a19b-ece2a193546a library=cuda variant=v12 compute=12.0 driver=12.8 name="NVIDIA GeForce RTX 5090" total="31.4 GiB" available="2.7 GiB"
time=2025-06-09T14:52:06.479Z level=INFO source=types.go:130 msg="inference compute" id=GPU-a4326111-a43d-9b34-3414-701320dafc95 library=cuda variant=v12 compute=12.0 driver=12.8 name="NVIDIA GeForce RTX 5090" total="31.4 GiB" available="9.6 GiB"
time=2025-06-09T14:52:06.479Z level=INFO source=types.go:130 msg="inference compute" id=GPU-5d59eeb5-c7c3-5bd7-6917-a1e7bac7a9fe library=cuda variant=v12 compute=12.0 driver=12.8 name="NVIDIA GeForce RTX 5090" total="31.4 GiB" available="10.1 GiB"
time=2025-06-09T14:52:06.479Z level=INFO source=types.go:130 msg="inference compute" id=GPU-1ce4c642-299f-18b4-78b1-497671a6852b library=cuda variant=v12 compute=12.0 driver=12.8 name="NVIDIA GeForce RTX 5090" total="31.4 GiB" available="9.2 GiB"
[GIN] 2025/06/09 - 15:01:40 | 404 |     808.436µs |  192.168.120.28 | POST     "/api/show"
[GIN] 2025/06/09 - 15:01:40 | 404 |     340.388µs |  192.168.120.28 | POST     "/api/show"
[GIN] 2025/06/09 - 15:01:45 | 404 |     985.237µs |  192.168.120.28 | POST     "/api/chat"
[GIN] 2025/06/09 - 15:01:59 | 404 |       8.957µs |  192.168.120.28 | POST     "/v1/api/show"
[GIN] 2025/06/09 - 15:02:00 | 404 |       4.077µs |  192.168.120.28 | POST     "/v1/api/show"
[GIN] 2025/06/09 - 15:02:05 | 404 |       6.563µs |  192.168.120.28 | POST     "/v1/api/chat"
[GIN] 2025/06/09 - 15:02:35 | 404 |      11.732µs |  192.168.120.28 | POST     "/v1/api/show"
[GIN] 2025/06/09 - 15:02:36 | 404 |       4.057µs |  192.168.120.28 | POST     "/v1/api/show"
[GIN] 2025/06/09 - 15:02:42 | 404 |       7.965µs |  192.168.120.28 | POST     "/v1/api/show"
[GIN] 2025/06/09 - 15:02:43 | 404 |       7.394µs |  192.168.120.28 | POST     "/v1/api/show"
[GIN] 2025/06/09 - 15:02:51 | 404 |       7.253µs |  192.168.120.28 | POST     "/v1/api/chat"
[GIN] 2025/06/09 - 15:03:23 | 404 |       8.096µs |  192.168.120.28 | POST     "/v1/api/chat"
[GIN] 2025/06/09 - 15:03:44 | 404 |       7.905µs |  192.168.120.28 | POST     "/v1/chat/api/show"
[GIN] 2025/06/09 - 15:03:44 | 404 |       4.258µs |  192.168.120.28 | POST     "/v1/chat/api/show"
[GIN] 2025/06/09 - 15:03:47 | 404 |       8.836µs |  192.168.120.28 | POST     "/v1/chat/api/chat"
[GIN] 2025/06/09 - 15:03:55 | 404 |        5.08µs |  192.168.120.28 | POST     "/v1/chat/completions/api/show"
[GIN] 2025/06/09 - 15:03:56 | 404 |       4.328µs |  192.168.120.28 | POST     "/v1/chat/completions/api/show"
[GIN] 2025/06/09 - 15:03:56 | 404 |       3.457µs |  192.168.120.28 | POST     "/v1/chat/completions/api/chat"
...

I tried using /v1, /v1/, and /v1/chat/completions as the apiBase, but none of them worked: the extension requests the /api/chat and /api/show endpoints and appends those paths to whatever base I give it. There is nothing in the documentation mentioning how to handle this.
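
Based on the log lines above, Continue appears to concatenate the Ollama-native paths onto the configured apiBase, so presumably the base should be just the scheme, host, and port with no path:

apiBase = http://192.168.120.243:9000/v1                   -> POST /v1/api/chat                   (404)
apiBase = http://192.168.120.243:9000/v1/chat/completions  -> POST /v1/chat/completions/api/chat  (404)
apiBase = http://192.168.120.243:9000                      -> POST /api/chat

As a sanity check that the server itself is reachable from the client machine, listing the installed models over the native API should work:

curl http://192.168.120.243:9000/api/tags

Note that the earliest requests (15:01:40-15:01:45) already hit the correct /api/show and /api/chat paths and still got 404; Ollama also appears to return 404 when the requested model tag is not found, which points at the model name rather than the URL for those.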

Expected Content

Document precisely how to specify the URL of a locally hosted model, including which endpoints the extension calls. An end-to-end config.json or config.yaml example would be even better. Thanks!
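
For what it's worth, here is an untested sketch of what I would expect such an example to look like, assuming the apiBase must be the bare host and port (with Continue appending /api/chat, /api/show, etc. itself) and assuming the model field must match a tag the server actually knows (the Ollama library naming is along the lines of qwen2.5-coder:7b-instruct rather than qwen-2.5-coder-instruct-7b; ollama list on the server shows the exact tags):

{
  "models": [
    {
      "title": "Qwen 2.5 Coder 7b",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b-instruct",
      "apiBase": "http://192.168.120.243:9000"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen 2.5 Coder Autocomplete",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b-instruct",
    "apiBase": "http://192.168.120.243:9000"
  }
}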
