Embedding unreachable, but llama is running
Hello, I'm trying to run/test Tabby, but I have problems with the embedding instance. I'm using version 0.27 on a NixOS unstable server.
AI completion and AI chat seem to work, but I cannot add a git context provider for a public repo: it seems to clone successfully, but can't parse a single file.
config.toml:
[model.completion.local]
model_id = "Qwen2.5-Coder-3B"
[model.chat.local]
model_id = "Qwen2.5-Coder-1.5B-Instruct"
[model.embedding.local]
model_id = "Nomic-Embed-Text"
Running with:
tabby serve --model Qwen2.5-Coder-3B --host 192.168.1.10 --port 11029 --device rocm
Testing on an AMD Ryzen 7 8845HS w/ Radeon 780M Graphics.
On the Tabby web interface, on the Systems page, I see "Unreachable" only under "Embedding", with the error "error decoding response body".
The llama instance seems to be up, and by dumping the local traffic I see the following requests/responses:
GET /health HTTP/1.1
accept: */*
host: 127.0.0.1:30888
HTTP/1.1 200 OK
Access-Control-Allow-Origin:
Content-Length: 15
Content-Type: application/json; charset=utf-8
Keep-Alive: timeout=5, max=5
Server: llama.cpp
------
POST /tokenize HTTP/1.1
content-type: application/json
accept: */*
host: 127.0.0.1:30888
content-length: 25
{"content":"hello Tabby"}
HTTP/1.1 200 OK
Access-Control-Allow-Origin:
Content-Length: 28
Content-Type: application/json; charset=utf-8
Keep-Alive: timeout=5, max=5
Server: llama.cpp
{"tokens":[7592,21628,3762]}
-----------
POST /embeddings HTTP/1.1
content-type: application/json
accept: */*
host: 127.0.0.1:30888
content-length: 27
{"content":"hello Tabby\n"}
HTTP/1.1 200 OK
Access-Control-Allow-Origin:
Content-Length: 16226
Content-Type: application/json; charset=utf-8
Keep-Alive: timeout=5, max=5
Server: llama.cpp
{"embedding":[0.0018252730369567871, **a lot more floats**,-0.024591289460659027],"index":0}
Additional Tabby log entries, even when running with RUST_LOG=debug, all look like:
WARN tabby_index::indexer: crates/tabby-index/src/indexer.rs:90: Failed to build chunk for document 'git:R1AWw5:::{"path":"/var/lib/tabby/repositories/[redacted]/src/connection/handshake/dirsync/req.rs","language":"rust","git_hash":"906b1491a1a0ecb98781568b24d8ba781d6765e2"}': Failed to embed chunk text: error decoding response body
What can I try / what am I doing wrong?
Update: I upgraded to 0.27.1 and tried different models thanks to more RAM; the local embedding is still marked as 'Unreachable', with the same errors.
Workaround: use HTTP for embeddings, not local.
I literally copied the llama-server command line and ran llama manually.
Connecting this way works:
[model.embedding.http]
kind = "llama.cpp/embedding"
model_name = "Nomic-Embed-Text"
api_endpoint = "http://127.0.0.1:30887"
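For reference, a minimal standalone invocation along those lines (a sketch, not the exact copied command: the model path is a placeholder, and --embeddings enables the embedding endpoint):

llama-server -m /path/to/nomic-embed-text.gguf --embeddings --host 127.0.0.1 --port 30887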
At this point I think the kind Tabby assumes is wrong when using [model.embedding.local].
Hi @LucaFulchir, did you use the llama.cpp that came with Tabby, or was it installed manually as a separate component?
Tabby is configured to use the NixOS llama.cpp, built with Vulkan support.
Currently it seems to be release b4154.
Now I notice that when I run llama manually, it instead uses release b5141, which is much newer.
The llama.cpp included in the Tabby release should be functional.
The llama.cpp embedding API was updated after build b4356. Please verify your version and configure it appropriately.
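One way to verify (a sketch; the exact response shapes below are assumptions based on the b4356 change, not confirmed against the source):

# print the build of the llama-server binary Tabby is configured to launch
llama-server --version
# older builds (e.g. b4154) reply to POST /embeddings with a flat object, as in the dump above:
#   {"embedding":[0.0018, ...],"index":0}
# newer builds reply with an array of results, one entry per input (assumption):
#   [{"index":0,"embedding":[[0.0018, ...]]}]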
For more detail, you can check: https://tabby.tabbyml.com/docs/references/models-http-api/llama.cpp/