
Embedding unreachable, but llama is running

LucaFulchir opened this issue 9 months ago

Hello, I'm trying to run/test Tabby, but I have problems with the embedding instance. This is version 0.27 on a NixOS unstable server.

AI completion and AI chat seem to work, but I cannot add a git context provider for a public repo: it seems to clone successfully but can't parse a single file.

config.toml:

[model.completion.local]
model_id = "Qwen2.5-Coder-3B"

[model.chat.local]
model_id = "Qwen2.5-Coder-1.5B-Instruct"

[model.embedding.local]
model_id = "Nomic-Embed-Text"

running with:

tabby serve --model Qwen2.5-Coder-3B --host 192.168.1.10 --port 11029 --device rocm

Testing on an AMD Ryzen 7 8845HS w/ Radeon 780M Graphics.

On the Tabby web interface, the systems page shows "Unreachable" only under "Embedding", with the error "error decoding response body".

The llama instance seems to be up, and by dumping the local traffic I see the following requests/responses:

GET /health HTTP/1.1
accept: */*
host: 127.0.0.1:30888

HTTP/1.1 200 OK
Access-Control-Allow-Origin:
Content-Length: 15
Content-Type: application/json; charset=utf-8
Keep-Alive: timeout=5, max=5
Server: llama.cpp
------
POST /tokenize HTTP/1.1
content-type: application/json
accept: */*
host: 127.0.0.1:30888
content-length: 25

{"content":"hello Tabby"}

HTTP/1.1 200 OK
Access-Control-Allow-Origin: 
Content-Length: 28
Content-Type: application/json; charset=utf-8
Keep-Alive: timeout=5, max=5
Server: llama.cpp

{"tokens":[7592,21628,3762]}
-----------
POST /embeddings HTTP/1.1
content-type: application/json
accept: */*
host: 127.0.0.1:30888
content-length: 27

{"content":"hello Tabby\n"}

HTTP/1.1 200 OK
Access-Control-Allow-Origin:
Content-Length: 16226
Content-Type: application/json; charset=utf-8
Keep-Alive: timeout=5, max=5
Server: llama.cpp

{"embedding":[0.0018252730369567871, **a lot more floats**,-0.024591289460659027],"index":0}

The additional Tabby log entries, even when running with RUST_LOG=debug, all look like:

WARN tabby_index::indexer: crates/tabby-index/src/indexer.rs:90: Failed to build chunk for document 'git:R1AWw5:::{"path":"/var/lib/tabby/repositories/[redacted]/src/connection/handshake/dirsync/req.rs","language":"rust","git_hash":"906b1491a1a0ecb98781568b24d8ba781d6765e2"}': Failed to embed chunk text: error decoding response body

What can I try / what am I doing wrong?

LucaFulchir, Apr 03 '25

Updated to 0.27.1 and tried different models thanks to more RAM; the local embedding is still marked as 'unreachable', with the same errors.

LucaFulchir, Apr 11 '25

Workaround: use HTTP for embeddings, not local.

I literally copied the llama-server cmdline and ran llama manually; connecting this way works.
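(A sketch only, not the exact cmdline; the gguf path is a placeholder, and the port matches the config below:)

llama-server -m /path/to/nomic-embed-text.gguf --embedding --host 127.0.0.1 --port 30887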

[model.embedding.http]
kind = "llama.cpp/embedding"
model_name = "Nomic-Embed-Text"
api_endpoint = "http://127.0.0.1:30887"

At this point, I think the kind used with [model.embedding.local] is wrong.

LucaFulchir avatar Apr 12 '25 17:04 LucaFulchir

Hi @LucaFulchir, did you use the llama.cpp that came with Tabby, or was it installed manually as a separate component?

zwpaper, Apr 18 '25

Tabby is configured to use the NixOS llama.cpp, built with Vulkan support. It currently seems to be release b4154.

Now I notice that when I run llama manually, it instead uses release b5141, which is much newer.

LucaFulchir, Apr 27 '25

The llama.cpp included in the Tabby release should be functional.

The llama.cpp embedding API changed after build b4356. Please verify your version and configure it accordingly.

For more details, you can check: https://tabby.tabbyml.com/docs/references/models-http-api/llama.cpp/
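As a sketch only, assuming a llama-server newer than b4356 (which serves the OpenAI-compatible /v1/embeddings route) and assuming your Tabby build supports the openai/embedding kind used for other OpenAI-style backends, the HTTP config could instead point at the v1 endpoint:

[model.embedding.http]
kind = "openai/embedding"
model_name = "Nomic-Embed-Text"
api_endpoint = "http://127.0.0.1:30887/v1"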

zwpaper, May 03 '25