feat(config): add support for remote embedding services via config.toml
Description
This pull request introduces support for remote embedding models in `config.toml`, enabling users to delegate embedding generation to external servers. This is particularly valuable for environments without a GPU or without the ability to run `llama-server` locally.
What’s Changed
- Added support for an `[embedding]` section with `type = "remote"` and `endpoint` fields in `config.toml`.
- Updated `embedding::create()` and `model::load_embedding()` to support `ModelConfig::Http` (remote models).
- Prevents Tabby from launching the local `llama-server` process when a remote embedding service is used.
- Keeps compatibility with local embeddings (no breaking changes).
Example Usage
```toml
[embedding]
type = "remote"
endpoint = "http://localhost:5000"
model = "BAAI/bge-small-en"
```
Run an embedding service, for example:

```bash
uvicorn main:app --host 0.0.0.0 --port 5000
```
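For reference, here is a minimal sketch of what the `main.py` served by the command above could look like. This is hypothetical: it assumes FastAPI and returns fixed-size dummy vectors in the OpenAI-style `/v1/embeddings` response shape; a real server would embed the inputs with the configured model (e.g. `BAAI/bge-small-en`).

```python
# main.py - hypothetical stub of an OpenAI-compatible embedding server.
# It returns fixed-size dummy vectors so the wiring can be tested without
# a GPU; a real implementation would embed the inputs with an actual model
# (e.g. BAAI/bge-small-en via sentence-transformers).
from typing import List, Union

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

EMBEDDING_DIM = 384  # bge-small-en produces 384-dimensional vectors


class EmbeddingRequest(BaseModel):
    input: Union[str, List[str]]
    model: str = "BAAI/bge-small-en"


@app.post("/v1/embeddings")
def create_embeddings(req: EmbeddingRequest) -> dict:
    texts = [req.input] if isinstance(req.input, str) else req.input
    data = [
        {"object": "embedding", "index": i, "embedding": [0.0] * EMBEDDING_DIM}
        for i in range(len(texts))
    ]
    return {
        "object": "list",
        "data": data,
        "model": req.model,
        "usage": {"prompt_tokens": 0, "total_tokens": 0},
    }
```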
Then launch Tabby:

```bash
./target/release/tabby serve
```
Motivation
Currently, Tabby always attempts to launch its internal `llama-server` binary, which fails on machines without a compatible GPU or the required CUDA libraries. This PR adds flexibility and portability, enabling Tabby to run in lightweight environments with minimal dependencies.
How to Test
- Start Tabby with a valid `config.toml` that includes a remote embedding config.
- Verify that:
  - Tabby starts without attempting to run `llama-server`
  - Embedding API requests are successfully forwarded to the remote server
Known Limitations
- This does not disable internal embeddings when `[embedding]` is omitted (default behavior).
- The remote server must follow the expected API (e.g., `/v1/embeddings` in an OpenAI-compatible format); a quick sanity check for this is sketched below.
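To confirm that a remote server speaks this protocol before pointing Tabby at it, something along these lines could be used (hypothetical sketch; the endpoint and model name are taken from the example config above):

```python
# Hypothetical sanity check: verify the remote endpoint returns an
# OpenAI-style /v1/embeddings response before configuring Tabby to use it.
import requests

resp = requests.post(
    "http://localhost:5000/v1/embeddings",
    json={"input": "fn main() {}", "model": "BAAI/bge-small-en"},
    timeout=10,
)
resp.raise_for_status()
body = resp.json()
assert body["object"] == "list"
embedding = body["data"][0]["embedding"]
assert isinstance(embedding, list) and len(embedding) > 0
print(f"ok: received a {len(embedding)}-dimensional embedding")
```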
Request for Review
Would love feedback on:
- Integration approach
- Potential edge cases to test
- Any docs you'd like me to include
Let me know if you'd like me to add a sample embedding server (Python FastAPI) or documentation PR as a follow-up!
May I ask why you don't use this configuration?

```toml
[model.embedding.http]
kind = "ollama/embedding"
model_name = "nomic-embed-text"
api_endpoint = "http://localhost:11434"
```
Right - you can always configure a remote embedding through an HTTP endpoint, e.g.
https://tabby.tabbyml.com/docs/references/models-http-api/llama.cpp/
I understand now that Tabby already supports HTTP-based embedding backends via the [model.embedding.http] configuration. However, my intention with this PR is to improve the usability and discoverability of this feature:
Why I chose to propose `[embedding]` with `type = "remote"`:

- Consistency with existing config sections like `[model]` that already use `type = "local"` or `type = "remote"`.
- Cleaner and simpler UX for users who are not familiar with internal model kinds like `"ollama/embedding"` or `llama.cpp/embedding`.
- It makes remote embedding configuration feel native and explicit, just like other parts of `config.toml`, without requiring users to construct `HttpModelConfig` manually.
If you think this is redundant or beyond the scope of Tabby's configuration concept, no problem - I will understand a decision to reject or revise the PR. I just wanted to explain the motivation and perhaps start a dialogue.