Feature request to use a self-hosted embedding model
Would it be possible to use a self-hosted embedding model?
Something along the lines of: ghcr.io/huggingface/text-embeddings-inference
Where you can provide a model like: sentence-transformers/all-MiniLM-L6-v2
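For illustration, something along these lines (a rough sketch based on the TEI README; the image tag and port mapping are just examples, adjust for your hardware):

```bash
# Sketch only: run Hugging Face text-embeddings-inference with a small model.
# The image tag and port mapping here are illustrative; check the TEI repo
# for the right tag for CPU vs. GPU.
docker run -p 8080:80 \
  -v $PWD/tei-data:/data \
  ghcr.io/huggingface/text-embeddings-inference:cpu-latest \
  --model-id sentence-transformers/all-MiniLM-L6-v2
```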
Let me know if this is a silly idea, but I would like to use a self-hosted embedding model without being forced to use OpenAI/Gemini, while the chat model can still be something behind LiteLLM.
Hey, this should already work with Ollama. I did not test with the model you mentioned, but these would be the configs (rough .env sketch below the list):
- Setting LLM_PROVIDER=ollama (coming soon => but basically works already)
- Setting EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2 (or any other model; not sure if you need to pull it yourself first at the moment)
- Running the embedding service on localhost:11434
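Roughly, that would look like this in a .env (only LLM_PROVIDER and EMBEDDING_MODEL are the settings named above; the base-URL variable name is a placeholder, so check the docs for the exact name):

```bash
# Sketch of the settings above; variable names other than LLM_PROVIDER and
# EMBEDDING_MODEL are assumptions, not confirmed config keys.
LLM_PROVIDER=ollama
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
# Ollama's default API address (assumed variable name)
EMBEDDING_BASE_URL=http://localhost:11434
```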
@tazmon95 Assigned to you, as I think your PR will solve this?
@Wirasm What I'm working on will enable self-hosted Ollama embedding models, but not directly from a locally hosted Hugging Face model. I'd put this in the backlog for now.
I can confirm successful use of Ollama, at least for embeddings with a nomic model, but the documentation in that regard is quite scarce.
I had to use an OpenAI-compatible URL: http://<ollama_url>:11434/v1, where the /v1 suffix is mandatory.
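For anyone else trying this, a rough sketch of the kind of call that works against that endpoint (nomic-embed-text is just an example model name; I only confirmed a nomic embedding model in general):

```bash
# Pull an embedding model first (model name here is illustrative)
ollama pull nomic-embed-text

# Query Ollama's OpenAI-compatible embeddings endpoint; the /v1 prefix is mandatory
curl http://<ollama_url>:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text", "input": "hello world"}'
```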
@vlebourl a bigger update for Ollama, along with better docs, is coming in the next few days. @tazmon95 please take a look and make sure this case is covered; if not, let's cover it, but I guess it is :)
Looking forward to it, as I'm dirty-patching the code on my end to make it semi-work for now 😅. Do you have a feature branch somewhere I could test and report bugs against, if that would help?