Feature request to use a self-hosted embedding model
Would it be possible to use a self-hosted embedding model?
Something along the lines of: ghcr.io/huggingface/text-embeddings-inference
Where you can provide a model like: sentence-transformers/all-MiniLM-L6-v2
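For illustration, something along these lines (a rough sketch based on the TEI README; the image tag and port mapping are just examples, adjust for your hardware):

```bash
# Sketch only: run Hugging Face text-embeddings-inference with a small model.
# The image tag and port mapping here are illustrative; check the TEI repo
# for the right tag for CPU vs. GPU.
docker run -p 8080:80 \
  -v $PWD/tei-data:/data \
  ghcr.io/huggingface/text-embeddings-inference:cpu-latest \
  --model-id sentence-transformers/all-MiniLM-L6-v2
```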
Let me know if this is a silly idea, but I would like to use a self-hosted embedding model without being forced to use OpenAI/Gemini, while the chat model can still be something behind LiteLLM.
Hey, this should already work with Ollama. I did not test with the model you mentioned, but these would be the configs (rough .env sketch below the list):
- Setting LLM_PROVIDER=ollama (coming soon => but basically works already)
- Setting EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2 (or any other model; not sure if you need to pull it yourself first at the moment)
- Running the embedding service on localhost:11434
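Roughly, that would look like this in a .env (only LLM_PROVIDER and EMBEDDING_MODEL are the settings named above; the base-URL variable name is a placeholder, so check the docs for the exact name):

```bash
# Sketch of the settings above; variable names other than LLM_PROVIDER and
# EMBEDDING_MODEL are assumptions, not confirmed config keys.
LLM_PROVIDER=ollama
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
# Ollama's default API address (assumed variable name)
EMBEDDING_BASE_URL=http://localhost:11434
```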
@tazmon95 Assigned to you, as I think your PR will solve this?
@Wirasm What I'm working on will enable self-hosted Ollama embedding models, but not directly from a locally hosted Hugging Face model. I'd put this in the backlog for now.
I can confirm successful use of Ollama, at least for embeddings with a nomic model, but the documentation in that regard is quite scarce.
I had to use an OpenAI-compatible URL: http://<ollama_url>:11434/v1, where the /v1 suffix is mandatory.
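For anyone else trying this, a rough sketch of the kind of call that works against that endpoint (nomic-embed-text is just an example model name; I only confirmed a nomic embedding model in general):

```bash
# Pull an embedding model first (model name here is illustrative)
ollama pull nomic-embed-text

# Query Ollama's OpenAI-compatible embeddings endpoint; the /v1 prefix is mandatory
curl http://<ollama_url>:11434/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text", "input": "hello world"}'
```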
@vlebourl a bigger update for Ollama, along with better docs, is coming in the next few days. @tazmon95 please take a look and make sure this case is covered; if not, let's cover it, but I guess it is :)
Looking forward to it, as I'm dirty-patching the code on my end to make it semi-work for now 😅. Do you have a feature branch somewhere I could test and report bugs against, if that would help?