
Integrating local LLMs

Open ogzhanolguncu opened this issue 1 year ago • 0 comments

Currently, the RAG SDK only supports hosted models. It would be great to enable local models such as web-llm. The catch is that although these models are OpenAI-compatible, they don't expose an HTTP endpoint to query. I believe we could write a simple HTTP server using Bun and start it only when the user opts into one of the local LLMs.
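To make "OpenAI-compatible but no endpoint" concrete: an in-process engine like web-llm consumes the same chat-completion payload a hosted model does, only without a URL to send it to. A rough sketch (the `chatRequest` helper and names are illustrative, not part of the SDK):

```typescript
// The OpenAI-style chat payload both a hosted endpoint and an in-process
// engine consume; only the transport differs.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

function chatRequest(model: string, messages: ChatMessage[]) {
  return { model, messages, stream: false };
}

// Hosted path: POST this payload to the provider's HTTP endpoint.
// In-process path (web-llm): hand the same payload to the engine's
// OpenAI-style API directly — there is no URL to give the SDK,
// hence the proposed local server.
```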

For example:

  1. Start the model.
  2. Hook it up to a simple server.
  3. Generate a predefined URL, pass it to our base LLM client, and register it like any other model.

The rest should be straightforward.
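The steps above could be sketched roughly as follows. This assumes an Ollama instance already listening on its default port 11434 with an OpenAI-compatible `/v1` route; the `LocalModelConfig` type, port, and function names are illustrative, not part of the SDK:

```typescript
// Sketch: put a local model behind a predictable URL for the base LLM client.
type LocalModelConfig = {
  name: string; // e.g. "llama3"
  port: number; // port our thin local server listens on
};

// Step 3: the predefined URL handed to the base LLM client.
function localBaseUrl(cfg: LocalModelConfig): string {
  return `http://localhost:${cfg.port}/v1`;
}

// Step 2: a thin proxy forwarding OpenAI-style requests to the local model.
// With Bun it could be started as:
//   Bun.serve({ port: cfg.port, fetch: forward });
async function forward(req: Request): Promise<Response> {
  const url = new URL(req.url);
  // Assumes Ollama's OpenAI-compatible server; streaming request bodies
  // may additionally need `duplex: "half"` on some runtimes.
  return fetch(`http://localhost:11434${url.pathname}${url.search}`, {
    method: req.method,
    headers: req.headers,
    body: req.body,
  });
}
```

With something like this in place, registering a local model is just pointing the existing client at `localBaseUrl(cfg)` instead of a hosted provider's URL.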

  • [ ] https://github.com/mlc-ai/web-llm
  • [x] https://ollama.com/

ogzhanolguncu · Aug 13 '24 08:08