Update LLM docs to recommend different local solutions
These days, I think llm-ollama and llm-llama-server are the best options for local models for most people.
Mainly because they run as separate processes, which means that the model stays loaded in between llm calls.
They are also really easy to install! Ollama has an installer and llama-server can be had from Homebrew.
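For reference, getting llama-server going really is just a couple of commands. A minimal sketch, assuming a GGUF file you have already downloaded (the model path and port are placeholders):

```sh
# llama-server ships as part of the llama.cpp Homebrew formula
brew install llama.cpp

# Serve a local GGUF model over an OpenAI-compatible HTTP API
# (model path and port are just examples)
llama-server -m ./Llama-3.2-3B-Instruct-Q4_K_M.gguf --port 8080
```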
Some random ideas for llm-llama-server:
- As far as I can tell, it can only serve one model at a time, but you can run multiple instances on different ports. It would be good if there was a convenient way to get different model IDs for different ports somehow (see the first sketch after this list)
- Shipping this will make `llm-llama-server` a lot more compelling: #1117
- `llm-llamafile` is a separate plugin at the moment but it should be retired in favor of `llm-llama-server`
- It might be good to have a `llm llama-server add ...` command of some sort for registering additional ports, I'm not sure what that should look like yet though
- How about a `llm llama-server start` command which starts it running for you? (There's a hypothetical sketch of both commands after this list.)
- A wildly ambitious solution would be to bundle the binary in a bunch of wheels… most of the releases on https://github.com/ggml-org/llama.cpp/releases/tag/b5527 would easily fit in a 100 MB wheel, with the exception of the CUDA ones
- If we do that, it should not be a required dependency of the plugin because I imagine a lot of users will be happy to download and run it separately or use Homebrew.
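On the multiple-ports idea above, here's a rough sketch of the kind of setup I mean: two separate llama-server processes, each pinned to one model (file names and ports are just examples).

```sh
# One model per process; each port behaves as an independent server
llama-server -m ./Llama-3.2-3B-Instruct-Q4_K_M.gguf --port 8080 &
llama-server -m ./Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf --port 8081 &
```

As far as I can tell each port can already be registered by hand as an OpenAI-compatible model in `extra-openai-models.yaml`, but having the plugin hand out a model ID per port automatically would be much nicer.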
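To make the `add` / `start` ideas concrete, here's a purely hypothetical sketch of what that CLI surface could look like. None of these subcommands exist yet and all of the names and flags are made up:

```sh
# Hypothetical: register extra servers/ports under their own model IDs
llm llama-server add llama-3.2-3b --port 8080
llm llama-server add qwen-coder --port 8081

# Hypothetical: start one of the registered servers, using either a bundled
# binary or whatever llama-server is already on PATH (e.g. from Homebrew)
llm llama-server start qwen-coder -m ./Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf
```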