Add an ollama plugin
Ollama makes it easy to run models such as llama2 locally on macOS:
https://ollama.ai/
Ollama runs a server on localhost, so the architecture of the plugin could likely follow that of the existing replicate plugin.
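A rough skeleton might look something like this (a sketch only, assuming the `register_models` hook and `llm.Model` base class that llm's other model plugins use; the `Ollama` class name and `model_id` are placeholders):

```python
import llm


@llm.hookimpl
def register_models(register):
    register(Ollama())


class Ollama(llm.Model):
    model_id = "llama2"
    can_stream = True  # Ollama's generate endpoint streams by default

    def execute(self, prompt, stream, response, conversation):
        # Yield response fragments as they arrive from the local
        # server -- see the streaming sketch further down.
        ...
```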
From the Ollama docs:
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'
The output is a stream of newline-delimited JSON objects, which makes it straightforward to stream responses token by token:
{"model":"llama2","created_at":"2023-08-24T20:24:05.78795Z","response":" The","done":false}
{"model":"llama2","created_at":"2023-08-24T20:24:05.805889Z","response":" sky","done":false}
{"model":"llama2","created_at":"2023-08-24T20:24:05.824734Z","response":" appears","done":false}
{"model":"llama2","created_at":"2023-08-24T20:24:05.842502Z","response":" blue","done":false}
{"model":"llama2","created_at":"2023-08-24T20:24:05.860295Z","response":" because","done":false}
...
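Consuming that stream is simple; here's a minimal sketch (assuming the `requests` library; `httpx` would work the same way, and the `stream_ollama` helper name is made up for illustration):

```python
import json

import requests


def stream_ollama(prompt, model="llama2", host="http://localhost:11434"):
    """Yield response fragments from Ollama's /api/generate endpoint."""
    r = requests.post(
        f"{host}/api/generate",
        json={"model": model, "prompt": prompt},
        stream=True,  # keep the connection open and read lines as they arrive
    )
    r.raise_for_status()
    for line in r.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        if chunk.get("response"):
            yield chunk["response"]
        if chunk.get("done"):  # "done": true marks the final line
            break


# Usage:
# for fragment in stream_ollama("Why is the sky blue?"):
#     print(fragment, end="", flush=True)
```

The plugin's `execute` method could essentially wrap a generator like this, yielding each fragment so llm can stream it to the terminal.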