buttercup
Allow configuration for local inferencing.
What would it take to configure Buttercup to make LLM calls against a local inference engine (e.g., Ollama, Llama.cpp, vLLM)? For folks who want to experiment with this project locally without using a cloud API provider, this would be a huge boon.
It should be a relatively low lift, considering the project is already using LiteLLM as a proxy for LLM requests.
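As a very rough sketch of what the client side might look like, here is how a LiteLLM completion call can be pointed at a local Ollama server instead of a cloud provider. The model name and endpoint below are placeholders, not anything Buttercup currently uses:

```python
# Rough sketch only: routing a LiteLLM completion to a local Ollama server.
# Model name and api_base are illustrative placeholders.
from litellm import completion

response = completion(
    model="ollama/llama3",              # "ollama/" prefix selects LiteLLM's Ollama backend
    api_base="http://localhost:11434",  # default Ollama endpoint; adjust for your local server
    messages=[{"role": "user", "content": "Summarize this crash report."}],
)
print(response.choices[0].message.content)
```

OpenAI-compatible servers (e.g., vLLM or a llama.cpp server) could presumably be handled the same way by pointing `api_base` at the local server's `/v1` endpoint, so the main work would likely be exposing these settings in Buttercup's configuration rather than changing how requests are made.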
We'll be working on this soon