Add support for custom LLM API endpoints
First check
- [X] I added a descriptive title to this issue.
- [X] I used the GitHub search to look for a similar issue and didn't find it.
- [X] I searched the Marvin documentation for this feature.
Describe the current behavior
Currently, AI functions are limited to Anthropic and OpenAI as LLM providers. This prevents me from creating `@ai_fn` functions that use my own hosted open-source LLMs. An example AI function decorator in Python:
```python
from marvin import ai_fn
from marvin.engine.language_models.openai import OpenAIChatLLM


@ai_fn(model=OpenAIChatLLM(model="gpt-3.5-turbo-16k"))
def generate_summary(text: str) -> str:
    """
    Generates a concise summary of the provided text.
    """
```
Describe the proposed behavior
Ideally, we would be able to easily write our own ChatLLM subclass in which we define the API endpoint the function should call. There are several inference services (Hugging Face, Replicate, Pyq, etc.) that would allow me to host an open-source LLM and later fine-tune it for a specific function or task. At a high level, we could have:

```python
class LlamaTwoChatLLM(ChatLLM):
    model_endpoint: str = "https://<some_inference_endpoint>"

    async def run(self, *args, **kwargs):
        ...
```
Even supporting the Replicate SDK would be great for now.
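For reference, calling a hosted Llama 2 model through the Replicate Python SDK already looks roughly like the sketch below; the model slug and prompt are illustrative, and this sits entirely outside Marvin:

```python
# Standalone Replicate example (not Marvin's API): requires the `replicate`
# package and a REPLICATE_API_TOKEN environment variable.
import replicate

# replicate.run streams back chunks of generated text for chat models.
output = replicate.run(
    "meta/llama-2-70b-chat",  # example model slug; a fine-tuned model would work too
    input={"prompt": "Summarize this text in one sentence: ..."},
)
print("".join(output))
```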
Example Use
```python
class LlamaTwoChatLLM(ChatLLM):
    model: str = "llama-2"  # example
    model_endpoint: str = "https://<some_inference_endpoint>"

    async def run(self, *args, **kwargs):
        ...
```
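To make the idea concrete, here is a slightly fuller sketch of what that `run` override might do. Heavy caveats: the `ChatLLM` import path, the `run` signature, and the request/response JSON shapes are assumptions for illustration, not Marvin's actual interface.

```python
# Hedged sketch only: the base-class interface and payload schema are guesses.
import httpx

from marvin.engine.language_models import ChatLLM  # import path is an assumption


class LlamaTwoChatLLM(ChatLLM):
    model: str = "llama-2"
    model_endpoint: str = "https://<some_inference_endpoint>"

    async def run(self, messages, **kwargs):
        # Forward the chat messages to the custom endpoint; the JSON schema
        # here is illustrative and would need to match the hosted server.
        async with httpx.AsyncClient(timeout=60) as client:
            response = await client.post(
                self.model_endpoint,
                json={"model": self.model, "messages": messages, **kwargs},
            )
            response.raise_for_status()
        return response.json()
```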
Then the AI function could be:

```python
from marvin import ai_fn
from schemas.llm_chat import LlamaTwoChatLLM


@ai_fn(model=LlamaTwoChatLLM())
def generate_summary(text: str) -> str:
    """
    Generates a concise summary of the provided text.
    """
```
Additional context
I'm sure this sort of open-source LLM support is already on your roadmap, but I thought I'd create an issue in case other folks feel strongly about it as well. Big fan of Marvin so far and excited to continue using the framework 😁
What about running on-premise, without accessing any API in the cloud? Would that be an option?
> What about running on-premise, without accessing any API in the cloud? Would that be an option?
That should work with the above proposal if you create a simple web API wrapper (Flask, FastAPI) around the model and use a localhost URL as the endpoint.
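For illustration, a minimal FastAPI wrapper along those lines might look like the following. Everything here is hypothetical: `run_local_model` is a stand-in for however the on-premise model is actually invoked, and the route and payload shape are just one possible choice.

```python
# Hypothetical local wrapper: serve with `uvicorn server:app --port 8000` and
# point the proposed model_endpoint at http://localhost:8000/v1/chat.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class ChatRequest(BaseModel):
    messages: list[dict]


def run_local_model(messages: list[dict]) -> str:
    # Placeholder for the actual on-premise model call (llama.cpp, vLLM, etc.).
    return "stubbed completion"


@app.post("/v1/chat")
async def chat(request: ChatRequest) -> dict:
    return {"content": run_local_model(request.messages)}
```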
Hm... pointing the model_endpoint at the local URL, right? So the model at the deepest layer, a web API wrapper over it, and Marvin on top of that... once it accepts custom endpoints. Considering we're working with C# and gRPC to call Python services, the complexity of it all skyrockets pretty fast. But if it's feasible, it will be tried :)
Thanks!
llama.cpp includes a server that runs an OpenAI-compatible API. The docs there give an example of using the openai Python package with the client initialized with a base_url pointing to the locally hosted server. You might try that.
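For example, assuming a llama.cpp server is already running locally (the port, model name, and prompt below are placeholders that depend on how the server was started):

```python
# Point the standard openai client at a locally hosted, OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # llama.cpp server's OpenAI-compatible endpoint
    api_key="not-needed-locally",         # the local server does not check the key
)

response = client.chat.completions.create(
    model="local-model",  # placeholder; many local servers ignore this field
    messages=[{"role": "user", "content": "Summarize: Marvin turns functions into AI functions."}],
)
print(response.choices[0].message.content)
```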
I see there's also https://github.com/PrefectHQ/marvin/pull/610 suggesting using the LiteLLM proxy for other services.
I also think that supporting alternative (local/remote) LLMs, at least ones exposing the same API, would make this project even more interesting. AnythingLLM is a very promising UI implementing this idea nicely; see https://useanything.com and https://github.com/Mintplex-Labs/anything-llm.