
Add support for custom LLM API endpoints

Open villagab4 opened this issue 1 year ago • 5 comments

First check

  • [X] I added a descriptive title to this issue.
  • [X] I used the GitHub search to look for a similar issue and didn't find it.
  • [X] I searched the Marvin documentation for this feature.

Describe the current behavior

Currently, AI functions are limited to Anthropic and OpenAI as LLM providers. This prevents me from creating @ai_fn functions that use my own hosted open-source LLMs. An example AI function decorator in Python:

from marvin import ai_fn
from marvin.engine.language_models.openai import OpenAIChatLLM
 
@ai_fn(model=OpenAIChatLLM(model="gpt-3.5-turbo-16k"))
def generate_summary(text: str) -> str:
"""
Generates a concise summary of the provided text.
"""

Describe the proposed behavior

Ideally, we would be able to easily write our own ChatLLM subclass in which we define the API endpoint that should be called. There are several inference services (Hugging Face, Replicate, Pyq, etc.) that would let me host an open-source LLM and later fine-tune it for a specific function or task. At a high level, we could have

class llamaTwoChatLLM(ChatLLM):

    model_endpoint: str = "https://<some_inference_endpoint>"

    async def run(self, messages, **kwargs):
        ...

Even supporting the Replicate SDK would be great for now.
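For illustration, a rough sketch of how run() might call the Replicate SDK (the import path, the run signature, the message.content attribute, and the model slug are all assumptions, not Marvin's actual API):

# Illustrative sketch only -- the ChatLLM import path, the `run` signature,
# and `message.content` are assumptions about Marvin's interface.
import replicate

from marvin.engine.language_models import ChatLLM


class ReplicateLlamaTwoChatLLM(ChatLLM):
    model: str = "meta/llama-2-70b-chat"  # example Replicate model slug

    async def run(self, messages, **kwargs):
        # flatten the chat history into a single prompt string
        prompt = "\n".join(m.content for m in messages)
        # replicate.run returns an iterator of generated text chunks
        output = replicate.run(self.model, input={"prompt": prompt})
        return "".join(output)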

Example Use

class llamaTwoChatLLM(ChatLLM):
    model: str = "llama-2"  # example
    model_endpoint: str = "https://<some_inference_endpoint>"

    async def run(self, messages, **kwargs):
        ...

Then the AI function could be

from marvin import ai_fn
from schemas.llm_chat import llamaTwoChatLLM
 
@ai_fn(model=llamaTwoChatLLM())
def generate_summary(text: str) -> str:
"""
Generates a concise summary of the provided text.
"""

Additional context

I'm sure this sort of open source LLM support is already on your roadmap, though I thought I'd create an issue in case other folks felt strongly as well. Big fan of marvin so far and excited to continue using the framework 😁

villagab4 avatar Aug 11 '23 18:08 villagab4

What about running on-premise, without accessing any API in the cloud? Would that be an option?

ADD-eNavarro avatar Sep 07 '23 07:09 ADD-eNavarro

What about running on-premise, without accessing any API in the cloud? Would that be an option?

That should work with the above proposal if you create a simple web API wrapper (Flask, FastAPI) around the model and use a localhost URL.
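For example, a minimal sketch of such a wrapper (the route, request shape, and generate_locally placeholder are illustrative, not anything Marvin defines):

# Illustrative only: a tiny FastAPI wrapper around a locally hosted model.
# `generate_locally` is a placeholder for whatever inference code you run.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class CompletionRequest(BaseModel):
    prompt: str


def generate_locally(prompt: str) -> str:
    # call your local model here (llama.cpp bindings, transformers, etc.)
    raise NotImplementedError


@app.post("/v1/complete")
def complete(req: CompletionRequest):
    return {"text": generate_locally(req.prompt)}


# run with: uvicorn server:app --port 8000
# then point the proposed model_endpoint at http://localhost:8000/v1/complete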

lostmygithubaccount avatar Sep 07 '23 10:09 lostmygithubaccount

Hm... pointing the model_endpoint at the local URL, right? So the model at the deepest layer, a web API wrapper over it, and Marvin on top of that... once it accepts custom endpoints. Considering we're working with C# and gRPC to call Python services, the complexity of it all skyrockets pretty fast. But if it's feasible, it will be tried :)

Thanks!

ADD-eNavarro avatar Sep 07 '23 10:09 ADD-eNavarro

llama.cpp includes a server which runs an OpenAI-compatible API. The docs there give an example of using the openai python package with the client initialized with a base_url pointing to the locally-hosted server. You might try that.
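Roughly something like this, assuming the llama.cpp server is running locally on port 8080 as in its docs (the api_key value and model name are placeholders the local server largely ignores):

# Sketch: pointing the openai client at a locally running llama.cpp server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # llama.cpp's OpenAI-compatible endpoint
    api_key="sk-no-key-required",         # the local server ignores the key
)

response = client.chat.completions.create(
    model="local-model",  # placeholder; the local server serves whatever it loaded
    messages=[{"role": "user", "content": "Summarize this text: ..."}],
)
print(response.choices[0].message.content)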

I see there's also https://github.com/PrefectHQ/marvin/pull/610 suggesting using the LiteLLM proxy for other services.

bakkot avatar Dec 31 '23 02:12 bakkot

I also think that supporting alternative (local or remote) LLMs, at least ones exposing the same API, would make this project even more interesting. AnythingLLM is a very promising UI implementing this idea nicely, see: https://useanything.com, https://github.com/Mintplex-Labs/anything-llm.

deeplook avatar Mar 08 '24 08:03 deeplook