
Use of LiteLLM Router for load balancing and fallbacks

Open denisergashbaev opened this issue 1 year ago • 18 comments

Since DSPy uses LiteLLM internally, I wonder how to use the LiteLLM Router. In particular, I would like to add load balancing and fallbacks via LiteLLM.

Another example: LiteLLM provides a rate-limit-aware routing strategy that routes each call to the deployment with the lowest tokens-per-minute usage (see https://github.com/BerriAI/litellm/discussions/4510, https://docs.litellm.ai/docs/routing#advanced---routing-strategies-%EF%B8%8F). I would like to use that router from DSPy.
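For concreteness, this is roughly how that Router is configured on its own (model names, keys, and limits below are placeholders); the question is how to plug something like this into DSPy:

import litellm

# Two deployments behind one alias; with "usage-based-routing-v2" the router
# picks the deployment with the lowest current TPM/RPM usage.
router = litellm.Router(
    model_list=[
        {
            "model_name": "gpt-4o",  # alias used by callers
            "litellm_params": {
                "model": "azure/gpt-4o-eu",
                "api_base": "https://eu.example.azure.com",
                "api_key": "AZURE_KEY_EU",
                "tpm": 100_000,
                "rpm": 1_000,
            },
        },
        {
            "model_name": "gpt-4o",
            "litellm_params": {
                "model": "azure/gpt-4o-us",
                "api_base": "https://us.example.azure.com",
                "api_key": "AZURE_KEY_US",
                "tpm": 100_000,
                "rpm": 1_000,
            },
        },
    ],
    routing_strategy="usage-based-routing-v2",
)

response = router.completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)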

Thank you

denisergashbaev avatar Oct 01 '24 17:10 denisergashbaev

Thanks! Maybe just launch their server and connect to it via the client dspy.LM? i.e., DSPy doesn't need to be involved.
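Something along these lines should work once the proxy is up (the model name, port, and key are whatever you configured on the proxy side):

import dspy

# The LiteLLM proxy exposes an OpenAI-compatible endpoint, so dspy.LM can
# point at it like any other OpenAI-style server.
lm = dspy.LM(
    "openai/my-router-model",          # model name as configured on the proxy
    api_base="http://localhost:4000",  # default LiteLLM proxy port
    api_key="sk-1234",                 # proxy master/virtual key
)
dspy.configure(lm=lm)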

okhat avatar Oct 01 '24 19:10 okhat

Link: https://github.com/stanfordnlp/dspy/blob/main/examples/migration.ipynb

okhat avatar Oct 02 '24 00:10 okhat

Thank you, Omar! I think it makes sense to expose LiteLLM's Router as well. In some circumstances, one would not want to run a separate proxy server and would prefer using the LiteLLM Router for fallbacks, load balancing, and request prioritization.

denisergashbaev avatar Oct 02 '24 07:10 denisergashbaev

@denisergashbaev Sorry, I'm mistaken about the nature of the LiteLLM router. I assumed it was inherently a proxy.

It's actually just a client-side thing, indeed: https://docs.litellm.ai/docs/routing

okhat avatar Oct 02 '24 08:10 okhat

Yes, I think we should support this. It seems like we should inherit dspy.LM and just accept a list of models instead of one model. This seems cool. Do you need this soon? We'd certainly appreciate a PR.

okhat avatar Oct 02 '24 08:10 okhat

Down to work on this if needed. @okhat maybe another function in dspy.LM similar to litellm_completion? Can also do a router kwarg to pass in a litellm.Router object.

zhaohan-dong avatar Oct 08 '24 08:10 zhaohan-dong

@zhaohan-dong Thanks a lot! How do you envision the interface looking? Let's agree on the right API before doing anything intensive :D

okhat avatar Oct 08 '24 16:10 okhat

That's exactly what I hoped to pin down. I think litellm tries to keep the Router's signature similar to the plain generation methods, so as a first step we could do dependency injection:

  • An argument router: Optional[litellm.Router] = None
  • If router is not None, the inference call would invoke router.text_completion()
  • If model is not in the router's model_list, raise an error (not 100% sure this is the cleanest way)

Otherwise, we could inherit from LM with AnotherClass(model_list=model_list, **kwargs), so people who want to use the router would use that class without affecting other users. Not sure what the best naming is or which file to put it in.

Happy to proceed either way or do something else you'd suggest.

Fundamentally, I see the plain litellm.text_completion() as a special case of Router.text_completion(), where there's only one model in the model_list.
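Rough, untested sketch of the subclass idea (which hook dspy.LM exposes for the actual call can differ across versions, so the forward() name below is an assumption):

import dspy
import litellm


class RoutedLM(dspy.LM):
    """dspy.LM variant that sends requests through a litellm.Router."""

    def __init__(self, model: str, router: litellm.Router, **kwargs):
        super().__init__(model=model, **kwargs)
        # Fail fast if the requested model isn't one of the router's deployments.
        deployments = {d["model_name"] for d in router.model_list}
        if model not in deployments:
            raise ValueError(f"{model!r} is not in the router's model_list")
        self.router = router

    def forward(self, prompt=None, messages=None, **kwargs):
        # Delegate to the router instead of calling litellm.completion directly,
        # so fallbacks / load balancing / rate-limit-aware routing all apply.
        messages = messages or [{"role": "user", "content": prompt}]
        return self.router.completion(model=self.model, messages=messages, **kwargs)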

zhaohan-dong avatar Oct 08 '24 18:10 zhaohan-dong

Thanks a lot @zhaohan-dong ! I like the idea of a class that inherits from dspy.LM

okhat avatar Oct 10 '24 08:10 okhat

Awesome! Maybe RoutedLM as the name?

zhaohan-dong avatar Oct 10 '24 08:10 zhaohan-dong

Thanks for the response and your willingness to help. Let me know if I could help as well.

denisergashbaev avatar Oct 26 '24 20:10 denisergashbaev

@denisergashbaev I opened a PR here: https://github.com/stanfordnlp/dspy/pull/1611. Would you like to collaborate on it?

zhaohan-dong avatar Oct 29 '24 09:10 zhaohan-dong

Hi! Checking in on this. I would love to leverage this if someone has it on a branch somewhere. I'd be happy to write some integration tests so that it can be merged eventually.

ryanh-ai avatar Nov 23 '24 18:11 ryanh-ai

@ryanh-ai Have a branch here https://github.com/zhaohan-dong/dspy/tree/litellm-router

zhaohan-dong avatar Nov 24 '24 00:11 zhaohan-dong

@okhat any updates on this? We are currently using the LiteLLM proxy in our product as a workaround while we wait on this. Any update would be appreciated, as launching a proxy server alongside our service to communicate with the LLMs has become a bit of a bottleneck in our use case.

AbhishekJanapareddi avatar Jan 10 '25 12:01 AbhishekJanapareddi

Any updates on this? This is definitely a useful feature.

I think we can simplify the API even more and not even mention litellm or routing to the user.

I propose adding another argument to the LM() constructor that uses the litellm Router under the hood.

For example:

dspy.LM(
    model='anthropic/claude-3-opus-20240229',
    api_key='YOUR_ANTHROPIC_API_KEY',
    fallback_models=[
        {
            # we can even make this name optional
            'name': 'fallback 1',
            'params': {
                'model': 'anthropic/claude-3-opus-20240229',
                'api_base': 'https://api.anthropic.com',
                'api_key': 'YOUR_ANTHROPIC_API_KEY',
                'temperature': 0.1,
            },
        },
        {
            'name': 'fallback 2',
            'params': {
                'model': 'vertex_ai/claude-3-haiku@20240307',
                'vertex_ai_project': vertex_ai_project,
                'vertex_ai_location': vertex_ai_location,
                'temperature': 1.0,
            },
        },
    ],
)

Everything in fallback_models would be forwarded to a litellm Router.
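Roughly what I imagine happening under the hood (the helper name and structure are made up; only the litellm.Router calls are real):

import litellm


def build_router(model, api_key, fallback_models):
    # Register the primary model plus each fallback as Router deployments,
    # then tell the Router to fall back from the primary to the others in order.
    model_list = [
        {"model_name": "primary", "litellm_params": {"model": model, "api_key": api_key}}
    ]
    fallback_names = []
    for i, fb in enumerate(fallback_models):
        name = fb.get("name", f"fallback-{i}")
        model_list.append({"model_name": name, "litellm_params": fb["params"]})
        fallback_names.append(name)

    return litellm.Router(
        model_list=model_list,
        fallbacks=[{"primary": fallback_names}],
    )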

onel avatar Feb 03 '25 17:02 onel

Yes, this would be a very useful feature. Google models tend to return errors rather frequently, so fallbacks, routing, and load balancing are important for moving to production.

I think we can simply support passing a LiteLLM object, such as a Router, into dspy.LM as the model itself. Instead of adapting to LiteLLM under the hood, we can just work with the Router API: the user defines the LiteLLM Router with their params and passes it in as the model.
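Hypothetical usage (the dspy.LM(model=router) part is what I'm proposing, not an existing API; the Router setup itself is standard litellm):

import dspy
import litellm

# User builds the Router themselves, with whatever fallbacks / load balancing
# / routing strategy they need ...
router = litellm.Router(
    model_list=[
        {"model_name": "gemini-flash",
         "litellm_params": {"model": "gemini/gemini-1.5-flash", "api_key": "KEY_1"}},
        {"model_name": "gemini-flash",
         "litellm_params": {"model": "vertex_ai/gemini-1.5-flash"}},
    ],
    num_retries=3,
)

# ... and hands it to dspy.LM in place of a model string (proposed API).
lm = dspy.LM(model=router)
dspy.configure(lm=lm)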

oneryalcin avatar Mar 04 '25 19:03 oneryalcin

PR opened: https://github.com/stanfordnlp/dspy/pull/8268

oneryalcin avatar May 23 '25 20:05 oneryalcin

Why is this still open?

quancore avatar Nov 26 '25 03:11 quancore