
Retries w/ exponential backoff

Open · symbolicsorcerer opened this issue 11 months ago · 3 comments

Would be great to add this in - I fairly often see "model overloaded"-type errors with gpt-3.5-turbo, and this would make things a bit more resilient.

Here's how langchain is doing it, not that it's a particularly unique problem to solve 🤷:

https://github.com/hwchase17/langchain/blob/c4ece52dac8fb93c38fe818d5f1b006d29539409/langchain/chat_models/openai.py#L64-L83

symbolicsorcerer · Jul 16 '23 05:07

In the meantime, in case anyone runs across this, I'm using the following (although I've only tested it manually a few times and I'm not 100% sure it's working):

import llm
import openai  # needed for the openai.error exception classes below (openai SDK < 1.0)
from tenacity import (
    retry,
    wait_random_exponential,
    stop_after_attempt,
    retry_if_exception_type,
)

@retry(
    # Retry only on transient OpenAI errors: timeouts, server errors,
    # rate limits, and "model overloaded" style service unavailability
    retry=(
        retry_if_exception_type(openai.error.Timeout)
        | retry_if_exception_type(openai.error.APIError)
        | retry_if_exception_type(openai.error.RateLimitError)
        | retry_if_exception_type(openai.error.ServiceUnavailableError)
    ),
    # Exponential backoff with jitter, capped at 40 seconds between attempts
    wait=wait_random_exponential(multiplier=1, max=40),
    # Give up after three attempts in total
    stop=stop_after_attempt(3),
)
def llm_summary(text, key):
    model = llm.get_model("gpt-3.5-turbo")
    model.key = key
    response = model.prompt(
        text, system="Summarize the provided content"
    )
    return response.text()

The parameters here are based on the Tenacity docs and the OpenAI docs.
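
A minimal usage sketch of the function above, assuming you have an OpenAI API key (the key value and text are placeholders):

api_key = "sk-..."  # placeholder; use your real OpenAI API key
summary = llm_summary("Long article text goes here...", key=api_key)
print(summary)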

symbolicsorcerer · Jul 16 '23 06:07

I just ran into the need for this (an "APIError" with "internal service error") while using the llm chat feature, where a failure is a bit more of a pain. OpenAI's infrastructure is fairly flaky, and I think it needs client-side retries.

frabcus · Sep 19 '23 13:09

Thanks @symbolicsorcerer, the above pattern is very useful when using OpenAI models.

Currently I'm getting rate limit errors (429) with Claude:

anthropic.RateLimitError: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of requests has exceeded your rate limit (https://docs.anthropic.com/claude/reference/rate-limits). Please try again later or contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}

Now I could work around this by adding anthropic.RateLimitError to my imports, but this is not ideal for plugins.

This is what I'm doing:

import logging

from tenacity import (
    retry,
    retry_if_exception,
    wait_random_exponential,
    stop_after_attempt,
)

logger = logging.getLogger(__name__)


def is_rate_limit_error(exception):
    # Match rate limit exceptions by fully qualified name so that no
    # provider-specific library has to be imported here.
    # Note: each string must match exception.__class__.__module__ exactly;
    # the traceback above reports the Anthropic error as anthropic.RateLimitError,
    # so the module portion may need adjusting for your SDK version.
    rate_limit_errors = [
        "openai.error.RateLimitError",
        "anthropic.error.RateLimitError",
        # Add more as needed
    ]
    exception_full_name = f"{exception.__class__.__module__}.{exception.__class__.__name__}"
    logger.warning(f"Exception_full_name: {exception_full_name}")
    logger.warning(f"Exception: {exception}")
    return exception_full_name in rate_limit_errors


@retry(
    retry=retry_if_exception(is_rate_limit_error),
    wait=wait_random_exponential(multiplier=1, max=40),
    stop=stop_after_attempt(3),
)
def query_model(model, *args, **kwargs):
    # Thin wrapper so any llm model's prompt() call gets retried on rate limits
    return model.prompt(*args, **kwargs)

But it would be awesome if llm could abstract this away for me.
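
For reference, a minimal usage sketch of the wrapper above (the model name and key value are placeholders, not specific recommendations):

import llm

model = llm.get_model("claude-3-opus")
model.key = "sk-ant-..."  # placeholder Anthropic API key
response = query_model(model, "Summarize the provided content", system="Be concise")
print(response.text())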

cmungall · Mar 14 '24 02:03