Retries w/ exponential backoff
Would be great to add this in - I fairly often see "model overloaded" types of errors with gpt-3.5-turbo and this would make things a bit more resilient.
Here's how langchain is doing it, not that it's a particularly unique problem to solve 🤷:
https://github.com/hwchase17/langchain/blob/c4ece52dac8fb93c38fe818d5f1b006d29539409/langchain/chat_models/openai.py#L64-L83
In the meantime, in case anyone runs across this, I'm using the following (although I have only tested it manually a few times and I'm not 100% sure it's working):
import llm
import openai  # needed for the exception types referenced below
from tenacity import (
    retry,
    wait_random_exponential,
    stop_after_attempt,
    retry_if_exception_type,
)

# Retry on transient OpenAI errors with randomized exponential backoff,
# giving up after three attempts.
@retry(
    retry=(
        retry_if_exception_type(openai.error.Timeout)
        | retry_if_exception_type(openai.error.APIError)
        | retry_if_exception_type(openai.error.RateLimitError)
        | retry_if_exception_type(openai.error.ServiceUnavailableError)
    ),
    wait=wait_random_exponential(multiplier=1, max=40),
    stop=stop_after_attempt(3),
)
def llm_summary(text, key):
    model = llm.get_model("gpt-3.5-turbo")
    model.key = key
    response = model.prompt(
        text, system="Summarize the provided content"
    )
    return response.text()
The parameters here are based on the Tenacity docs and the OpenAI docs.
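For reference, a call then looks something like this (the key value is just a placeholder):

summary = llm_summary(
    "Long article text goes here...",
    key="sk-...",  # placeholder OpenAI API key
)
print(summary)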
I just ran into the need for this (an "APIError" with "internal service error") while using the llm chat
feature, where a failure is a bit more of a pain. OpenAI's infrastructure is fairly flaky, and I think it needs client-side retries.
Thanks @symbolicsorcerer, the above pattern is very useful when using OpenAI models.
Currently I'm getting rate limit errors (429) with Claude:
anthropic.RateLimitError: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of requests has exceeded your rate limit (https://docs.anthropic.com/claude/reference/rate-limits). Please try again later or contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}
Now I could work around this by importing anthropic and adding anthropic.RateLimitError
to my retry conditions, but that is not ideal for plugins.
This is what I'm doing:
import logging

from tenacity import retry, retry_if_exception, wait_random_exponential, stop_after_attempt

logger = logging.getLogger(__name__)

def is_rate_limit_error(exception):
    # Fully qualified names of RateLimitError exceptions from various libraries;
    # the exact module path can vary between SDK versions. Add more as needed.
    rate_limit_errors = [
        "openai.error.RateLimitError",
        "anthropic.RateLimitError",
    ]
    exception_full_name = f"{exception.__class__.__module__}.{exception.__class__.__name__}"
    logger.warning(f"Exception_full_name: {exception_full_name}")
    logger.warning(f"Exception: {exception}")
    return exception_full_name in rate_limit_errors

# Retry only on rate limit errors, matched by name so the SDKs themselves
# do not need to be imported here.
@retry(
    retry=retry_if_exception(is_rate_limit_error),
    wait=wait_random_exponential(multiplier=1, max=40),
    stop=stop_after_attempt(3),
)
def query_model(model, *args, **kwargs):
    return model.prompt(*args, **kwargs)
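With that wrapper in place I call query_model() instead of model.prompt() directly. A rough usage sketch (the model ID here is illustrative and depends on which plugin is installed):

import llm

model = llm.get_model("claude-3-opus")
response = query_model(model, "Summarize the provided content")
print(response.text())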
But it would be awesome if llm
could abstract this away from me