Retries w/ exponential backoff
Would be great to add this in. I fairly often see "model overloaded" types of errors with gpt-3.5-turbo, and this would make things a bit more resilient.
Here's how langchain is doing it, not that it's a particularly unique problem to solve 🤷:
In the meantime, in case anyone runs across this, I'm using this (although I have only tested things manually a few times and I'm not 100% sure it's working):
    import llm
    import openai
    from tenacity import (
        retry,
        retry_if_exception_type,
        wait_random_exponential,
    )

    # Retry transient OpenAI errors with randomized exponential backoff (each wait capped at 40 s).
    @retry(
        retry=(
            retry_if_exception_type(openai.error.APIError)
            | retry_if_exception_type(openai.error.RateLimitError)
            | retry_if_exception_type(openai.error.ServiceUnavailableError)
        ),
        wait=wait_random_exponential(multiplier=1, max=40),
    )
    def llm_summary(text, key):
        model = llm.get_model("gpt-3.5-turbo")
        model.key = key
        response = model.prompt(
            text, system="Summarize the provided content"
        )
        return response.text()
The parameters here are based on the Tenacity docs and the OpenAI docs.
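For anyone wondering what that `wait_random_exponential(multiplier=1, max=40)` setting amounts to, here is a rough, dependency-free sketch of randomized exponential backoff. The names `backoff_delay`, `retry_with_backoff`, and `max_attempts` are made up for illustration; this is not how Tenacity or llm implements it, only an approximation of the same idea.

```python
import random
import time


def backoff_delay(attempt, multiplier=1.0, max_delay=40.0, rng=random.random):
    """Return a random delay in [0, min(max_delay, multiplier * 2**attempt)]."""
    ceiling = min(max_delay, multiplier * (2 ** attempt))
    return rng() * ceiling


def retry_with_backoff(func, retry_on, max_attempts=5, sleep=time.sleep, **delay_kwargs):
    """Call func(); if it raises one of retry_on, wait and call it again.

    max_attempts is an assumption added for this sketch so the loop ends.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(backoff_delay(attempt, **delay_kwargs))
```

The upper bound on the wait doubles after each failure (1 s, 2 s, 4 s, ...) until it reaches the cap, and the random factor keeps many clients from retrying at the same instant. Passing `sleep` as a parameter is only there to make the sketch easy to test without real waiting.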
I just found the need for this (an "APIError" with "internal service error") while using the llm chat feature, where it is a bit more of a pain when it fails. OpenAI's infrastructure is fairly flaky, and I think it needs client-side retries.
Thanks @symbolicsorcerer, the above pattern is very useful when using openai models.
Currently I'm getting rate limit errors (429) with Claude:
anthropic.RateLimitError: Error code: 429 - {'type': 'error', 'error': {'type': 'rate_limit_error', 'message': 'Number of requests has exceeded your rate limit (https://docs.anthropic.com/claude/reference/rate-limits). Please try again later or contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase.'}}
Now I could circumvent this by adding anthropic.RateLimitError to my imports, but this is not ideal for plugins.
This is what I'm doing:
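    import logging
    from tenacity import retry, retry_if_exception, wait_random_exponential

    logger = logging.getLogger(__name__)

    def is_rate_limit_error(exception):
        # List of fully qualified names of RateLimitError exceptions from various libraries
        rate_limit_errors = [
            "anthropic.RateLimitError",  # name as shown in the traceback above
            # Add more as needed
        ]
        exception_full_name = f"{exception.__class__.__module__}.{exception.__class__.__name__}"
        logger.warning(f"Exception_full_name: {exception_full_name}")
        logger.warning(f"Exception: {exception}")
        return exception_full_name in rate_limit_errors

    # Retry only when the predicate matches, so no provider SDK has to be imported here.
    @retry(
        retry=retry_if_exception(is_rate_limit_error),
        wait=wait_random_exponential(multiplier=1, max=40),
    )
    def query_model(model, *args, **kwargs):
        return model.prompt(*args, **kwargs)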
But it would be awesome if llm could abstract this away from me.
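In case it helps anyone trying the name-based approach, here is a small sketch that can be run without any provider SDK installed. It is a variation, not the exact code above: it walks the exception's base classes so subclasses match too, which is my own addition. `RATE_LIMIT_ERROR_NAMES`, `full_name`, and `FakeRateLimitError` are hypothetical names for illustration, and the fake class only mimics the `anthropic.RateLimitError` name quoted in the traceback earlier.

```python
# Names as they appear in tracebacks; "anthropic.RateLimitError" is the one
# quoted in the error earlier in this thread. Extend for other providers.
RATE_LIMIT_ERROR_NAMES = frozenset({"anthropic.RateLimitError"})


def full_name(cls):
    """Return 'module.QualName' for a class, e.g. 'builtins.ValueError'."""
    return f"{cls.__module__}.{cls.__qualname__}"


def is_rate_limit_error(exception, names=RATE_LIMIT_ERROR_NAMES):
    """True if the exception's class, or any base class, is listed in names."""
    return any(full_name(cls) in names for cls in type(exception).__mro__)


# Hypothetical stand-in for a provider exception, built so that its full
# name matches without the provider SDK being installed.
FakeRateLimitError = type("RateLimitError", (Exception,), {"__module__": "anthropic"})

print(is_rate_limit_error(FakeRateLimitError("429")))  # True
print(is_rate_limit_error(ValueError("unrelated")))    # False
```

A predicate like this can be passed to Tenacity's `retry_if_exception`, so a plugin only needs to know the exception's name rather than import the library that defines it.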