Feature / Suggestion
In phi/llm/anthropic/claude.py, I think the `invoke` and `invoke_stream` methods would benefit from a try-except around the return statement to catch rate limit errors with a backoff mechanism. Something like this might work [WARNING: UNTESTED CODE]:
```python
import time  # new dependency, for sleeping between retries

from anthropic import RateLimitError


class ExponentialBackoff:
    def __init__(self, base: float = 2, max_retries: int = 5, max_backoff: float = 60):
        self.base = base
        self.max_retries = max_retries
        self.max_backoff = max_backoff
        self.retry_count = 0

    def backoff(self) -> float:
        # Delay grows as base**0, base**1, base**2, ... capped at max_backoff
        delay = self.base ** self.retry_count
        self.retry_count += 1
        return min(delay, self.max_backoff)


# Modified invoke method of the Claude class (Message, AnthropicMessage,
# List, Dict, Any are already imported in claude.py):
def invoke(self, messages: List[Message]) -> AnthropicMessage:
    api_kwargs: Dict[str, Any] = self.api_kwargs
    api_messages: List[dict] = []

    for m in messages:
        if m.role == "system":
            api_kwargs["system"] = m.content
        else:
            api_messages.append({"role": m.role, "content": m.content or ""})

    backoff = ExponentialBackoff()
    while True:
        try:
            return self.client.messages.create(
                model=self.model,
                messages=api_messages,
                **api_kwargs,
            )
        except RateLimitError as e:
            if backoff.retry_count > backoff.max_retries:
                raise e  # maximum retries exceeded, re-raise the exception
            delay = backoff.backoff()
            print(f"Rate limit exceeded. Retrying in {delay} seconds...")
            time.sleep(delay)
```
Really good idea, will test and probably release this week.
Another possible approach is to apply a decorator to the `invoke` and `invoke_stream` functions that enforces a rate limit. This approach is more proactive, but it might require changes elsewhere to make the enforced rate limit configurable.
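A minimal sketch of that decorator idea, using only the stdlib — the names `rate_limited` and `requests_per_minute` are hypothetical helpers for illustration, not part of phidata:

```python
import functools
import time


def rate_limited(requests_per_minute: float):
    """Hypothetical decorator that spaces out calls so the wrapped
    function is never invoked faster than the given rate."""
    min_interval = 60.0 / requests_per_minute

    def decorator(func):
        last_call = [0.0]  # mutable closure state: time of the previous call

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = min_interval - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)  # proactively delay instead of hitting the API limit
            last_call[0] = time.monotonic()
            return func(*args, **kwargs)

        return wrapper

    return decorator


@rate_limited(requests_per_minute=600)  # at most 10 calls per second
def invoke_api():
    return "ok"
```

Wiring this into `invoke`/`invoke_stream` would mean reading the rate from the model config, which is the "changes elsewhere" mentioned above.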
Great work on this repo. Everybody says it but it is worth repeating.
@jonny7737 decorators are a great idea (maybe using tenacity). Thank you for your help in making this better. I'm working on this :)
As for a timeline, I'm tinkering with a new concept, and after putting that out I'll work on the retry logic, as that seems to be a p0 for a number of use-cases.
Hope I have contributed in some small way. Thanks for listening.
Keep up the great work!