lorax
Async client should back off when the model is overloaded
Feature request
Have the (async) client automatically back off from sending requests when the deployment is overloaded. A rough sketch of what this could look like is included after the Motivation section below.
Motivation
When the async client exceeds the deployment's queue capacity or rate limits, it currently fails with:
OverloadedError: Model is overloaded
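As a stopgap (and as a sketch of what the built-in behavior could look like), the wrapper below retries generate() with exponential backoff and jitter whenever the deployment reports it is overloaded. This is only an illustration under stated assumptions: it assumes the Python client exposes lorax.AsyncClient and lorax.errors.OverloadedError, and the helper name generate_with_backoff is made up for this example.

```python
# Sketch of client-side backoff for the overload case described above.
# Assumptions: lorax.AsyncClient, lorax.errors.OverloadedError, and
# AsyncClient.generate() exist as used here; adjust names to the real API.
import asyncio
import random

from lorax import AsyncClient
from lorax.errors import OverloadedError


async def generate_with_backoff(
    client: AsyncClient,
    prompt: str,
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    **kwargs,
):
    """Retry generate() with exponential backoff and jitter on OverloadedError."""
    for attempt in range(max_retries + 1):
        try:
            return await client.generate(prompt, **kwargs)
        except OverloadedError:
            if attempt == max_retries:
                raise  # give up after the final attempt
            # Exponential backoff with full jitter, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt) * random.random()
            await asyncio.sleep(delay)


async def main():
    client = AsyncClient("http://127.0.0.1:8080")
    response = await generate_with_backoff(client, "Hello, LoRAX!", max_new_tokens=64)
    print(response.generated_text)


asyncio.run(main())
```

Ideally the client itself would do something like this (opt-in, with configurable retry limits and delays) so every caller doesn't have to reimplement the loop.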