lorax
Async client should back off when the model is overloaded
Feature request
Have the (async) client automatically back off from sending requests when the deployment is overloaded. A rough sketch of what this could look like is included after the Motivation section below.
Motivation
When the async client exceeds the deployment's queue capacity or rate limits, it currently fails with:
OverloadedError: Model is overloaded
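As a stopgap (and as a sketch of what the built-in behavior could look like), the wrapper below retries generate() with exponential backoff and jitter whenever the deployment reports it is overloaded. This is only an illustration under stated assumptions: it assumes the Python client exposes lorax.AsyncClient and lorax.errors.OverloadedError, and the helper name generate_with_backoff is made up for this example.

```python
# Sketch of client-side backoff for the overload case described above.
# Assumptions: lorax.AsyncClient, lorax.errors.OverloadedError, and
# AsyncClient.generate() exist as used here; adjust names to the real API.
import asyncio
import random

from lorax import AsyncClient
from lorax.errors import OverloadedError


async def generate_with_backoff(
    client: AsyncClient,
    prompt: str,
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    **kwargs,
):
    """Retry generate() with exponential backoff and jitter on OverloadedError."""
    for attempt in range(max_retries + 1):
        try:
            return await client.generate(prompt, **kwargs)
        except OverloadedError:
            if attempt == max_retries:
                raise  # give up after the final attempt
            # Exponential backoff with full jitter, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt) * random.random()
            await asyncio.sleep(delay)


async def main():
    client = AsyncClient("http://127.0.0.1:8080")
    response = await generate_with_backoff(client, "Hello, LoRAX!", max_new_tokens=64)
    print(response.generated_text)


asyncio.run(main())
```

Ideally the client itself would do something like this (opt-in, with configurable retry limits and delays) so every caller doesn't have to reimplement the loop.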