
Async client should back off when the model is overloaded

Open · jppgks opened this issue 10 months ago • 1 comment

Feature request

Have the (async) client automatically back off from sending requests when the deployment is overloaded.

Motivation

When the async client exceeds the deployment's queue capacity or rate limits, it currently fails with:

OverloadedError: Model is overloaded
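
For illustration, the requested behavior could look roughly like the sketch below: a wrapper that retries a request with exponential backoff and jitter whenever the overload error is raised. This is only a sketch under assumptions, not the client's actual API. `OverloadedError` is redefined locally as a stand-in, and `call_with_backoff`, `send`, `max_retries`, `base_delay`, and `max_delay` are hypothetical names chosen for the example.

```python
import asyncio
import random


class OverloadedError(Exception):
    """Local stand-in for the client's overload error (assumption for this sketch)."""


async def call_with_backoff(send, *, max_retries=5, base_delay=0.5, max_delay=30.0):
    """Await send(), retrying with exponential backoff and jitter on OverloadedError.

    `send` is any zero-argument coroutine function that performs one request.
    """
    for attempt in range(max_retries + 1):
        try:
            return await send()
        except OverloadedError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            # Double the delay ceiling on each attempt, bounded by max_delay,
            # then sleep for a random duration below it ("full jitter").
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(random.uniform(0, ceiling))
```

Randomizing the delay keeps many concurrent requests from retrying at the same instant, which would otherwise overload the deployment again. A caller would wrap each request in a zero-argument coroutine function, for example `await call_with_backoff(lambda: make_request(prompt))`, where `make_request` is a placeholder for whatever async call the client exposes.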

jppgks, Apr 12 '24 16:04