aibrix

Add Gateway handling for the scale-to-zero case

Open · nwangfw opened this issue 2 months ago · 0 comments

🚀 Feature Description and Motivation

The autoscaler should support scaling down to 0 replicas. When a new request arrives for a model that has no running pods, an activator component should intercept the request and initialize a new pod. Currently, a model inference request fails with the following error when the model has 0 replicas:

error on getting pods for model llama2-7b
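A minimal sketch of the activator behavior described above. It assumes a hypothetical `scale_up` callback that raises the model's replica count and returns the addresses of ready pods. `Activator`, `ModelState`, `route` and `scale_up` are illustrative names only, not AIBrix APIs. The sketch holds incoming requests while a cold start is in progress instead of failing immediately. It falls back to the existing error message only when scale-up produces no pods.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional


@dataclass
class ModelState:
    """Hypothetical per-model view the gateway keeps of its ready pods."""
    ready_pods: list[str] = field(default_factory=list)
    scaling_up: bool = False
    pods_ready: asyncio.Event = field(default_factory=asyncio.Event)
    task: Optional[asyncio.Task] = None  # keep a reference to the cold-start task


class Activator:
    """Sketch of an activator that buffers requests for a scaled-to-zero model."""

    def __init__(self, scale_up: Callable[[str], Awaitable[list[str]]],
                 timeout_s: float = 30.0):
        self._scale_up = scale_up  # hypothetical: scales model 0 -> N, returns pod addresses
        self._timeout_s = timeout_s
        self._models: dict[str, ModelState] = {}

    def _state(self, model: str) -> ModelState:
        return self._models.setdefault(model, ModelState())

    async def _cold_start(self, model: str) -> None:
        state = self._state(model)
        try:
            state.ready_pods.extend(await self._scale_up(model))
        finally:
            # Wake every buffered request, whether scale-up succeeded or not.
            state.scaling_up = False
            state.pods_ready.set()

    async def route(self, model: str) -> str:
        """Return a pod address for `model`, triggering a cold start if needed."""
        state = self._state(model)
        if not state.ready_pods:
            if not state.scaling_up:
                # First request to hit zero replicas starts exactly one scale-up.
                state.scaling_up = True
                state.pods_ready.clear()
                state.task = asyncio.create_task(self._cold_start(model))
            # Concurrent requests wait on the same event instead of erroring out.
            await asyncio.wait_for(state.pods_ready.wait(), self._timeout_s)
            if not state.ready_pods:
                raise RuntimeError(f"error on getting pods for model {model}")
        return state.ready_pods[0]
```

Because all requests that arrive during a cold start share a single scale-up task, a burst of traffic to an idle model triggers one pod initialization rather than one per request. The `timeout_s` bound keeps a stuck scale-up from holding client connections indefinitely.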

Use Case

No response

Proposed Solution

No response

nwangfw, Dec 04 '24 00:12