aibrix Adding scaling down to 0 case Gateway handling

Open nwangfw opened this issue 2 months ago • 0 comments

🚀 Feature Description and Motivation

The autoscaler should support scaling down to 0. When a new request arrives, an activator component should intercept the request and initialize a new pod. Right now, a model inference request simply fails with the following error if the number of replicas is 0:

error on getting pods for model llama2-7b
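
To make the requested behavior concrete, here is a minimal sketch of the activator idea in Python. It is not aibrix code: the `Activator` class, the `ColdStartTimeout` exception, and the `ready_replicas`, `scale_up`, and `forward` callbacks are all hypothetical names introduced for illustration, and the polling loop is only one possible way to wait for a pod. The sketch assumes the gateway can query the ready replica count and request a new desired count.

```python
import time
from typing import Callable


class ColdStartTimeout(Exception):
    """Raised when no pod becomes ready before the deadline."""


class Activator:
    """Hypothetical activator: holds a request while a model scales up from 0."""

    def __init__(
        self,
        ready_replicas: Callable[[str], int],  # assumed hook: ready pod count for a model
        scale_up: Callable[[str, int], None],  # assumed hook: set the desired replica count
        timeout_s: float = 60.0,
        poll_interval_s: float = 0.5,
    ) -> None:
        self._ready_replicas = ready_replicas
        self._scale_up = scale_up
        self._timeout_s = timeout_s
        self._poll_interval_s = poll_interval_s

    def ensure_ready(self, model: str) -> int:
        """Return the ready replica count, scaling from 0 to 1 first if needed."""
        ready = self._ready_replicas(model)
        if ready > 0:
            return ready

        # No pods: request one replica instead of failing the request.
        self._scale_up(model, 1)

        # Poll until a pod reports ready or the deadline passes.
        deadline = time.monotonic() + self._timeout_s
        while time.monotonic() < deadline:
            ready = self._ready_replicas(model)
            if ready > 0:
                return ready
            time.sleep(self._poll_interval_s)

        raise ColdStartTimeout(f"no ready pod for model {model} after {self._timeout_s}s")

    def handle(self, model: str, request: str, forward: Callable[[str, str], str]) -> str:
        """Wait for a ready pod, then forward the held request."""
        self.ensure_ready(model)
        return forward(model, request)
```

With something like this in front of the routing step, a request for llama2-7b at 0 replicas would be held until a pod is ready (or time out with an explicit error) rather than failing immediately. A real implementation would also need to handle concurrent requests for the same model and choose a timeout that fits model load times.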

Use Case

No response

Proposed Solution

No response

Dec 04 '24 00:12 nwangfw
