aibrix
Cost-efficient and pluggable Infrastructure components for GenAI inference
### 🚀 Feature Description and Motivation I feel this process has to be improved for easier development ### Use Case _No response_ ### Proposed Solution _No response_
### 🚀 Feature Description and Motivation The autoscaler should support scaling down to 0. When a new request arrives, an activator component should intercept the request and initialize...
### 🚀 Feature Description and Motivation Currently, each TP (Tensor Parallelism) process in StreamLoader reads all model files, resulting in duplicate file transfers and reads, which reduces the overall loading...
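One way to avoid the duplicate reads described above is to give each TP rank a disjoint subset of the model files. The sketch below is only an illustration under assumptions: `assign_files_to_ranks` is a hypothetical helper, not part of the StreamLoader API, and the issue preview does not say which partitioning scheme is intended. Round-robin assignment is used here purely for simplicity.

```python
from typing import Dict, List


def assign_files_to_ranks(files: List[str], world_size: int) -> Dict[int, List[str]]:
    """Hypothetical helper: round-robin the model files over TP ranks.

    Every file is assigned to exactly one rank, so no file is read twice.
    Sorting first makes the assignment identical on every process.
    """
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    assignment: Dict[int, List[str]] = {rank: [] for rank in range(world_size)}
    for index, name in enumerate(sorted(files)):
        assignment[index % world_size].append(name)
    return assignment


if __name__ == "__main__":
    # Illustrative file names only.
    shards = ["model-00002.safetensors", "model-00001.safetensors", "model-00003.safetensors"]
    print(assign_files_to_ranks(shards, world_size=2))
```

With such a split, each rank would still need the tensors held by the other ranks, so the ranks would have to exchange data after reading; that step is not shown here.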
### 🚀 Feature Description and Motivation In the parameter list of the loading tensor in the StreamLoader library, `device` should be refined to `device_map`. The StreamLoader library should also support device_map...
### 🚀 Feature Description and Motivation The network IO of a single process may have an upper limit, and adopting a multi-process, multi-threaded approach can better utilize...
### 🚀 Feature Description and Motivation AI Runtime should support downloading models by fetching multiple files in parallel ### Use Case _No response_ ### Proposed Solution _No response_
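As a rough illustration of the parallel download requested above, the following sketch fans a list of file names out over a thread pool. It rests on assumptions: `download_all` and the `fetch_one` callback are hypothetical names, not the AI Runtime API, and the real transport (S3, Hugging Face, and so on) is left to the caller.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def download_all(
    names: Iterable[str],
    fetch_one: Callable[[str], bytes],
    max_workers: int = 4,
) -> Dict[str, bytes]:
    """Hypothetical helper: fetch several files concurrently.

    `fetch_one` is supplied by the caller and performs the actual transfer
    for a single file. Results are returned keyed by file name.
    """
    name_list = list(names)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order; an exception raised inside
        # fetch_one is re-raised here when its result is consumed.
        results = pool.map(fetch_one, name_list)
        return dict(zip(name_list, results))


if __name__ == "__main__":
    # Stand-in fetcher so the sketch runs without network access.
    fake_fetch = lambda name: name.encode("utf-8")
    print(download_all(["config.json", "tokenizer.json"], fake_fetch))
```

Threads suit this case because the work is IO-bound. If a single process turns out to cap network throughput, the same shape could be tried with `ProcessPoolExecutor`, with the caveat that the callback must then be picklable.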
### Summary The current runtime provides the ability to download model files from different sources, but lacks management capabilities for model files. ### Motivation This issue aims to provide model...
### 🚀 Feature Description and Motivation FYI ### Use Case _No response_ ### Proposed Solution _No response_
### 🚀 Feature Description and Motivation StreamLoader can persist models to disk as a bypass step. This avoids the network transmission overhead of loading the model from the current machine...
### 🐛 Describe the bug Reproduction steps: 1. Define a metric name 2. Update the metric name and apply the CR 3. Monitor the autoscaler logs ### Steps to...