aibrix

Cost-efficient and pluggable infrastructure components for GenAI inference

339 aibrix issues (sorted by recently updated)

### 🚀 Feature Description and Motivation

I feel this process has to be improved for easier development.

![image](https://github.com/user-attachments/assets/f9f533ad-64bd-4f2f-9d91-29a234e0d4bc)

### Use Case

_No response_

### Proposed Solution

_No response_

kind/enhancement
priority/critical-urgent
area/heterogeneous

### 🚀 Feature Description and Motivation

The autoscaler should support scaling down to 0. When a new request arrives, an activator component should intercept the request and initialize...

area/autoscaling
area/gateway
kind/feature
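A minimal sketch of the activator idea, assuming a hypothetical `scale_up` hook that asks the autoscaler for at least one replica and a hypothetical readiness watcher that calls `mark_ready()`. None of these names come from AIBrix. Requests arriving while the model is at zero replicas are held; only the first one triggers a scale-up, and all of them are forwarded once a replica is ready (or fail after a timeout).

```python
import threading
from typing import Callable


class Activator:
    """Hypothetical activator that holds requests while a model scales up from zero."""

    def __init__(self, scale_up: Callable[[], None], timeout_s: float = 60.0):
        self._scale_up = scale_up          # assumed hook: request >= 1 replica
        self._timeout_s = timeout_s
        self._ready = threading.Event()    # set once at least one replica serves
        self._lock = threading.Lock()
        self._scaling = False              # True while a scale-up is in flight

    def mark_ready(self) -> None:
        """Called (e.g. by a readiness watcher) when a replica becomes available."""
        self._ready.set()

    def mark_scaled_to_zero(self) -> None:
        """Called when the autoscaler removes the last replica."""
        with self._lock:
            self._ready.clear()
            self._scaling = False

    def handle(self, request, forward: Callable):
        if not self._ready.is_set():
            with self._lock:
                if not self._scaling:      # only the first cold request scales up
                    self._scaling = True
                    self._scale_up()
            # Every cold request waits here until a replica is ready.
            if not self._ready.wait(self._timeout_s):
                raise TimeoutError("no replica became ready in time")
        return forward(request)
```

Holding requests in the activator (rather than rejecting them) keeps scale-to-zero invisible to clients at the cost of cold-start latency on the first request.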

### 🚀 Feature Description and Motivation

Currently, each TP (Tensor Parallelism) process in StreamLoader reads all model files, resulting in duplicate file transfers and reads, which reduces the overall loading...

kind/enhancement
area/acceleration
priority/critical-urgent
area/model-loader
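One way to avoid every TP rank reading every file is to give each rank a disjoint share of the files and then exchange tensors between ranks (for example with a collective broadcast, not shown). The sketch below covers only the assignment step, assuming file sizes are known up front; `assign_files_to_ranks` is a hypothetical helper, not a StreamLoader API. It uses a greedy largest-first heuristic to balance bytes per rank.

```python
import heapq
from typing import Dict, List


def assign_files_to_ranks(file_sizes: Dict[str, int], world_size: int) -> List[List[str]]:
    """Greedy largest-first: each file goes to the rank with the fewest bytes so far."""
    if world_size < 1:
        raise ValueError("world_size must be >= 1")
    heap = [(0, rank) for rank in range(world_size)]  # (bytes assigned, rank)
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(world_size)]
    # Largest files first; ties broken by name so the plan is deterministic.
    for name, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, rank = heapq.heappop(heap)
        assignment[rank].append(name)
        heapq.heappush(heap, (load + size, rank))
    return assignment
```

For example, four files of 100, 60, 50 and 10 bytes on two ranks are split as `[["a", "d"], ["b", "c"]]`, so each rank reads 110 bytes instead of 220.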

### 🚀 Feature Description and Motivation

In the parameter list of StreamLoader's tensor-loading function, `device` should be refined to `device_map`, and the StreamLoader library should support device_map...

kind/enhancement
priority/important-soon
area/model-loader
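A sketch of how a `device_map` could be resolved per tensor, assuming semantics similar to Hugging Face Accelerate: keys are dotted module-name prefixes, the longest matching prefix wins, and `""` is the default for the whole model. This is an assumption about the desired behavior, not StreamLoader's actual interface; `resolve_device` is hypothetical. Matching on dot boundaries prevents `model.layers.1` from capturing `model.layers.10`.

```python
from typing import Dict, Union

Device = Union[str, int]


def resolve_device(tensor_name: str, device_map: Dict[str, Device]) -> Device:
    """Return the device for a tensor using the longest matching module-prefix key."""
    parts = tensor_name.split(".")
    # Try the full name, then progressively shorter dotted prefixes.
    for end in range(len(parts), 0, -1):
        prefix = ".".join(parts[:end])
        if prefix in device_map:
            return device_map[prefix]
    # Fall back to the whole-model entry, if one exists.
    if "" in device_map:
        return device_map[""]
    raise KeyError(f"no device_map entry covers {tensor_name!r}")
```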

### 🚀 Feature Description and Motivation

The network I/O throughput of a single process may be capped, and adopting a multi-process, multi-threaded approach can better utilize...

kind/enhancement
area/acceleration
area/model-loader
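A sketch of the thread layer only: one file is split into fixed-size byte ranges that are read concurrently and reassembled in order. A local file stands in for remote storage here; against object storage, `_read_range` would presumably become a ranged GET, and a process pool could sit on top to spread files across processes. Function names and the default chunk size are illustrative assumptions.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def _read_range(path: str, offset: int, length: int) -> bytes:
    # Each worker opens its own handle so seeks do not interfere across threads.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def parallel_read(path: str, chunk_size: int = 8 << 20, workers: int = 8) -> bytes:
    """Read a file as concurrent byte-range chunks and join them in order."""
    size = os.path.getsize(path)
    ranges: List[Tuple[int, int]] = [
        (off, min(chunk_size, size - off)) for off in range(0, size, chunk_size)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so chunks are reassembled correctly.
        chunks = pool.map(lambda r: _read_range(path, *r), ranges)
        return b"".join(chunks)
```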

### 🚀 Feature Description and Motivation

AI Runtime should support downloading models by fetching multiple files in parallel.

### Use Case

_No response_

### Proposed Solution

_No response_

priority/important-longterm
area/runtime
area/model-loader
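A minimal sketch of parallel multi-file download using only the standard library, assuming the runtime already has a `{relative_name: url}` listing for the model; `download_model_files` is a hypothetical name, not an AI Runtime API. Each file is written under a `.part` name and renamed when complete, so a half-written file is never mistaken for a finished one.

```python
import shutil
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Dict, List


def _download(url: str, dest: Path) -> Path:
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")  # temp name, renamed on success
    with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    tmp.replace(dest)
    return dest


def download_model_files(urls: Dict[str, str], dest_dir: str, max_workers: int = 8) -> List[Path]:
    """Download {relative_name: url} concurrently; re-raise if any download fails."""
    root = Path(dest_dir)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(_download, url, root / name) for name, url in urls.items()]
        return sorted(f.result() for f in as_completed(futures))
```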

### Summary

The current runtime provides the ability to download model files from different sources, but lacks management capabilities for model files.

### Motivation

This issue aims to provide model...

priority/important-longterm
area/runtime

### 🚀 Feature Description and Motivation

FYI

### Use Case

_No response_

### Proposed Solution

_No response_

### 🚀 Feature Description and Motivation

StreamLoader can persist models to disk via a bypass path. This can avoid the network transmission overhead of loading the model from the current machine...

kind/enhancement
priority/important-soon
area/model-loader
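A sketch of the bypass-persistence idea as a "tee": the first load streams chunks from the remote source to the loader while writing a copy to local disk, and later loads read the local copy. `fetch_remote` and `load_with_disk_cache` are hypothetical names. This version writes synchronously in the same loop; a real implementation would likely move the disk write off the loading path.

```python
import os
from typing import Callable, Iterable, Iterator


def load_with_disk_cache(fetch_remote: Callable[[], Iterable[bytes]], cache_path: str) -> Iterator[bytes]:
    """Yield model bytes from cache_path if present, else stream remote and persist a copy."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            while chunk := f.read(1 << 20):
                yield chunk
        return
    tmp_path = cache_path + ".part"
    with open(tmp_path, "wb") as cache:
        for chunk in fetch_remote():
            cache.write(chunk)  # bypass copy to local disk
            yield chunk         # then hand the same chunk to the loader
    # Publish the cache only after the whole stream was consumed; an aborted
    # load leaves just the .part file behind.
    os.replace(tmp_path, cache_path)
```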

### 🐛 Describe the bug

![image](https://github.com/user-attachments/assets/470db7bd-2a31-4317-8536-b94d826da1e8)

Reproduce steps:

1. Define a metric name.
2. Update the metric name and apply the CR.
3. Monitor the autoscaler logs.

### Steps to...