Jiaxin Shan

Results 271 issues of Jiaxin Shan

### 🚀 Feature Description and Motivation Delay scheduling requests to avoid over-assigning work to some inference engines. We have already discussed push-based versus pull-based solutions. This would...

area/gateway
priority/important-longterm
kind/feature

### 🚀 Feature Description and Motivation RAG and agent patterns are all multi-threaded programs; this application information should be exposed to the underlying system so it can be used for better colocation, etc. ###...

help wanted
area/gateway
priority/important-soon
kind/feature
area/inference-engine

### 🚀 Feature Description and Motivation Currently, we are leveraging the Vineyard Operator to orchestrate workloads. While it provides a foundation, we've extended the upstream operator with advanced scheduling features...

kind/enhancement
priority/important-soon

### 🚀 Feature Description and Motivation

```
apiVersion: model.aibrix.ai/v1alpha1
kind: ModelAdapter
metadata:
  name: text2sql-lora-1
  namespace: default
spec:
  baseModel: llama2-70b
  podSelector:
    matchLabels:
      model.aibrix.ai: llama2-70b
  additionalConfig: # could be model artifact etc....
```

area/lora
kind/feature

### 🚀 Feature Description and Motivation Currently, existing large language model (LLM) serving engines that execute multi-turn conversations are inefficient as they need to repeatedly compute the key-value (KV) caches...

kind/enhancement
area/distributed

### 🚀 Feature Description and Motivation

```
metricsSources:
  - endpoint: gpu-optimizer.aibrix-system.svc.cluster.local:8080
    path: /metrics/aibrix-system/simulator-llama2-7b-a100
    metric: "vllm:deployment_replicas"
    targetValue: "1"
```

In the heterogeneous story, `gpu_optimizer` exposes an endpoint `/metrics/${namespace}/${scale_target_name}`. There seem to be some issues here,...

kind/bug
priority/critical-urgent
area/heterogeneous

### 🚀 Feature Description and Motivation Follow-up issue: https://github.com/aibrix/aibrix/issues/600 There's a potential improvement: the scheduler should pick the new pod rather than the old pod. Otherwise it will experience...

kind/enhancement
area/lora
priority/important-longterm

### 🚀 Feature Description and Motivation ![Image](https://github.com/user-attachments/assets/cebe4cf5-fc18-425b-9257-ab1e5216d597) Varun raised a great point about the checking logic: 1. Consider the number of containers. 2. It is better to have a second loop to match...

kind/feature
area/kv-cache

### 🐛 Describe the bug 1. release actions are not working ![Image](https://github.com/user-attachments/assets/ba04adf1-5f9c-46b9-aeb0-2530663e9953) 2. pushing artifacts failed ![Image](https://github.com/user-attachments/assets/5af29c4b-bbc5-434f-8849-ba7e78d31211) ### Steps to Reproduce _No response_ ### Expected behavior _No response_ ### Environment...

kind/misc

### 🐛 Describe the bug ![image](https://github.com/user-attachments/assets/0827c1ef-94ed-4cdf-afce-0068d68150a1) Let's follow up on how to better support such cases. ### Steps to Reproduce _No response_ ### Expected behavior _No response_ ### Environment _No...

kind/enhancement
area/lora
priority/important-soon