Jiaxin Shan
### 🚀 Feature Description and Motivation Delay scheduling requests to avoid over-assigning work to some inference engines. We have already discussed push-based versus pull-based solutions. This would...
### 🚀 Feature Description and Motivation RAG and agent patterns are all multi-threaded programs. This application-level information should be exposed to the underlying system so that it can be used for better colocation, etc. ###...
### 🚀 Feature Description and Motivation Currently, we are leveraging the Vineyard Operator to orchestrate workloads. While it provides a foundation, we've extended the upstream operator with advanced scheduling features...
### 🚀 Feature Description and Motivation

```
apiVersion: model.aibrix.ai/v1alpha1
kind: ModelAdapter
metadata:
  name: text2sql-lora-1
  namespace: default
spec:
  baseModel: llama2-70b
  podSelector:
    matchLabels:
      model.aibrix.ai: llama2-70b
  additionalConfig: # could be model artifact etc....
```
### 🚀 Feature Description and Motivation Existing large language model (LLM) serving engines handle multi-turn conversations inefficiently because they need to repeatedly compute the key-value (KV) caches...
### 🚀 Feature Description and Motivation

```
metricsSources:
  - endpoint: gpu-optimizer.aibrix-system.svc.cluster.local:8080
    path: /metrics/aibrix-system/simulator-llama2-7b-a100
    metric: "vllm:deployment_replicas"
    targetValue: "1"
```

In the heterogeneous story, `gpu_optimizer` exposes an endpoint `/metrics/${namespace}/${scale_target_name}`. There seem to be some issues here,...
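To make the endpoint template concrete, here is a minimal sketch of how the `path` value in the config above lines up with `/metrics/${namespace}/${scale_target_name}`. The helper names `metrics_path` and `metrics_url` are hypothetical and not part of AIBrix, and the `http` scheme is an assumption.

```python
def metrics_path(namespace: str, scale_target_name: str) -> str:
    """Fill in the /metrics/${namespace}/${scale_target_name} template."""
    return f"/metrics/{namespace}/{scale_target_name}"


def metrics_url(endpoint: str, namespace: str, scale_target_name: str,
                scheme: str = "http") -> str:
    """Join the endpoint (host:port) with the templated path.

    The scheme default is an assumption; the issue does not state it.
    """
    return f"{scheme}://{endpoint}{metrics_path(namespace, scale_target_name)}"


if __name__ == "__main__":
    # Values taken from the metricsSources example in the issue.
    print(metrics_path("aibrix-system", "simulator-llama2-7b-a100"))
    print(metrics_url("gpu-optimizer.aibrix-system.svc.cluster.local:8080",
                      "aibrix-system", "simulator-llama2-7b-a100"))
```

With the values from the example, the namespace is `aibrix-system` and the scale target name is `simulator-llama2-7b-a100`.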
### 🚀 Feature Description and Motivation Follow-up issue here: https://github.com/aibrix/aibrix/issues/600 There's a potential improvement: the scheduler should pick the new pod rather than the old pod. Otherwise it will experience...
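As a rough illustration of "pick the new pod rather than the old pod", the sketch below prefers the most recently created candidate. The `Pod` type and `pick_newest` helper are hypothetical, and using the creation timestamp to decide which pod is "new" is an assumption; the issue does not say how the scheduler would tell them apart.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Sequence


@dataclass(frozen=True)
class Pod:
    """Hypothetical stand-in for a candidate pod."""
    name: str
    created_at: datetime


def pick_newest(pods: Sequence[Pod]) -> Pod:
    """Return the most recently created pod (assumed meaning of 'new')."""
    if not pods:
        raise ValueError("no candidate pods")
    return max(pods, key=lambda p: p.created_at)


if __name__ == "__main__":
    old = Pod("llama2-70b-old", datetime(2025, 1, 1, tzinfo=timezone.utc))
    new = Pod("llama2-70b-new", datetime(2025, 1, 2, tzinfo=timezone.utc))
    print(pick_newest([old, new]).name)
```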
### 🚀 Feature Description and Motivation Varun raised a great point about the checking logic: 1. Consider the number of containers. 2. It would be better to have a second loop to match...
### 🐛 Describe the bug 1. Release actions are not working. 2. Pushing artifacts failed. ### Steps to Reproduce _No response_ ### Expected behavior _No response_ ### Environment...
### 🐛 Describe the bug Let's follow up on how to better support such cases. ### Steps to Reproduce _No response_ ### Expected behavior _No response_ ### Environment _No...