Jiaxin Shan

Results 271 issues of Jiaxin Shan

### 🚀 Feature Description and Motivation - Option 1: zap (native solution), good for structured logging - Option 2: klogr + klog - the simplest solution with minimum changes no...

kind/enhancement
good first issue
help wanted
priority/important-soon
kind/misc

### 🚀 Feature Description and Motivation Kubebuilder internally uses kustomize to manage installations. However, many users prefer using Helm for managing Kubernetes manifests. Currently, Kubebuilder does not support direct generation...

good first issue
help wanted
priority/critical-urgent
kind/feature
kind/misc

We have met a few cases where a single deployment needs to be deployed across different chips due to quota limits or resource shortages. However, in Kubernetes, most of the time we use...

kind/enhancement
priority/important-soon
area/heterogeneous

### 🚀 Feature Description and Motivation We already define a few autoscaling evaluation metrics, such as provision efficiency, SLO violations, and resource usage. It would be great for the controller to evaluate...

area/autoscaling
priority/important-longterm
kind/feature

### 🚀 Feature Description and Motivation This is a follow-up to https://github.com/aibrix/aibrix/issues/419. We want a more elegant implementation to disable logs from `/health` and `/metrics` in vLLM. I...

priority/important-longterm
kind/feature
area/inference-engine

### 🐛 Describe the bug In this case, if you do not commit the change, the Docker tag will always be the same. Sometimes it is not that easy to debug when...

### 🚀 Feature Description and Motivation For the 33B model deployment, we have a few options: A10, V100-32GiB, L20, and L40. Technically, we can launch the instance using M * N...

help wanted
priority/critical-urgent
area/benchmark

### 🚀 Feature Description and Motivation Currently, the runtime handles downloading the model weights. If another replica needs to be deployed, one option is to...

priority/important-longterm
kind/feature
area/scheduling

### 🚀 Feature Description and Motivation Cache locality can be leveraged to reduce model startup time. As users use a rank of up to 128, which is fairly large, this feature...

area/lora
priority/important-longterm
kind/feature

### 🚀 Feature Description and Motivation ## Background Different requests have varying input/output lengths, leading to diverse resource requirements. Currently, when a batch of requests gets scheduled together, it is...