Jiaxin Shan issues

Results: 271 issues of Jiaxin Shan

Build better logging experiences with zap/klogr and structured logging

### 🚀 Feature Description and Motivation - Option 1: zap (native solution), good for structured logging - Option 2: klogr + klog - the simplest solution with minimal changes no...

kind/enhancement

good first issue

help wanted

priority/important-soon

kind/misc

[CI] Generate helm package from kubebuilder manifests

### 🚀 Feature Description and Motivation Kubebuilder internally uses kustomize to manage installations. However, many users prefer using Helm for managing Kubernetes manifests. Currently, Kubebuilder does not support direct generation...

good first issue

help wanted

priority/critical-urgent

kind/feature

kind/misc

Model orchestration with heterogeneous hardware

We have met a few cases where a single deployment needs to be deployed across different chips due to quota or resource shortages. However, in Kubernetes, most of the time we use...

kind/enhancement

priority/important-soon

area/heterogeneous

Consider integrating the LLM evaluation metrics into the Autoscaling object

### 🚀 Feature Description and Motivation We already define a few autoscaling evaluation metrics, such as provision efficiency, SLO violations, and resource usage. It would be great for the controller to evaluate...

area/autoscaling

priority/important-longterm

kind/feature

Disable logs from specific engine path

### 🚀 Feature Description and Motivation This is a follow-up to https://github.com/aibrix/aibrix/issues/419. We want a more elegant implementation to disable logs from `/health` and `/metrics` in vLLM. I...

priority/important-longterm

kind/feature

area/inference-engine

Docker build tag only takes the commit into consideration and ignores staged/dirty changes

### 🐛 Describe the bug In this case, if you do not commit the change, the Docker tag will always be the same. Sometimes, it's not that easy to debug when...

LLMPilot: Generate the best deployment configuration for model + GPU combination

### 🚀 Feature Description and Motivation For the 33B model deployment, we have a few options: A10, V100-32GiB, L20, and L40. Technically, we can launch the instance using M * N...

help wanted

priority/critical-urgent

area/benchmark

Implement model-architecture-aware scheduling policies

### 🚀 Feature Description and Motivation Currently, the runtime handles downloading the model weights. If another replica needs to be deployed, one option is to...

priority/important-longterm

kind/feature

area/scheduling

Implement cold start manager for lora models

### 🚀 Feature Description and Motivation Cache locality can be leveraged to reduce model startup time. As users use ranks up to 128, which is fairly large, this feature...

area/lora

priority/important-longterm

kind/feature

SLO-Driven Resource Management for vLLM

### 🚀 Feature Description and Motivation ## Background Different requests have varying input/output lengths, leading to diverse resource requirements. Currently, when a batch of requests gets scheduled together, it is...