Jiaxin Shan

Results 271 issues of Jiaxin Shan

### 🚀 Feature Description and Motivation AIBrix provides the distributed inference feature, and it works out of the box with vLLM. We should add some guidance on full-weights Deepseek-R1 deployment...

kind/documentation
priority/important-longterm
area/distributed

### 🚀 Feature Description and Motivation ## Background Currently, vLLM only supports a single API key for authentication, making it difficult to share the inference engine across multiple tenants. Extending...

### 🚀 Feature Description and Motivation #### Orchestration 1. Deepseek-r1 full weights need to be deployed using multi-node orchestration. If we adopt cross-node TP, let's make sure we unblock...

area/autoscaling
priority/critical-urgent
area/distributed

### 🐛 Describe the bug {"error":{"code":500,"message":"invalid character 'u' looking for beginning of value"}} ![Image](https://github.com/user-attachments/assets/05cee72e-0cbd-4992-9e31-bdec664db8ab) ### Steps to Reproduce Deploy the RayClusterFleet; the HTTP route is not created correctly ### Expected...

kind/enhancement
area/gateway
priority/important-soon

### 🐛 Describe the bug
```
env:
  - name: ENABLE_PROBES_INJECTION
    value: '"false"'
```
This won't take effect. We need to remove the single quotes. The problem is it still inject...

kind/bug
area/distributed
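
For the `ENABLE_PROBES_INJECTION` report above, a minimal sketch of the corrected setting, assuming the component compares the variable against the plain string `false`. With `'"false"'`, YAML yields a string that includes the double-quote characters themselves, so such a comparison would not match. Kubernetes requires `env` values to be strings, which is why one pair of quotes stays.

```yaml
env:
  - name: ENABLE_PROBES_INJECTION
    # One level of quoting only: the container sees the 5-character string false.
    value: "false"
```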

### 🚀 Feature Description and Motivation Envoy Gateway is the most important data plane component in AIBrix. To ensure high availability (HA) for Envoy Gateway with an external process, we...

area/gateway
kind/feature
area/stability

### 🐛 Describe the bug ![Image](https://github.com/user-attachments/assets/422997a1-d188-426d-8324-eb92524ec85e) ![Image](https://github.com/user-attachments/assets/dc7d77e5-5fd2-45bf-af7d-ea8982f6f31d) Head and worker annotations should inherit from the template labels ### Steps to Reproduce Apply any RayClusterFleet YAML ### Expected behavior for common labels/annotations,...

kind/bug
area/distributed

### 🚀 Feature Description and Motivation Our external process in the Envoy Gateway tracks request routing using prefix cache awareness, making it stateful. To ensure consistency and availability across multiple...

kind/enhancement
area/gateway
priority/important-soon
area/stability

### 🐛 Describe the bug ``` python3 benchmark_serving.py --backend vllm --model deepseek-ai/deepseek-r1 --trust-remote-code --served-model-name deepseek-r1-671b --base-url http://localhost:8888 --endpoint /v1/completions --num-prompts 100 --request-rate 2 --metric_percentiles '50,90,95,99' --goodput ttft:1000 tpot:100 --max-concurrency 200...

kind/bug
area/gateway
priority/critical-urgent

### 🚀 Feature Description and Motivation When we deploy the Deepseek 671B model across multiple nodes, startup takes a very long time. This causes a few problems: 1. It's better to use `startupProbe`...

kind/documentation
kind/enhancement
priority/critical-urgent
area/performance
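
As an illustration of the `startupProbe` suggestion in the issue above, here is a hedged sketch of a container probe for a slow-starting model server. The `/health` path, port `8000`, and the timing values are assumptions chosen for the example, not settings taken from the issue. Kubernetes holds back liveness and readiness checks until the startup probe succeeds, and the startup budget is `failureThreshold × periodSeconds`.

```yaml
startupProbe:
  httpGet:
    path: /health        # assumed health endpoint of the inference server
    port: 8000           # assumed serving port
  periodSeconds: 30      # probe every 30 seconds
  failureThreshold: 120  # 120 x 30 s = 3600 s (60 minutes) allowed for startup
```

A generous startup budget like this lets the liveness probe keep a short interval without restarting the container while the weights are still loading.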