Jiaxin Shan issues

Results: 271 issues of Jiaxin Shan

Add Deepseek R1 multi-node example

### 🚀 Feature Description and Motivation AIBrix provides a distributed inference feature that works out of the box with vLLM. We should add some guidance on full-weight Deepseek-R1 deployment...

kind/documentation

priority/important-longterm

area/distributed

Support per user api-key for multi-tenant use case

### 🚀 Feature Description and Motivation ## Background Currently, vLLM only supports a single API key for authentication, making it difficult to share the inference engine across multiple tenants. Extending...

Support multi-node & autoscaling & routing together for models like Deepseek-R1

### 🚀 Feature Description and Motivation #### Orchestration 1. The full-weight Deepseek-r1 model needs to be deployed using multi-node orchestration. If we adopt cross-node TP, let's make sure we unblock...

area/autoscaling

priority/critical-urgent

area/distributed

We still see some errors that are not explainable if httpRoute is missing

### 🐛 Describe the bug {"error":{"code":500,"message":"invalid character 'u' looking for beginning of value"}} ![Image](https://github.com/user-attachments/assets/05cee72e-0cbd-4992-9e31-bdec664db8ab) ### Steps to Reproduce Deploy the RayClusterFleet without the HTTP route being created correctly ### Expected...

kind/enhancement

area/gateway

priority/important-soon

KubeRay dependency ENABLE_PROBES_INJECTION value is wrong

### 🐛 Describe the bug ``` env: - name: ENABLE_PROBES_INJECTION value: '"false"' ``` This won't take effect. We need to remove the single quotes. The problem is it still inject...

kind/bug

area/distributed
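For the `ENABLE_PROBES_INJECTION` report above, a minimal sketch of the corrected fragment follows. It assumes the surrounding Deployment layout is otherwise unchanged; only the quoting of the value differs from the snippet in the issue. In YAML, `'"false"'` is a single-quoted string whose content includes the double-quote characters, so the container sees the seven-character text `"false"` rather than `false`.

```yaml
env:
  - name: ENABLE_PROBES_INJECTION
    # Single quotes removed, as the issue suggests: the value is now the
    # plain string false, with no literal quote characters inside it.
    value: "false"
```

The double quotes are still needed because Kubernetes requires `env[].value` to be a string, and an unquoted `false` would be parsed as a YAML boolean.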

Support high availability of gateway server for production users

### 🚀 Feature Description and Motivation Envoy Gateway is the most important data plane component in AIBrix. To ensure high availability (HA) for Envoy Gateway with an external process, we...

area/gateway

kind/feature

area/stability

RayClusterReplicaSet didn't populate annotations to headers and workers

### 🐛 Describe the bug ![Image](https://github.com/user-attachments/assets/422997a1-d188-426d-8324-eb92524ec85e) ![Image](https://github.com/user-attachments/assets/dc7d77e5-5fd2-45bf-af7d-ea8982f6f31d) Header and worker annotations should inherit from template labels ### Steps to Reproduce Apply any rayclusterfleet yaml ### Expected behavior for common labels/annotations,...

kind/bug

area/distributed

Stateful information sync for ext-proc Instances

### 🚀 Feature Description and Motivation Our external process in the Envoy Gateway tracks request routing using prefix cache awareness, making it stateful. To ensure consistency and availability across multiple...

kind/enhancement

area/gateway

priority/important-soon

area/stability

Failed to run benchmark scripts against the endpoint

### 🐛 Describe the bug ``` python3 benchmark_serving.py --backend vllm --model deepseek-ai/deepseek-r1 --trust-remote-code --served-model-name deepseek-r1-671b --base-url http://localhost:8888 --endpoint /v1/completions --num-prompts 100 --request-rate 2 --metric_percentiles '50,90,95,99' --goodput ttft:1000 tpot:100 --max-concurrency 200...

kind/bug

area/gateway

priority/critical-urgent

Add probe usage practice for super large models, including multi-node case

### 🚀 Feature Description and Motivation When we deploy the Deepseek 671B model across multiple nodes, startup takes a very long time. This brings a few problems: 1. It's better to use `startupProbe`...

kind/documentation

kind/enhancement

priority/critical-urgent

area/performance
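As an illustration of the `startupProbe` suggestion in the issue above, here is a minimal sketch of a container probe fragment for a slow-starting model server. The health path, port, and thresholds are assumptions chosen for illustration and do not come from the issue; tune them to the actual serving container and observed load time.

```yaml
# Hypothetical fragment of a container spec; all values are illustrative.
startupProbe:
  httpGet:
    path: /health        # assumed health endpoint of the serving container
    port: 8000           # assumed serving port
  periodSeconds: 10
  # Startup budget = periodSeconds * failureThreshold = 10s * 180 = 30 minutes.
  failureThreshold: 180
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 10
  failureThreshold: 3    # only evaluated after the startup probe succeeds
```

Kubernetes holds off liveness and readiness checks until the startup probe succeeds, so a long weight-loading phase does not trigger restarts while the liveness probe can stay strict.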