Jiaxin Shan
### 🚀 Feature Description and Motivation AIBrix provides a distributed inference feature that works out of the box with vLLM. We should add some guidance on full weights Deepseek-R1 deployment...
### 🚀 Feature Description and Motivation ## Background Currently, vLLM only supports a single API key for authentication, making it difficult to share the inference engine across multiple tenants. Extending...
### 🚀 Feature Description and Motivation #### Orchestration 1. Deepseek-r1 with full weights needs to be deployed using multi-node orchestration. If we adopt cross-node TP, let's make sure we unblock...
### 🐛 Describe the bug {"error":{"code":500,"message":"invalid character 'u' looking for beginning of value"}} ### Steps to Reproduce Deploy the RayClusterFleet; the HTTP route is not created correctly ### Expected...
### 🐛 Describe the bug ``` env: - name: ENABLE_PROBES_INJECTION value: '"false"' ``` This won't take effect. We need to remove the single quotes. The problem is it still inject...
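A minimal sketch of why the quoted value above is ignored. With `value: '"false"'`, YAML yields a string that still contains the inner double quotes, so a strict boolean parser rejects it. The issue does not show the actual parsing code: `parse_bool`, `probes_injection_enabled`, and the fall-back-to-enabled default are hypothetical, with the accepted spellings modeled on Go's `strconv.ParseBool`.

```python
# Hypothetical illustration; the real controller code is not shown in the issue.

TRUTHY = {"1", "t", "T", "TRUE", "true", "True"}
FALSY = {"0", "f", "F", "FALSE", "false", "False"}


def parse_bool(raw: str) -> bool:
    """Strict boolean parser accepting the same spellings as Go's strconv.ParseBool."""
    if raw in TRUTHY:
        return True
    if raw in FALSY:
        return False
    raise ValueError(f"cannot parse {raw!r} as a boolean")


def probes_injection_enabled(raw: str, default: bool = True) -> bool:
    """Assumed behaviour: an unparsable value silently falls back to the default."""
    try:
        return parse_bool(raw)
    except ValueError:
        return default


# value: '"false"' -> the env var holds the 7 characters "false", quotes included
quoted = '"false"'
# value: "false" -> the env var holds the 5 characters false
plain = "false"

print(probes_injection_enabled(quoted))  # parse fails, default applies
print(probes_injection_enabled(plain))   # parsed as False
```

Under these assumptions, dropping the outer single quotes is enough for the setting to be honoured.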
### 🚀 Feature Description and Motivation Envoy Gateway is the most important data plane component in AIBrix. To ensure high availability (HA) for Envoy Gateway with an external process, we...
### 🐛 Describe the bug Head and worker annotations should inherit from template labels ### Steps to Reproduce Apply any RayClusterFleet YAML ### Expected behavior for common labels/annotations,...
### 🚀 Feature Description and Motivation Our external process in the Envoy Gateway tracks request routing using prefix cache awareness, making it stateful. To ensure consistency and availability across multiple...
### 🐛 Describe the bug ``` python3 benchmark_serving.py --backend vllm --model deepseek-ai/deepseek-r1 --trust-remote-code --served-model-name deepseek-r1-671b --base-url http://localhost:8888 --endpoint /v1/completions --num-prompts 100 --request-rate 2 --metric_percentiles '50,90,95,99' --goodput ttft:1000 tpot:100 --max-concurrency 200...
### 🚀 Feature Description and Motivation When we deploy the deepseek 671B model across multiple nodes, startup takes a very long time. This causes a few problems: 1. It's better to use `startupProbe`...