How to install and deploy AIBrix on a single server?
This is the original question. https://github.com/vllm-project/aibrix/issues/1396#issuecomment-3425452314 I kindly request your help. Thank you. @Jeffwan @googs1025
Thank you very much for your reply. However, for beginners who are new to LLMs and AIBrix, the first installation can be difficult. Is there a clearer, more straightforward installation and usage tutorial? I have a server with 8 L40S GPUs (I already have the Qwen3-32B model files locally, along with other models such as Llama 3.2 3B).
Step 1: https://aibrix.readthedocs.io/latest/getting_started/installation/lambda.html
I encountered some problems during installation. First, do I need to create a Docker container on the server and then install AIBrix inside that container? Following the tutorial you provided, I tried to set things up on Lambda Cloud, but on the Lambda docs site (https://docs.lambda.ai/) I could not figure out what to select and install.
Step 2: I went to https://docs.lambda.ai/ and opened the PUBLIC CLOUD section.
There I found https://docs.lambda.ai/education/, which lists:

Large language models (LLMs)
- Deploying a Llama 3 inference endpoint
- Deploying Llama 3.2 3B in a Kubernetes (K8s) cluster
- Using KubeAI to deploy Nous Research's Hermes 3 and other LLMs
- Serving Llama 3.1 405B on a Lambda 1-Click Cluster
- Serving the Llama 3.1 8B and 70B models using Lambda Cloud on-demand instances
- Running DeepSeek-R1 70B using Ollama
Which one should I choose? In any case, I went into https://docs.lambda.ai/education/large-language-models/k8s-ollama-llama-3-2/. Is that right?
Is there a contradiction between creating a Docker container on the server and "Deploying Llama 3.2 3B in a Kubernetes (K8s) cluster"? If not, which comes first: create the Docker container and then deploy Llama 3.2 3B in the K8s cluster, or deploy in the K8s cluster first and then create the Docker container? I am very confused.
@Jeffwan Please forgive my forwardness. I am eager to deploy AIBrix on a single server, but I have indeed encountered many problems. Thank you very much for your patience and attention.
@Jeffwan At the very beginning, I followed this tutorial: https://aibrix.readthedocs.io/latest/getting_started/installation/installation.html#install-aibrix-in-testing-environments
First, I built an image from the following Dockerfile, then created a Docker container from it:
```dockerfile
# Based on an existing NVIDIA PyTorch image (ships with GPU/CUDA support)
FROM nvcr.io/nvidia/pytorch:24.09-py3
# Disable interactive prompts to speed up system package installation
ENV DEBIAN_FRONTEND=noninteractive
# Install system dependencies and clean up the apt cache
RUN apt-get update && apt-get install -y \
    sudo \
    apt-transport-https \
    ca-certificates \
    software-properties-common \
    systemd \
    curl \
    && rm -rf /var/lib/apt/lists/*
# Install kubectl (official GitHub release, via a CN proxy for speed)
RUN set -eux; \
    # Download kubectl v1.28.0 through a GitHub mirror (to avoid a 404)
    curl -fL --retry 3 "https://ghproxy.com/https://github.com/kubernetes/kubernetes/releases/download/v1.28.0/kubectl-linux-amd64" -o kubectl \
    && chmod +x kubectl \
    && mv kubectl /usr/local/bin/ \
    && kubectl version --client # verify the installation
# Install a single-node K3s cluster (CN mirror, with a startup check)
RUN set -eux; \
    curl -sfL --http1.1 https://rancher-mirror.oss-cn-beijing.aliyuncs.com/k3s/k3s-install.sh | sh -s - \
    --disable traefik \
    --write-kubeconfig-mode 644 \
    --kubeconfig /etc/rancher/k3s/k3s.yaml \
    --data-dir /var/lib/k3s \
    --system-default-registry "registry.cn-hangzhou.aliyuncs.com"; \
    # Wait for the K3s service to start (up to 60 seconds)
    count=0; \
    until systemctl is-active --quiet k3s; do \
      echo "Waiting for the K3s service to start..."; \
      sleep 5; \
      if [ $((++count)) -ge 12 ]; then \
        echo "K3s startup timed out!"; \
        exit 1; \
      fi; \
    done
# Point kubectl at the local K3s cluster by default
RUN mkdir -p /root/.kube && ln -s /etc/rancher/k3s/k3s.yaml /root/.kube/config
# Copy the AIBrix install script into the image and make it executable
COPY install-aibrix.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/install-aibrix.sh
# Run the AIBrix install script when the container starts
CMD ["install-aibrix.sh"]
```
After that, I started the container with:

```bash
docker run -itd --name aibrix-alanchen --privileged --gpus all --network host --restart always aibrix-ready:v0.4.1
```

and let it install the AIBrix environment via the following install-aibrix.sh:
```bash
#!/bin/bash
set -e
# Wait for the K3s cluster to become ready (up to 30 seconds)
echo "=== Waiting for the K3s cluster to start ==="
for i in {1..30}; do
  if kubectl get nodes &> /dev/null; then
    echo "K3s cluster is ready!"
    break
  fi
  sleep 1
done
# Install the AIBrix dependency components (Envoy Gateway + KubeRay)
echo -e "\n=== Installing AIBrix dependencies ==="
kubectl apply -f https://github.com/vllm-project/aibrix/releases/download/v0.4.1/aibrix-dependency-v0.4.1.yaml --server-side
# Wait for the dependency components to come up (5-minute timeout)
kubectl wait --for=condition=available deployment --all -n aibrix-system --timeout=300s || true
kubectl wait --for=condition=available deployment --all -n envoy-gateway-system --timeout=300s || true
# Install the AIBrix core components
echo -e "\n=== Installing AIBrix core ==="
kubectl apply -f https://github.com/vllm-project/aibrix/releases/download/v0.4.1/aibrix-core-v0.4.1.yaml
# Verify the installation
echo -e "\n=== Installation finished! Current AIBrix component status ==="
kubectl get pods -n aibrix-system
kubectl get pods -n envoy-gateway-system
# Keep the container running (tail the K3s log)
tail -f /var/log/k3s/k3s.log
```
But it ultimately failed. Could you point me to a quick, reliable installation tutorial? If possible, I would be happy to write a single-server AIBrix installation tutorial based on my experience, as a contribution to the AIBrix community. @Jeffwan Thanks again.
@googs1025 🧑🏻💻 Thank you very much for the reminder. It is dedicated, professional contributors like you who keep pushing the community forward, and I will follow in your footsteps. Could you please help me solve this problem? https://github.com/vllm-project/aibrix/issues/1690 Thank you so much.
@Alan-D-Chen I think you just need to follow this guide; this page gives you everything you need. What you followed, like the Lambda Cloud Llama installation, is not helpful. It seems you tried some unrelated guides.
https://aibrix.readthedocs.io/latest/getting_started/installation/lambda.html
Let me know if you encounter other issues.
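For anyone following along, the essence of that guide on a single host is a minikube cluster running on the host itself, not inside a `docker build`. A minimal sketch of that flow; the minikube flags here are my assumption rather than copied verbatim from the guide:

```bash
# Run on the host: K3s/minikube need a live init system and cgroups,
# which are not available while an image is being built.
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube

# Single-node cluster with GPU passthrough (assumes the NVIDIA container
# toolkit is already installed; --gpus needs a recent minikube release).
minikube start --driver=docker --container-runtime=docker --gpus=all

# Then apply the AIBrix manifests exactly as the guide does.
kubectl apply -f https://github.com/vllm-project/aibrix/releases/download/v0.4.1/aibrix-dependency-v0.4.1.yaml --server-side
kubectl apply -f https://github.com/vllm-project/aibrix/releases/download/v0.4.1/aibrix-core-v0.4.1.yaml
```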
Good morning @Jeffwan. I'm sorry, but I've run into a tricky problem. When I got to this step (https://aibrix.readthedocs.io/latest/getting_started/installation/lambda.html), I hit an issue:
kubectl get pods -n aibrix-system
The images for these six pods cannot be pulled.
I managed to pull five of them (or close versions) from other sources.
I had to retag kuberay/operator:nightly so that kubectl get pods -n aibrix-system would be satisfied.
However, aibrix/gpu-optimizer:v0.4.1 cannot be found at all, either from the server or from my local PC.
I also changed Docker's daemon.json:
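For context, a registry-mirrors entry is the usual thing to put in daemon.json; a sketch of what I mean (the mirror host is a placeholder, not a specific recommendation):

```bash
# Add a pull-through mirror to Docker and restart the daemon.
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "registry-mirrors": ["https://<your-mirror-host>"]
}
EOF
sudo systemctl restart docker
```

Note that this typically only affects `docker pull` on the host; pods inside minikube pull through minikube's own container runtime.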
In the end, none of this worked. Could you do me a favor? Thank you!
Also, I could not find a working solution in these issues: https://github.com/vllm-project/aibrix/pull/1682 https://github.com/vllm-project/aibrix/pull/1539
> However, aibrix/gpu-optimizer:v0.4.1 cannot be found at all, either from the server or from my local PC.
Where did you find this image? Did you follow the guidance exactly, or did you fetch and retag it yourself?
We use the runtime image for the gpu-optimizer in both the Helm and Kustomize setups:
https://github.com/vllm-project/aibrix/blob/dfb5b35c97c236d2ee9322df08d6d747f6aff3ad/dist/chart/stable.yaml#L90-L92
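To double-check what was actually deployed, you can query the image directly; this is plain kubectl, with the deployment name inferred from the pod names in this thread:

```bash
# Print the container image used by the gpu-optimizer deployment.
kubectl get deployment aibrix-gpu-optimizer -n aibrix-system \
  -o jsonpath='{.spec.template.spec.containers[*].image}'
```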
@Jeffwan My bad. Here's the situation: I tried to download aibrix/gpu-optimizer by running docker pull aibrix/gpu-optimizer:v0.4.1 both on my local PC (planning to upload it to the server afterwards) and directly on the server, but it could not be found in either place. I also searched for aibrix/gpu-optimizer:v0.4.1 on Docker Hub and in Docker Desktop, and it does not exist at all. Likewise, kuberay/operator:v1.1.0 did not show up on Docker Hub or in Docker Desktop; the only similar tag I found was kuberay/operator:nightly. How can I obtain the correct kuberay/operator:v1.1.0 and aibrix/gpu-optimizer:v0.4.1 images? As it stands, the pods listed by kubectl get pods -n aibrix-system cannot start.
root@a7:/home/chendong/aibrix# kubectl get pods -n aibrix-system
NAME READY STATUS RESTARTS AGE
aibrix-controller-manager-57bd8857f4-v2l65 0/1 ImagePullBackOff 0 15h
aibrix-gateway-plugins-7dfc7569b-sdtlp 0/1 Init:ImagePullBackOff 0 15h
aibrix-gpu-optimizer-7d9bbf9c7c-c69zc 0/1 ImagePullBackOff 0 15h
aibrix-kuberay-operator-9b8548c98-ghtgq 0/1 ImagePullBackOff 0 15h
aibrix-metadata-service-7668b6f95d-dvgzh 0/1 Init:ImagePullBackOff 0 15h
aibrix-redis-master-56cbb99b6b-qkzkq 0/1 ImagePullBackOff 0 15h
root@a7:/home/chendong/aibrix# ps aux | grep minikube | grep tunnel
root 957570 0.1 0.0 2657904 92060 pts/18 Sl+ 01:52 0:00 minikube tunnel
I know the place you're referring to:
@Jeffwan May I ask: does kubectl get pods -n aibrix-system have to be used this way, or can I add a domestic (CN) mirror source so that everything downloads conveniently in one go? I am just one step away now. I haven't found a reasonable solution in the AIBrix issues either. If possible, could you help me?
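For what it's worth, minikube has a built-in flag for users in mainland China, but as far as I can tell it only redirects minikube's own system images, not application images like aibrix/* or kuberay/*, so it may solve only part of the problem:

```bash
# Pull minikube/Kubernetes system images from a CN mirror; application
# images still come from Docker Hub or must be loaded manually.
minikube start --driver=docker --gpus=all --image-mirror-country=cn
```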
This is the result on my local PC:
This is the result on the server:
@Jeffwan To put it in one sentence: when I run kubectl get pods -n aibrix-system (the final step of installing AIBrix), the server cannot find these six images.
I can only docker pull them one by one through different channels, and in the end aibrix/gpu-optimizer:v0.4.1 and kuberay/operator:v1.1.0 cannot be found at all, or at least no suitable version can be found. How can this be solved? I have really tried my best to explain the problem clearly, in both Chinese and English.
I searched Docker Hub, and there is no aibrix/gpu-optimizer image there at all:
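Putting Jeffwan's earlier pointer together: the release manifests run the gpu-optimizer on aibrix/runtime:v0.4.1, so there is no separate aibrix/gpu-optimizer image to pull. For the remaining pull failures, a workaround sketch (the mirror host is a placeholder):

```bash
# kuberay/operator:v1.1.0 does exist on Docker Hub even if search does
# not surface it; if Docker Hub is unreachable, pull via a mirror and retag.
docker pull <mirror-host>/kuberay/operator:v1.1.0
docker tag  <mirror-host>/kuberay/operator:v1.1.0 kuberay/operator:v1.1.0
minikube image load kuberay/operator:v1.1.0
```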
Maybe I get it. Wait a sec~~
@Jeffwan Hello, sorry to bother you again. This question comes from a thoroughly worn-out coder. With kubectl get pods -n aibrix-system I keep running into a series of problems (I got the following results within just a few seconds; in a very short time, some pods stop working):
root@a7:/home/chendong/aibrix# kubectl get pods -n aibrix-system
NAME READY STATUS RESTARTS AGE
aibrix-controller-manager-57bd8857f4-wxgt4 0/1 Running 0 11s
aibrix-gateway-plugins-7dfc7569b-g4q25 0/1 Init:0/1 0 11s
aibrix-gpu-optimizer-7d9bbf9c7c-rghlg 1/1 Running 0 11s
aibrix-kuberay-operator-9b8548c98-r4lsf 0/1 ContainerCreating 0 11s
aibrix-metadata-service-7668b6f95d-fjt79 0/1 Init:0/1 0 11s
aibrix-redis-master-56cbb99b6b-fbzl9 0/1 ContainerCreating 0 11s
root@a7:/home/chendong/aibrix# kubectl get pods -n aibrix-system
NAME READY STATUS RESTARTS AGE
aibrix-controller-manager-57bd8857f4-wxgt4 1/1 Running 0 16s
aibrix-gateway-plugins-7dfc7569b-g4q25 0/1 Init:0/1 0 16s
aibrix-gpu-optimizer-7d9bbf9c7c-rghlg 1/1 Running 0 16s
aibrix-kuberay-operator-9b8548c98-r4lsf 0/1 ContainerCreating 0 16s
aibrix-metadata-service-7668b6f95d-fjt79 0/1 Init:0/1 0 16s
aibrix-redis-master-56cbb99b6b-fbzl9 0/1 ContainerCreating 0 16s
root@a7:/home/chendong/aibrix# kubectl get pods -n aibrix-system
NAME READY STATUS RESTARTS AGE
aibrix-controller-manager-57bd8857f4-wxgt4 1/1 Running 0 19s
aibrix-gateway-plugins-7dfc7569b-g4q25 0/1 Init:0/1 0 19s
aibrix-gpu-optimizer-7d9bbf9c7c-rghlg 1/1 Running 0 19s
aibrix-kuberay-operator-9b8548c98-r4lsf 0/1 ContainerCreating 0 19s
aibrix-metadata-service-7668b6f95d-fjt79 0/1 Init:0/1 0 19s
aibrix-redis-master-56cbb99b6b-fbzl9 0/1 ErrImagePull 0 19s
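When a pod flips to ErrImagePull like this, the Events section of kubectl describe shows the exact image reference and registry error (pod name taken from the listing above):

```bash
# The Events section at the bottom shows the exact image pull failure.
kubectl describe pod aibrix-redis-master-56cbb99b6b-fbzl9 -n aibrix-system | tail -n 20
kubectl get events -n aibrix-system --sort-by=.lastTimestamp | tail -n 10
```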
I think I have solved the image problem:
root@a7:/home/chendong/aibrix# minikube cache list
root@a7:/home/chendong/aibrix# minikube image list
registry.k8s.io/pause:3.10.1
registry.k8s.io/kube-scheduler:v1.34.0
registry.k8s.io/kube-proxy:v1.34.0
registry.k8s.io/kube-controller-manager:v1.34.0
registry.k8s.io/kube-apiserver:v1.34.0
registry.k8s.io/etcd:3.6.4-0
registry.k8s.io/coredns/coredns:v1.12.1
nvcr.io/nvidia/k8s-device-plugin:<none>
gcr.io/k8s-minikube/storage-provisioner:v5
docker.io/library/redis:7.4
docker.io/library/busybox:stable
docker.io/kuberay/operator:v1.1.0
docker.io/kuberay/operator:nightly
docker.io/envoyproxy/gateway:v1.2.8
docker.io/envoyproxy/envoy:v1.33.2
docker.io/aibrix/runtime:v0.4.1
docker.io/aibrix/metadata-service:v0.4.1
docker.io/aibrix/gateway-plugins:v0.4.1
docker.io/aibrix/controller-manager:v0.4.1
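With all of the images above present in minikube, restarting the stuck deployments makes the kubelet retry with the local copies. This is a generic kubectl pattern; it assumes the deployments' imagePullPolicy is not Always:

```bash
# Recreate the pods so they pick up the locally loaded images, then wait
# for every deployment in the namespace to become Available.
kubectl rollout restart deployment -n aibrix-system
kubectl wait --for=condition=available deployment --all -n aibrix-system --timeout=300s
```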
@Jeffwan In my haste I could only ask in Chinese; if anything is unclear, I can ask again in English. Do all of the pods need to be Ready before AIBrix can be used?
@Jeffwan How can I solve this problem? I have tried several approaches and still cannot resolve it. If possible, please help me. Thank you.
I followed this tutorial strictly: https://aibrix.readthedocs.io/latest/getting_started/installation/lambda.html#
The image problem:
@googs1025 If possible, could you explain in more detail? Thank you~~ Beginners can get really lost here; I have been at this for almost 5 days~
There are several ways to deploy:
- You can deploy directly with the Quickstart: https://aibrix.readthedocs.io/latest/getting_started/quickstart.html
- You can also use the community-provided Helm chart: https://github.com/vllm-project/aibrix/tree/main/dist/chart

Please resolve the image-pull network problem on your own. Also, if you pull images manually, use minikube image load to load them into minikube.
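A sketch of that manual route, using the image list shown earlier in this thread (pull each image, optionally via a mirror, then load it into minikube):

```bash
# Required images, matching the working `minikube image list` above.
IMAGES=(
  kuberay/operator:v1.1.0
  envoyproxy/gateway:v1.2.8
  envoyproxy/envoy:v1.33.2
  redis:7.4
  busybox:stable
  aibrix/runtime:v0.4.1
  aibrix/metadata-service:v0.4.1
  aibrix/gateway-plugins:v0.4.1
  aibrix/controller-manager:v0.4.1
)
for img in "${IMAGES[@]}"; do
  docker pull "$img"          # or pull from a mirror and retag first
  minikube image load "$img"
done
```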
Please accept my heartfelt thanks for your strong support. I used the AIBrix framework to run both aggregated/centralized and prefill/decode (P/D) disaggregated inference tests on a single server. I hope this contribution helps developers and the broader community use AIBrix. @Jeffwan @googs1025
And results:
This is my understanding of AIBrix aggregated and P/D-disaggregated inference; I'm not sure it's correct and would welcome corrections. Also, wasn't there a plan to release a version that doesn't require K8s or minikube? Is it ready now? Normally AIBrix is meant to be deployed across dozens of servers managing hundreds or thousands of GPUs, right? @Jeffwan
@Alan-D-Chen awesome work!
> Wasn't there a plan to release a version that doesn't require K8s or minikube? Is it ready now? Normally AIBrix is meant to be deployed across dozens of servers managing hundreds or thousands of GPUs, right?
It's not fully finished yet; I will keep you posted once it's done. The process orchestration takes some time, especially for P/D.
> And results:
@Alan-D-Chen This is awesome! But from the results perspective, I didn't see a big difference between P/D and non-P/D. Technically, the decoding latencies for non-P/D are much higher.
Could I know some of your setup details?
- What chips?
- TP2 (4 replicas?) vs 2P2D (4 × TP2 = 8 GPUs in total)?
- What are the workload input and output sizes?
From the table, your decoding latency seems to start at ~20ms and gradually drop to 10ms; however, in the spreadsheet the TPOT or ITL numbers don't match. I'm not sure whether there's a typo or I'm misreading it.
Answer:
- Wed Nov 26 09:35:13 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.148.08 Driver Version: 570.148.08 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA L40S On | 00000000:01:00.0 Off | Off |
| N/A 39C P0 111W / 350W | 845MiB / 49140MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
- Aggregation: tensor parallelism = 2, 2 GPUs for inference. P/D disaggregation: 2 GPUs for prefill, 2 GPUs for decode.
- The core of the benchmark script:
```bash
CONCURRENCIES=(10 20 30 40 50 60 70 80 90 100 110 120)
BASE_CMD="python3 /vllm-workspace/benchmarks/benchmark_serving.py \
  --model Qwen3-32B \
  --dataset-name sonnet \
  --port 8000 \
  --sonnet-input-len 512 \
  --sonnet-output-len 256 \
  --endpoint /v1/chat/completions \
  --tokenizer /mnt/LLM_models/Qwen3-32B/ \
  --dataset-path /vllm-workspace/benchmarks/sonnet.txt \
  --backend openai-chat \
  --trust-remote-code"
```
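For completeness, a sketch of how the sweep might drive BASE_CMD. The --max-concurrency, --num-prompts, --save-result, and --result-filename flags exist in recent versions of vLLM's benchmark_serving.py, but the result-file naming below is my own:

```bash
# One benchmark run per concurrency level; scale the prompt count with
# concurrency so each level runs long enough to be meaningful.
for c in "${CONCURRENCIES[@]}"; do
  $BASE_CMD \
    --max-concurrency "$c" \
    --num-prompts "$((c * 10))" \
    --save-result \
    --result-filename "qwen3-32b-c${c}.json"
done
```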