
Support ARM64 (aarch64) Architecture for AIBrix Core Components to Deploy on ARM-Based k8s

Open gongxianjin opened this issue 7 months ago • 1 comments

🚀 Feature Description and Motivation

PR https://github.com/vllm-project/aibrix/pull/1090 has generated some images for k8s deployment, but three questions remain:

  1. https://hub.docker.com/r/aibrix/kuberay-operator/tags: kuberay-operator also needs an ARM build.

  2. Deploy logs from the /aibrix/controller-manager image:

"msg"="unable to initialize controllers" "error"="failed to validate CRD 'rayclusters.ray.io': error checking CRD "rayclusters.ray.io": customresourcedefinitions.apiextensions.k8s.io "rayclusters.ray.io" is forbidden: User "system:serviceaccount:aibrix-system:aibrix-controller-manager" cannot get resource "customresourcedefinitions" in API group "apiextensions.k8s.io" at the cluster scope. Please ensure that the CRD is installed and available in the cluster. You can verify this by running 'kubectl get crd rayclusters.ray.io'" "logger"="setup"

"msg"="healthz check failed" "checker"="readyz" "error"="webhook certificates are not ready" "logger"="controller-runtime.healthz"
I0519 08:18:25.416696 1 healthz.go:128] "msg"="healthz check failed" "logger"="controller-runtime.healthz" "statuses"=[{}]

  3. The EXTERNAL-IP of the aibrix svc envoy-xxx is always pending, and the service is still unreachable when port-forwarding to port 80.

So please help resolve these three questions in this issue. Thanks!
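For readers who do use kuberay, the "forbidden" message in item 2 reads as an RBAC problem: the service account named in the log is not allowed to read CRDs at cluster scope. A minimal sketch of a grant that would cover exactly that permission follows. The service account and namespace are taken from the log line; the name `aibrix-crd-reader` is hypothetical, and whether the official AIBrix manifests are meant to ship an equivalent rule is not confirmed in this thread.

```yaml
# Sketch only: allows the controller-manager service account to "get" CRDs,
# which is the permission the log reports as missing.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: aibrix-crd-reader          # hypothetical name
rules:
  - apiGroups: ["apiextensions.k8s.io"]
    resources: ["customresourcedefinitions"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: aibrix-crd-reader          # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: aibrix-crd-reader
subjects:
  - kind: ServiceAccount
    name: aibrix-controller-manager   # from the log line
    namespace: aibrix-system          # from the log line
```

The grant only removes the permission error. The rayclusters.ray.io CRD itself still has to be installed for the validation to pass.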

Proposed Solution

No response

gongxianjin avatar May 16 '25 07:05 gongxianjin

https://app.slack.com/huddle/T08UKFXUETX/C08UG26UNR4 is my Slack address. Could anyone invite me to Slack to talk about this issue?

gongxianjin avatar May 30 '25 03:05 gongxianjin

Indeed, aibrix relies on kuberay and we need an ARM version as well. Let me make it today.

Jeffwan avatar Jul 23 '25 22:07 Jeffwan

  1. If you do not use kuberay, you can disable the kuberay validations:
      containers:
        - name: manager
          args:
            - --leader-elect
..... 
            - --health-probe-bind-address=:8081
            - --metrics-bind-address=0
            - --controllers=model-adapter-controller,model-route-controller,pod-autoscaler-controller,kv-cache-controller

Note: do not enable distributed-inference-controller; with it left out, the Ray CRDs are not validated.

https://github.com/vllm-project/aibrix/blob/7fe0a91d2f7e82bdc054539aa22bf6aa98dd5e13/pkg/features/features.go#L25-L31

https://github.com/vllm-project/aibrix/blob/7fe0a91d2f7e82bdc054539aa22bf6aa98dd5e13/pkg/controller/controller.go#L62-L73
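If the manifests are managed with kustomize, the `--controllers` argument above could be appended with a JSON patch instead of editing the Deployment by hand. This is a sketch under several assumptions: the Deployment is named `aibrix-controller-manager` (inferred from the service account name in the log, not confirmed here), `manager` is the first container, its `args` list does not already contain a `--controllers` entry, and `aibrix-core.yaml` is a hypothetical local copy of the install manifests.

```yaml
# kustomization.yaml (sketch; names marked below are assumptions)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - aibrix-core.yaml                     # hypothetical path to the install manifests
patches:
  - target:
      kind: Deployment
      name: aibrix-controller-manager    # assumed Deployment name
      namespace: aibrix-system
    patch: |-
      # Append the controller allow-list; distributed-inference-controller is
      # intentionally left out so the Ray CRDs are not validated.
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --controllers=model-adapter-controller,model-route-controller,pod-autoscaler-controller,kv-cache-controller
```

If `--controllers` is already present in `args`, replace that entry rather than appending a second one.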

Jeffwan avatar Jul 24 '25 21:07 Jeffwan

3. The EXTERNAL-IP of the aibrix svc envoy-xxx is always pending, and the service is still unreachable when port-forwarding to port 80.

Please open a separate issue with more details. The problem is more likely on your cloud provider's side; it may be failing to create public services.

Jeffwan avatar Jul 24 '25 21:07 Jeffwan