[controller] panic when applying a ModelAdapter
🐛 Describe the bug
The controller panics when a ModelAdapter is applied.
Steps to Reproduce
Apply a ModelAdapter:
apiVersion: model.aibrix.ai/v1alpha1
kind: ModelAdapter
metadata:
  name: qwen-05b-lora-adapter
  namespace: aibrix-demo
  labels:
    model.aibrix.ai/name: "qwen-05b-lora"
    model.aibrix.ai/port: "8000"
spec:
  baseModel: qwen-05b-lora
  podSelector:
    matchLabels:
      model.aibrix.ai/name: qwen-05b-lora
  artifactURL: /data/serving-model-lora/llama3.2-1b-instruct
  schedulerName: default
The controller then panics. Controller log:
E0521 04:40:48.150985 1 modeladapter_controller.go:363] "Selected pod has been deleted and it should be removed from model adapter instance list" err="Pod \"qwen-05b-lora-584fd86759-p6cp5\" not found" modelAdapter="aibrix-demo/qwen-05b-lora-adapter"
I0521 04:40:48.151141 1 controller.go:115] "msg"="Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" "ModelAdapter"={"name":"qwen-05b-lora-adapter","namespace":"aibrix-demo"} "controller"="model-adapter-controller" "controllerGroup"="model.aibrix.ai" "controllerKind"="ModelAdapter" "name"="qwen-05b-lora-adapter" "namespace"="aibrix-demo" "reconcileID"="d5d065f3-4a8c-46ee-8d1c-91cbe9bf2886"
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x195a936]
goroutine 362 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x1beb780?, 0x310b4b0?})
/usr/local/go/src/runtime/panic.go:770 +0x132
github.com/aibrix/aibrix/pkg/controller/modeladapter.(*ModelAdapterReconciler).clearModelAdapterInstanceList(0xc0002de9a0, {0x21a30b0, 0xc000de88d0}, 0xc000b80000, {0xc000a1c060, 0x1e})
/workspace/pkg/controller/modeladapter/modeladapter_controller.go:500 +0x616
github.com/aibrix/aibrix/pkg/controller/modeladapter.(*ModelAdapterReconciler).DoReconcile(0xc0002de9a0, {0x21a30b0, 0xc000de88d0}, {{{0xc000a000a0?, 0x21a30b0?}, {0xc0006aa180?, 0xc000a000a0?}}}, 0xc000b80000)
/workspace/pkg/controller/modeladapter/modeladapter_controller.go:366 +0x9a5
github.com/aibrix/aibrix/pkg/controller/modeladapter.(*ModelAdapterReconciler).Reconcile(0xc0002de9a0, {0x21a30b0, 0xc000de88d0}, {{{0xc000a000a0, 0xb}, {0xc0006aa180, 0x15}}})
/workspace/pkg/controller/modeladapter/modeladapter_controller.go:321 +0x8b3
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x21a74f0?, {0x21a30b0?, 0xc000de88d0?}, {{{0xc000a000a0?, 0xb?}, {0xc0006aa180?, 0x0?}}})
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000828c80, {0x21a30e8, 0xc0002979a0}, {0x1cc4200, 0xc0007151c0})
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316 +0x3bc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000828c80, {0x21a30e8, 0xc0002979a0})
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266 +0x1be
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 371
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:223 +0x50c
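Reading the trace: the "Pod ... not found" error logged at modeladapter_controller.go:363 is immediately followed by a nil pointer dereference inside clearModelAdapterInstanceList. That is consistent with a pod reference being used after the lookup failed, though this is an inference from the log, not a reading of the controller source. The sketch below only illustrates the general guard pattern. `Pod`, `ModelAdapter`, `getPod`, and `clearInstance` are simplified, hypothetical stand-ins, not the real AIBrix or Kubernetes types or functions.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod and ModelAdapter are hypothetical, simplified stand-ins used only
// for illustration; they are not the controller's actual types.
type Pod struct {
	Name string
	IP   string
}

type ModelAdapter struct {
	Instances []string // names of pods the adapter is loaded on
}

var errNotFound = errors.New("pod not found")

// getPod mimics a lookup that returns a nil pod when the pod is gone.
func getPod(pods map[string]*Pod, name string) (*Pod, error) {
	p, ok := pods[name]
	if !ok {
		return nil, errNotFound
	}
	return p, nil
}

// clearInstance removes podName from the adapter's instance list.
// It reads fields of pod only after a nil check, so an already deleted
// pod is handled as a normal case instead of crashing the reconciler.
func clearInstance(adapter *ModelAdapter, pod *Pod, podName string) string {
	kept := make([]string, 0, len(adapter.Instances))
	for _, name := range adapter.Instances {
		if name != podName {
			kept = append(kept, name)
		}
	}
	adapter.Instances = kept

	if pod == nil {
		// Nothing to unload from: the pod no longer exists.
		return "pod already deleted, skipped unload"
	}
	return "unloaded adapter from " + pod.IP
}

func main() {
	pods := map[string]*Pod{"pod-b": {Name: "pod-b", IP: "10.0.0.2"}}
	adapter := &ModelAdapter{Instances: []string{"pod-a", "pod-b"}}

	// pod-a has been deleted, so the lookup returns (nil, errNotFound).
	pod, err := getPod(pods, "pod-a")
	if err != nil && !errors.Is(err, errNotFound) {
		panic(err)
	}
	fmt.Println(clearInstance(adapter, pod, "pod-a"))
	fmt.Println(adapter.Instances)
}
```

The point of the sketch is that "not found" is treated as an expected state: the stale entry is still dropped from the instance list, but no field of the missing pod is read.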
Expected behavior
The controller should not panic.
Environment
- Aibrix version: v0.2.1
It seems to have been fixed by https://github.com/vllm-project/aibrix/pull/1078
I will close this issue. If you find it's still a problem, feel free to reopen it. Thanks to @googs1025 for the fix.