[controller] panic when applying ModelAdapter

Open yyzxw opened this issue 7 months ago • 1 comments

🐛 Describe the bug

The controller panics when a ModelAdapter is applied.

Steps to Reproduce

Apply a ModelAdapter:

apiVersion: model.aibrix.ai/v1alpha1
kind: ModelAdapter
metadata:
  name: qwen-05b-lora-adapter
  namespace: aibrix-demo
  labels:
    model.aibrix.ai/name: "qwen-05b-lora"
    model.aibrix.ai/port: "8000"
spec:
  baseModel: qwen-05b-lora
  podSelector:
    matchLabels:
      model.aibrix.ai/name: qwen-05b-lora
  artifactURL: /data/serving-model-lora/llama3.2-1b-instruct
  schedulerName: default

The controller then panics. Controller log:

E0521 04:40:48.150985       1 modeladapter_controller.go:363] "Selected pod has been deleted and it should be removed from model adapter instance list" err="Pod \"qwen-05b-lora-584fd86759-p6cp5\" not found" modelAdapter="aibrix-demo/qwen-05b-lora-adapter"
I0521 04:40:48.151141       1 controller.go:115] "msg"="Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" "ModelAdapter"={"name":"qwen-05b-lora-adapter","namespace":"aibrix-demo"} "controller"="model-adapter-controller" "controllerGroup"="model.aibrix.ai" "controllerKind"="ModelAdapter" "name"="qwen-05b-lora-adapter" "namespace"="aibrix-demo" "reconcileID"="d5d065f3-4a8c-46ee-8d1c-91cbe9bf2886"
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x195a936]

goroutine 362 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x1beb780?, 0x310b4b0?})
	/usr/local/go/src/runtime/panic.go:770 +0x132
github.com/aibrix/aibrix/pkg/controller/modeladapter.(*ModelAdapterReconciler).clearModelAdapterInstanceList(0xc0002de9a0, {0x21a30b0, 0xc000de88d0}, 0xc000b80000, {0xc000a1c060, 0x1e})
	/workspace/pkg/controller/modeladapter/modeladapter_controller.go:500 +0x616
github.com/aibrix/aibrix/pkg/controller/modeladapter.(*ModelAdapterReconciler).DoReconcile(0xc0002de9a0, {0x21a30b0, 0xc000de88d0}, {{{0xc000a000a0?, 0x21a30b0?}, {0xc0006aa180?, 0xc000a000a0?}}}, 0xc000b80000)
	/workspace/pkg/controller/modeladapter/modeladapter_controller.go:366 +0x9a5
github.com/aibrix/aibrix/pkg/controller/modeladapter.(*ModelAdapterReconciler).Reconcile(0xc0002de9a0, {0x21a30b0, 0xc000de88d0}, {{{0xc000a000a0, 0xb}, {0xc0006aa180, 0x15}}})
	/workspace/pkg/controller/modeladapter/modeladapter_controller.go:321 +0x8b3
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x21a74f0?, {0x21a30b0?, 0xc000de88d0?}, {{{0xc000a000a0?, 0xb?}, {0xc0006aa180?, 0x0?}}})
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000828c80, {0x21a30e8, 0xc0002979a0}, {0x1cc4200, 0xc0007151c0})
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316 +0x3bc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000828c80, {0x21a30e8, 0xc0002979a0})
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266 +0x1be
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 371
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:223 +0x50c
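
The log reports the selected pod as "not found" at modeladapter_controller.go:363, and the nil pointer dereference follows in clearModelAdapterInstanceList, called from DoReconcile at line 366. One plausible reading, which the trace alone does not confirm, is that the cleanup path reads a field from a pod object that was never populated because the lookup failed. The following is a minimal, self-contained sketch of that failure mode and a guard against it. All types and functions here (`Pod`, `ModelAdapter`, `getPod`, `removeInstance`, `reconcile`) are hypothetical stand-ins and not the actual AIBrix code or the fix merged upstream.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a hypothetical stand-in for a Kubernetes pod object.
type Pod struct {
	Name string
}

// ModelAdapter is a hypothetical stand-in for the ModelAdapter resource,
// reduced to the list of pod names the adapter is loaded on.
type ModelAdapter struct {
	Instances []string
}

var errNotFound = errors.New("pod not found")

// getPod mimics a lookup that returns (nil, errNotFound) for a deleted pod.
func getPod(store map[string]*Pod, name string) (*Pod, error) {
	pod, ok := store[name]
	if !ok {
		return nil, errNotFound
	}
	return pod, nil
}

// removeInstance drops name from the adapter's instance list.
// It takes the pod name rather than a *Pod, so it never dereferences
// a pod that may not exist anymore.
func removeInstance(adapter *ModelAdapter, name string) {
	if adapter == nil {
		return
	}
	kept := make([]string, 0, len(adapter.Instances))
	for _, instance := range adapter.Instances {
		if instance != name {
			kept = append(kept, instance)
		}
	}
	adapter.Instances = kept
}

// reconcile cleans up the instance list when the selected pod is gone.
func reconcile(store map[string]*Pod, adapter *ModelAdapter, selected string) error {
	pod, err := getPod(store, selected)
	if errors.Is(err, errNotFound) {
		// pod is nil here. Reading pod.Name would panic, so use the
		// name we already hold instead.
		removeInstance(adapter, selected)
		return nil
	}
	if err != nil {
		return err
	}
	fmt.Printf("pod %s is still present\n", pod.Name)
	return nil
}

func main() {
	adapter := &ModelAdapter{Instances: []string{"qwen-05b-lora-584fd86759-p6cp5"}}
	// An empty store simulates the pod having been deleted.
	if err := reconcile(map[string]*Pod{}, adapter, "qwen-05b-lora-584fd86759-p6cp5"); err != nil {
		fmt.Println("reconcile failed:", err)
		return
	}
	fmt.Println("remaining instances:", len(adapter.Instances))
}
```

The design choice in this sketch is to pass the cleanup helper only data that is known to exist (the pod name taken from the adapter's own list), so a failed lookup cannot reach a dereference.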

Expected behavior

The controller should not panic.

Environment

  • Aibrix version: v0.2.1

yyzxw avatar May 21 '25 04:05 yyzxw

It seems to have been fixed by https://github.com/vllm-project/aibrix/pull/1078

yyzxw avatar May 27 '25 13:05 yyzxw

I will close this story. If you find it's still a problem, feel free to reopen the issue. Thanks to @googs1025 for the fix.

Jeffwan avatar Jul 25 '25 18:07 Jeffwan