[Controller] panic when create kvcache

Open yyzxw opened this issue 7 months ago • 1 comments

🐛 Describe the bug

Creating a KVCache from the example kvcache YAML makes the controller-manager panic.

Steps to Reproduce

Apply the example kvcache file from: https://github.com/vllm-project/aibrix/blob/main/config/samples/orchestration_v1alpha1_kvcache.yaml

The controller-manager then panics. Log details:

I0520 11:21:15.502966       1 controller.go:115] "msg"="Observed a panic in reconciler: cannot parse '': quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'" "KVCache"={"name":"aibrix-deepseek-coder-33b-kvcache","namespace":"aibrix-demo"} "controller"="kv-cache-controller" "controllerGroup"="orchestration.aibrix.ai" "controllerKind"="KVCache" "name"="aibrix-deepseek-coder-33b-kvcache" "namespace"="aibrix-demo" "reconcileID"="8881e4c3-1049-4b1f-b034-15a4604dc024"
panic: cannot parse '': quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$' [recovered]
	panic: cannot parse '': quantities must match the regular expression '^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$'

goroutine 499 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x1bbc0c0?, 0xc0006aa7e0?})
	/usr/local/go/src/runtime/panic.go:770 +0x132
k8s.io/apimachinery/pkg/api/resource.MustParse({0x0, 0x0})
	/go/pkg/mod/k8s.io/[email protected]/pkg/api/resource/quantity.go:139 +0x173
github.com/aibrix/aibrix/pkg/controller/kvcache.(*KVCacheReconciler).reconcileDeployment(0xc0009dc090, {0x21a30b0, 0xc00159af90}, 0xc0011aa008)
	/workspace/pkg/controller/kvcache/kvcache_controller.go:510 +0xdbc
github.com/aibrix/aibrix/pkg/controller/kvcache.(*KVCacheReconciler).Reconcile(0xc0009dc090, {0x21a30b0, 0xc00159af90}, {{{0xc00088a830?, 0x0?}, {0xc0001e86c0?, 0xc000b99d10?}}})
	/workspace/pkg/controller/kvcache/kvcache_controller.go:167 +0x15c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x21a74f0?, {0x21a30b0?, 0xc00159af90?}, {{{0xc00088a830?, 0xb?}, {0xc0001e86c0?, 0x0?}}})
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:119 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0007ff9a0, {0x21a30e8, 0xc00053c690}, {0x1cc4200, 0xc00101e380})
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:316 +0x3bc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0007ff9a0, {0x21a30e8, 0xc00053c690})
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266 +0x1be
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 390
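
The trace shows `resource.MustParse` being called with an empty string, which cannot match the pattern quoted in the panic message. A minimal sketch of that check using only the Go standard library follows; `checkQuantity` and the field name `cpu` are hypothetical and not taken from the AIBrix code, and the sketch mirrors only the pattern test, not the full Kubernetes quantity parser.

```go
package main

import (
	"fmt"
	"regexp"
)

// quantityPattern is copied verbatim from the panic message in the log above.
var quantityPattern = regexp.MustCompile(`^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$`)

// checkQuantity is a hypothetical helper: it reports a value that fails the
// pattern as an error instead of panicking.
func checkQuantity(field, value string) error {
	if !quantityPattern.MatchString(value) {
		return fmt.Errorf("%s: cannot parse %q as a quantity", field, value)
	}
	return nil
}

func main() {
	// The empty string is rejected because the pattern requires at least one
	// digit or dot; the other values pass the pattern check.
	for _, v := range []string{"", "2", "500m", "10Gi"} {
		fmt.Printf("%q -> %v\n", v, checkQuantity("cpu", v))
	}
}
```

In a real controller, `resource.ParseQuantity` from the same `k8s.io/apimachinery/pkg/api/resource` package returns an error rather than panicking, so an unset field could be surfaced as a reconcile error. Whether that fits this controller is an assumption.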

Expected behavior

The KVCache object is created successfully.

Environment

  • AIBrix Version: v0.2.1
  • Deployment env: Kubernetes

yyzxw avatar May 20 '25 11:05 yyzxw

@yyzxw could you try the latest v0.3.0-rc.1 release instead of v0.2.1? They are not compatible: the sample linked above is the latest, but v0.2.1 still uses legacy code.

Jeffwan avatar May 20 '25 16:05 Jeffwan

Panic occurs when creating KV cache with AIBrix v0.4.0

kvcache yaml: https://github.com/vllm-project/aibrix/blob/main/samples/kvcache/vineyard/kvcache.yaml

logs:

E0829 03:16:42.770905       1 signal_unix.go:881] "msg"="Observed a panic" "error"=null "KVCache"={"name":"deepseek-coder-7b-kvcache","namespace":"default"} "controller"="kv-cache-controller" "controllerGroup"="orchestration.aibrix.ai" "controllerKind"="KVCache" "name"="deepseek-coder-7b-kvcache" "namespace"="default" "panic"="runtime error: invalid memory address or nil pointer dereference" "panicGoValue"="\"invalid memory address or nil pointer dereference\"" "reconcileID"="eda3563d-7adb-4eb4-b49c-e4961819da3b" "stacktrace"="goroutine 7228 [running]:\nk8s.io/apimachinery/pkg/util/runtime.logPanic({0x2906e00, 0xc004129680}, {0x1f3b2a0, 0x3b85580})\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:107 +0xbc\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile.func1()\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:105 +0x112\npanic({0x1f3b2a0?, 0x3b85580?})\n\t/usr/local/go/src/runtime/panic.go:770 +0x132\ngithub.com/vllm-project/aibrix/pkg/controller/kvcache/backends.(*VineyardReconciler).reconcileMetadataService(0xc000580808?, {0x2906e00?, 0xc004129680?}, 0x1?)\n\t/workspace/pkg/controller/kvcache/backends/vineyard.go:70 +0x61\ngithub.com/vllm-project/aibrix/pkg/controller/kvcache/backends.VineyardReconciler.Reconcile({0xc003d88100}, {0x2906e00, 0xc004129680}, 0xc00204ab40)\n\t/workspace/pkg/controller/kvcache/backends/vineyard.go:45 +0x30\ngithub.com/vllm-project/aibrix/pkg/controller/kvcache.(*KVCacheReconciler).Reconcile(0xc003c7a5f0, {0x2906e00, 0xc004129680}, {{{0xc00423cb80?, 0x1a1e35a?}, {0xc005448200?, 0x0?}}})\n\t/workspace/pkg/controller/kvcache/kvcache_controller.go:163 +0x2bd\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile(0xc001c8d440?, {0x2906e00?, 0xc004129680?}, {{{0xc00423cb80?, 0x0?}, {0xc005448200?, 0x0?}}})\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116 
+0xd4\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler(0x2922240, {0x2906e38, 0xc0004e1540}, {{{0xc00423cb80, 0x7}, {0xc005448200, 0x19}}})\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:303 +0x3bc\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem(0x2922240, {0x2906e38, 0xc0004e1540})\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263 +0x21d\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2()\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224 +0x8a\ncreated by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2 in goroutine 7078\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:220 +0x490\n"
E0829 03:16:42.771003       1 controller.go:316] "msg"="Reconciler error" "error"="panic: runtime error: invalid memory address or nil pointer dereference [recovered]" "KVCache"={"name":"deepseek-coder-7b-kvcache","namespace":"default"} "controller"="kv-cache-controller" "controllerGroup"="orchestration.aibrix.ai" "controllerKind"="KVCache" "name"="deepseek-coder-7b-kvcache" "namespace"="default" "reconcileID"="eda3563d-7adb-4eb4-b49c-e4961819da3b"

Possible cause: the watcher is not configured, but the code still tries to access Watcher.env:

https://github.com/vllm-project/aibrix/blob/main/pkg/controller/kvcache/backends/hpkv.go#L135
https://github.com/vllm-project/aibrix/blob/main/pkg/controller/kvcache/backends/infinistore.go#L130
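
If the cause is as suspected, a nil check before reading the watcher's env would avoid the dereference. The sketch below shows the pattern with stand-in types; `EnvVar`, `WatcherSpec`, `CacheSpec`, and `watcherEnv` are hypothetical names for illustration and are not the AIBrix API types.

```go
package main

import "fmt"

// EnvVar, WatcherSpec and CacheSpec are hypothetical stand-ins for the real
// API types; only the optional-pointer shape matters here.
type EnvVar struct{ Name, Value string }

type WatcherSpec struct{ Env []EnvVar }

type CacheSpec struct {
	Watcher *WatcherSpec // optional: nil when the manifest omits the watcher
}

// watcherEnv returns the watcher's env vars, or nil when no watcher is
// configured, instead of dereferencing a nil pointer.
func watcherEnv(spec CacheSpec) []EnvVar {
	if spec.Watcher == nil {
		return nil
	}
	return spec.Watcher.Env
}

func main() {
	// No watcher configured: the guard returns nil and nothing panics.
	fmt.Println(len(watcherEnv(CacheSpec{})))

	// Watcher configured: its env vars are returned unchanged.
	withWatcher := CacheSpec{Watcher: &WatcherSpec{Env: []EnvVar{{"LOG_LEVEL", "debug"}}}}
	fmt.Println(len(watcherEnv(withWatcher)))
}
```

Ranging over or appending a nil slice is safe in Go, so callers of such a helper would not need a second check.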

Besides, I can try to fix this bug.

@Jeffwan PTAL

zhixian82 avatar Aug 29 '25 03:08 zhixian82

@zhixian82 thanks for this. You can submit a PR 😄

googs1025 avatar Aug 29 '25 05:08 googs1025