volcano restarts due to panic

Open Ramonaaaaa opened this issue 1 year ago • 7 comments

Description

When Volcano is used for scheduling, the scheduler restarts repeatedly. Restart cause: an out-of-range slice index triggers a panic.

Error details:

kubectl logs -f -n volcano-system volcano-scheduler-xx -p
2024/08/27 10:47:39 maxprocs: Updating GOMAXPROCS=20: determined from CPU quota
W0827 10:47:39.514559 1 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0827 10:48:01.142318 1 trace.go:205] Trace[1974509093]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:169 (27-Aug-2024 10:47:39.584) (total time: 21557ms):
Trace[1974509093]: ---"Objects listed" error: 21521ms (10:48:01.105)
Trace[1974509093]: [21.55737372s] [21.55737372s] END
I0827 10:48:03.669284 1 trace.go:205] Trace[585251636]: "DeltaFIFO Pop Process" ID:kube-system/mindx-dl-deviceinfo-kcs-haerbin-agi-s-thbdc,Depth:16,Reason:slow event handlers blocking the queue (27-Aug-2024 10:48:03.472) (total time: 195ms):
Trace[585251636]: [195.826968ms] [195.826968ms] END
E0827 10:48:10.501173 1 cache.go:1245] error occurred in updating Queue : Operation cannot be fulfilled on queues.scheduling.volcano.sh "default": the object has been modified; please apply your changes to the latest version and try again
E0827 10:48:10.501199 1 session.go:216] failed to update queue status: Operation cannot be fulfilled on queues.scheduling.volcano.sh "default": the object has been modified; please apply your changes to the latest version and try again
I0827 10:48:13.363051 1 trace.go:205] Trace[616662840]: "DeltaFIFO Pop Process" ID:kube-system/mindx-dl-deviceinfo-kcs-haerbin-agi-s-jh95z,Depth:32,Reason:slow event handlers blocking the queue (27-Aug-2024 10:48:13.181) (total time: 181ms):
Trace[616662840]: [181.390665ms] [181.390665ms] END
E0827 10:48:24.691004 1 cache.go:1245] error occurred in updating Queue : Operation cannot be fulfilled on queues.scheduling.volcano.sh "default": the object has been modified; please apply your changes to the latest version and try again
E0827 10:48:24.691046 1 session.go:216] failed to update queue status: Operation cannot be fulfilled on queues.scheduling.volcano.sh "default": the object has been modified; please apply your changes to the latest version and try again
panic: runtime error: index out of range [4] with length 4

goroutine 81093 [running]:
k8s.io/api/core/v1.(*PodStatus).DeepCopyInto(0x402dd59be8, 0x4019da0af8)
    /devcloud/slavespace/slave1-new/workspace/j_CmcZc9ms/pkg/mod/k8s.io/[email protected]/core/v1/zz_generated.deepcopy.go:3982 +0x5cc
k8s.io/api/core/v1.(*Pod).DeepCopyInto(0x402dd598f0, 0x4019da0800)
    /devcloud/slavespace/slave1-new/workspace/j_CmcZc9ms/pkg/mod/k8s.io/[email protected]/core/v1/zz_generated.deepcopy.go:3309 +0x100
k8s.io/api/core/v1.(*Pod).DeepCopy(...)
    /devcloud/slavespace/slave1-new/workspace/j_CmcZc9ms/pkg/mod/k8s.io/[email protected]/core/v1/zz_generated.deepcopy.go:3319
volcano.sh/volcano/pkg/scheduler/cache.(*defaultEvictor).Evict(0x400088e2e8, 0x402dd598f0, {0x4046557f68, 0x4})
    /devcloud/slavespace/slave1-new/workspace/j_CmcZc9ms/src/volcano.sh/volcano/pkg/scheduler/cache/cache.go:209 +0x1a4
volcano.sh/volcano/pkg/scheduler/cache.(*SchedulerCache).Evict.func1()
    /devcloud/slavespace/slave1-new/workspace/j_CmcZc9ms/src/volcano.sh/volcano/pkg/scheduler/cache/cache.go:751 +0x44
created by volcano.sh/volcano/pkg/scheduler/cache.(*SchedulerCache).Evict
    /devcloud/slavespace/slave1-new/workspace/j_CmcZc9ms/src/volcano.sh/volcano/pkg/scheduler/cache/cache.go:750 +0x338
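
For context on the panic message: the trace shows the failure inside a deep copy that runs in a goroutine started by Evict. One way a copy can fail with "index out of range [4] with length 4" is if the source slice grows after the destination has been sized but before the elements are copied. Whether that is what happened here is an assumption, not something the logs confirm. The sketch below uses hypothetical names (`condition`, `copyWithInterleavedAppend`) and performs the append inline so the interleaving is deterministic; in a real race the append would come from another goroutine.

```go
package main

import "fmt"

// condition is a hypothetical stand-in for one entry in a status slice.
type condition struct{ Type string }

// copyWithInterleavedAppend sizes a destination from the source slice, lets
// the source grow by one element, and then copies element by element.
// It returns the recovered panic message, or "no panic" if none occurred.
func copyWithInterleavedAppend(n int) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()

	src := make([]condition, n)

	// Destination is sized for n elements.
	dst := make([]condition, len(src))

	// Simulated concurrent writer: the source now holds n+1 elements.
	src = append(src, condition{Type: "Added"})

	// The loop runs n+1 times, so the last write to dst is out of range.
	for i := range src {
		dst[i] = src[i]
	}
	return "no panic"
}

func main() {
	// Prints: runtime error: index out of range [4] with length 4
	fmt.Println(copyWithInterleavedAppend(4))
}
```

If this pattern is the cause, building and running the scheduler with Go's race detector (`go build -race`) may help locate the writer.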

Steps to reproduce the issue

Describe the results you received and expected

Received: the scheduler panics and restarts. Expected: the panic is fixed.

What version of Volcano are you using?

1.7

Any other relevant information

No response

Ramonaaaaa avatar Sep 09 '24 01:09 Ramonaaaaa

Is this from volcano repo?

Monokaix avatar Sep 09 '24 01:09 Monokaix

> Is this from volcano repo?

yes

Ramonaaaaa avatar Sep 09 '24 01:09 Ramonaaaaa

Did you install volcano-admission correctly?

hwdef avatar Sep 09 '24 03:09 hwdef

>> Is this from volcano repo?

> yes

You add a custom plugin?

Monokaix avatar Sep 09 '24 07:09 Monokaix

> Did you install volcano-admission correctly?

Yes, and I added a custom plugin.

Ramonaaaaa avatar Sep 09 '24 13:09 Ramonaaaaa

>>> Is this from volcano repo?

>> yes

> You add a custom plugin?

Yes. I've been using the custom plugin for two years, and I only ran into this problem recently.

Ramonaaaaa avatar Sep 09 '24 13:09 Ramonaaaaa

From the picture you pasted, it seems this is caused by your own plugin. Maybe check the custom plugin first.

Monokaix avatar Sep 10 '24 08:09 Monokaix

Hello 👋 Looks like there was no activity on this issue for last 90 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for 60 days, this issue will be closed (we can always reopen an issue if we need!).

stale[bot] avatar Feb 01 '25 01:02 stale[bot]

Closing for now as there was no activity for last 90 days after marked as stale, let us know if you need this to be reopened! 🤗

stale[bot] avatar May 06 '25 02:05 stale[bot]