[BUG] spark-operator:v1beta2-1.4.3-3.5.0 crashes on start

Open Aransh opened this issue 4 months ago • 4 comments

  • [x] ✋ I have searched the open/closed issues and my issue is not listed.

Reproduction Code [Required]

Steps to reproduce the behavior: Using the newest version of the chart (1.2.7) with the image spark-operator:v1beta2-1.4.3-3.5.0 results in an instant crash of the operator pods. With the exact same configuration but image version v1beta2-1.4.2-3.5.0, there is no crash.

Values YAML:

replicaCount: 3
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: kubernetes.io/hostname
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: spark-operator
serviceAccounts:
  spark:
    name: "spark-apps"
  sparkoperator:
    name: "operator-spark"
sparkJobNamespaces: ["spark-apps"]
webhook:
  enable: true
podMonitor:
  enable: true

Expected behavior

Not to crash

Actual behavior

The operator pods crash immediately on start.

Terminal Output Screenshot(s)

Full log:

++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ echo 0
0
+ echo 0
0
+ echo root:x:0:0:root:/root:/bin/bash
root:x:0:0:root:/root:/bin/bash
+ [[ -z root:x:0:0:root:/root:/bin/bash ]]
+ exec /usr/bin/tini -s -- /usr/bin/spark-operator -v=2 -logtostderr -namespace=spark-apps -enable-ui-service=true -ingress-url-format= -controller-threads=10 -resync-interval=30 -enable-batch-scheduler=false -label-selector-filter= -enable-metrics=true -metrics-labels=app_type -metrics-port=10254 -metrics-endpoint=/metrics -metrics-prefix= -enable-webhook=true -webhook-svc-namespace=operator-spark -webhook-port=8080 -webhook-timeout=30 -webhook-svc-name=spark-operator-devops-playground-webhook -webhook-config-name=spark-operator-devops-playground-webhook-config -webhook-namespace-selector= -enable-resource-quota-enforcement=false -leader-election=true -leader-election-lock-namespace=operator-spark -leader-election-lock-name=spark-operator-lock
F0417 10:59:07.333005 10 main.go:146] Lock identity is empty

goroutine 1 [running]: github.com/golang/glog.Fatal(...) /go/pkg/mod/github.com/golang/[email protected]/glog.go:664 main.main() /workspace/main.go:146 +0x1418

SIGABRT: abort PC=0x40708e m=5 sigcode=18446744073709551610

goroutine 1 gp=0xc0000061c0 m=5 mp=0xc00007f808 [running, locked to thread]: runtime/internal/syscall.Syscall6() /usr/local/go/src/runtime/internal/syscall/asm_linux_amd64.s:36 +0xe fp=0xc0000c9a88 sp=0xc0000c9a80 pc=0x40708e syscall.RawSyscall6(0xc00033a088?, 0xc000124270?, 0xc0005a2260?, 0x2be4440?, 0x548220?, 0x2be44d8?, 0xc0000c9af0?) /usr/local/go/src/runtime/internal/syscall/syscall_linux.go:38 +0xd fp=0xc0000c9ad0 sp=0xc0000c9a88 pc=0x40706d syscall.RawSyscall(0x2be44d8?, 0x0?, 0xc0000c9b70?, 0xc0000c9b50?) /usr/local/go/src/syscall/syscall_linux.go:62 +0x15 fp=0xc0000c9b18 sp=0xc0000c9ad0 pc=0x48a8f5 syscall.Tgkill(0xba?, 0x0?, 0x0?) /usr/local/go/src/syscall/zsyscall_linux_amd64.go:894 +0x25 fp=0xc0000c9b48 sp=0xc0000c9b18 pc=0x488aa5 github.com/golang/glog.abortProcess() /go/pkg/mod/github.com/golang/[email protected]/glog_file_linux.go:35 +0x87 fp=0xc0000c9b90 sp=0xc0000c9b48 pc=0x548387 github.com/golang/glog.ctxfatalf({0x0?, 0x0?}, 0xc0004f1170?, {0x1b8e1cb?, 0x411d65?}, {0xc0004f1170?, 0x185ba80?, 0xc000596601?}) /go/pkg/mod/github.com/golang/[email protected]/glog.go:647 +0x6a fp=0xc0000c9bf8 sp=0xc0000c9b90 pc=0x54606a github.com/golang/glog.fatalf(...) /go/pkg/mod/github.com/golang/[email protected]/glog.go:657 github.com/golang/glog.FatalDepth(0x1, {0xc0004f1170, 0x1, 0x1}) /go/pkg/mod/github.com/golang/[email protected]/glog.go:670 +0x57 fp=0xc0000c9c48 sp=0xc0000c9bf8 pc=0x5461f7 github.com/golang/glog.Fatal(...) /go/pkg/mod/github.com/golang/[email protected]/glog.go:664 main.main() /workspace/main.go:146 +0x1418 fp=0xc0000c9f50 sp=0xc0000c9c48 pc=0x172efb8 runtime.main() /usr/local/go/src/runtime/proc.go:271 +0x29d fp=0xc0000c9fe0 sp=0xc0000c9f50 pc=0x4404fd runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000c9fe8 sp=0xc0000c9fe0 pc=0x473721

goroutine 2 gp=0xc000006700 m=nil [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000084fa8 sp=0xc000084f88 pc=0x44094e runtime.goparkunlock(...) /usr/local/go/src/runtime/proc.go:408 runtime.forcegchelper() /usr/local/go/src/runtime/proc.go:326 +0xb3 fp=0xc000084fe0 sp=0xc000084fa8 pc=0x4407b3 runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x473721 created by runtime.init.6 in goroutine 1 /usr/local/go/src/runtime/proc.go:314 +0x1a

goroutine 3 gp=0xc000006c40 m=nil [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000085780 sp=0xc000085760 pc=0x44094e runtime.goparkunlock(...) /usr/local/go/src/runtime/proc.go:408 runtime.bgsweep(0xc000058070) /usr/local/go/src/runtime/mgcsweep.go:318 +0xdf fp=0xc0000857c8 sp=0xc000085780 pc=0x42a2bf runtime.gcenable.gowrap1() /usr/local/go/src/runtime/mgc.go:203 +0x25 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x41ebc5 runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x473721 created by runtime.gcenable in goroutine 1 /usr/local/go/src/runtime/mgc.go:203 +0x66

goroutine 4 gp=0xc000006e00 m=nil [GC scavenge wait]: runtime.gopark(0x10000?, 0x1e0abb8?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x44094e runtime.goparkunlock(...) /usr/local/go/src/runtime/proc.go:408 runtime.(*scavengerState).park(0x2be48a0) /usr/local/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x427c69 runtime.bgscavenge(0xc000058070) /usr/local/go/src/runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x428219 runtime.gcenable.gowrap2() /usr/local/go/src/runtime/mgc.go:204 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x41eb65 runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x473721 created by runtime.gcenable in goroutine 1 /usr/local/go/src/runtime/mgc.go:204 +0xa5

goroutine 17 gp=0xc000102380 m=nil [finalizer wait]: runtime.gopark(0xc000084660?, 0x42713c?, 0x80?, 0x7f?, 0x550011?) /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000084620 sp=0xc000084600 pc=0x44094e runtime.runfinq() /usr/local/go/src/runtime/mfinal.go:194 +0x107 fp=0xc0000847e0 sp=0xc000084620 pc=0x41dc07 runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x473721 created by runtime.createfing in goroutine 1 /usr/local/go/src/runtime/mfinal.go:164 +0x3d

goroutine 18 gp=0xc000103880 m=nil [select]: runtime.gopark(0xc000080780?, 0x2?, 0x40?, 0x6?, 0xc000080774?) /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000080618 sp=0xc0000805f8 pc=0x44094e runtime.selectgo(0xc000080780, 0xc000080770, 0x0?, 0x0, 0x0?, 0x1) /usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc000080738 sp=0xc000080618 pc=0x451e65 github.com/golang/glog.(*fileSink).flushDaemon(0x2be44d8) /go/pkg/mod/github.com/golang/[email protected]/glog_file.go:351 +0xb9 fp=0xc0000807c8 sp=0xc000080738 pc=0x547df9 github.com/golang/glog.init.1.gowrap1() /go/pkg/mod/github.com/golang/[email protected]/glog_file.go:166 +0x25 fp=0xc0000807e0 sp=0xc0000807c8 pc=0x546e85 runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x473721 created by github.com/golang/glog.init.1 in goroutine 1 /go/pkg/mod/github.com/golang/[email protected]/glog_file.go:166 +0x126

goroutine 34 gp=0xc000268c40 m=nil [GC worker (idle)]: runtime.gopark(0xc000080fa8?, 0x40a20b?, 0x17?, 0x96?, 0x1?) /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000080f50 sp=0xc000080f30 pc=0x44094e runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc000080fe0 sp=0xc000080f50 pc=0x420ca5 runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x473721 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 35 gp=0xc0003416c0 m=nil [GC worker (idle)]: runtime.gopark(0x557d01c66dd1?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc0004b0750 sp=0xc0004b0730 pc=0x44094e runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0004b07e0 sp=0xc0004b0750 pc=0x420ca5 runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0004b07e8 sp=0xc0004b07e0 pc=0x473721 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 49 gp=0xc000500000 m=nil [GC worker (idle)]: runtime.gopark(0x557d01c4abd0?, 0xc0000560a0?, 0x1a?, 0xa?, 0x0?) /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc0004ac750 sp=0xc0004ac730 pc=0x44094e runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0004ac7e0 sp=0xc0004ac750 pc=0x420ca5 runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0004ac7e8 sp=0xc0004ac7e0 pc=0x473721 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 50 gp=0xc0005001c0 m=nil [GC worker (idle)]: runtime.gopark(0x557d01c6fcb5?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc0004acf50 sp=0xc0004acf30 pc=0x44094e runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0004acfe0 sp=0xc0004acf50 pc=0x420ca5 runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0004acfe8 sp=0xc0004acfe0 pc=0x473721 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 20 gp=0xc000007a40 m=nil [select, locked to thread]: runtime.gopark(0xc0004affa8?, 0x2?, 0xe9?, 0xb?, 0xc0004aff94?) /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc0004afe38 sp=0xc0004afe18 pc=0x44094e runtime.selectgo(0xc0004affa8, 0xc0004aff90, 0x0?, 0x0, 0x0?, 0x1) /usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc0004aff58 sp=0xc0004afe38 pc=0x451e65 runtime.ensureSigM.func1() /usr/local/go/src/runtime/signal_unix.go:1034 +0x19f fp=0xc0004affe0 sp=0xc0004aff58 pc=0x46aadf runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0004affe8 sp=0xc0004affe0 pc=0x473721 created by runtime.ensureSigM in goroutine 1 /usr/local/go/src/runtime/signal_unix.go:1017 +0xc8

goroutine 5 gp=0xc000500380 m=7 mp=0xc0000bc008 [syscall]: runtime.notetsleepg(0x2c472a0, 0xffffffffffffffff) /usr/local/go/src/runtime/lock_futex.go:246 +0x29 fp=0xc000082fa0 sp=0xc000082f78 pc=0x410389 os/signal.signal_recv() /usr/local/go/src/runtime/sigqueue.go:152 +0x29 fp=0xc000082fc0 sp=0xc000082fa0 pc=0x46ffe9 os/signal.loop() /usr/local/go/src/os/signal/signal_unix.go:23 +0x13 fp=0xc000082fe0 sp=0xc000082fc0 pc=0x515d73 runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000082fe8 sp=0xc000082fe0 pc=0x473721 created by os/signal.Notify.func1.1 in goroutine 1 /usr/local/go/src/os/signal/signal.go:151 +0x1f

goroutine 6 gp=0xc000500540 m=nil [chan receive]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc0000835d0 sp=0xc0000835b0 pc=0x44094e runtime.chanrecv(0xc00059a120, 0xc000083718, 0x1) /usr/local/go/src/runtime/chan.go:583 +0x3bf fp=0xc000083648 sp=0xc0000835d0 pc=0x40a71f runtime.chanrecv2(0x0?, 0x0?) /usr/local/go/src/runtime/chan.go:447 +0x12 fp=0xc000083670 sp=0xc000083648 pc=0x40a352 k8s.io/apimachinery/pkg/watch.(*Broadcaster).loop(0xc00059ccd0) /go/pkg/mod/k8s.io/[email protected]/pkg/watch/mux.go:268 +0x66 fp=0xc0000837c8 sp=0xc000083670 pc=0x8f6b66 k8s.io/apimachinery/pkg/watch.NewLongQueueBroadcaster.gowrap1() /go/pkg/mod/k8s.io/[email protected]/pkg/watch/mux.go:93 +0x25 fp=0xc0000837e0 sp=0xc0000837c8 pc=0x8f5ce5 runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000837e8 sp=0xc0000837e0 pc=0x473721 created by k8s.io/apimachinery/pkg/watch.NewLongQueueBroadcaster in goroutine 1 /go/pkg/mod/k8s.io/[email protected]/pkg/watch/mux.go:93 +0x125

rax 0x0 rbx 0xa rcx 0x40708e rdx 0x6 rdi 0xa rsi 0xe rbp 0xc0000c9ac0 rsp 0xc0000c9a80 r8 0x0 r9 0x0 r10 0x0 r11 0x216 r12 0x0 r13 0x0 r14 0xc0000061c0 r15 0x3ffffffffff rip 0x40708e rflags 0x216 cs 0x33 fs 0x0 gs 0x0
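
In a dump this long, the useful line is the single glog fatal record that precedes the goroutine traces. As a rough aid for triage, here is a minimal Python sketch that picks that record out of a log. It assumes the standard glog header layout (severity letter, mmdd, time with microseconds, thread id, file:line, then the message); the function names `parse_glog_line` and `first_fatal` are made up for this illustration and are not part of the operator or of glog.

```python
import re

# Standard glog header: L mmdd hh:mm:ss.uuuuuu threadid file:line] message
# where L is one of I (info), W (warning), E (error), F (fatal).
GLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)


def parse_glog_line(line):
    """Return the header fields of one glog line as a dict, or None if it does not match."""
    match = GLOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["line"] = int(record["line"])
    return record


def first_fatal(log_text):
    """Return the first fatal (severity F) record in a multi-line log, or None."""
    for raw in log_text.splitlines():
        record = parse_glog_line(raw)
        if record is not None and record["severity"] == "F":
            return record
    return None


if __name__ == "__main__":
    sample = (
        "+ set -e\n"
        "F0417 10:59:07.333005 10 main.go:146] Lock identity is empty\n"
        "goroutine 1 [running]:\n"
    )
    fatal = first_fatal(sample)
    print(fatal["file"], fatal["line"], fatal["message"])
```

Applied to the log above, this would point at main.go line 146 and the message "Lock identity is empty", which is the only operator-level error in the output; everything after it is the Go runtime's abort dump.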

Environment & Versions

  • Spark Operator App version: v1beta2-1.4.3-3.5.0
  • Helm Chart Version: 1.2.7
  • Kubernetes Version: 1.27
  • Apache Spark version:

Additional context

Aransh, Apr 17 '24 11:04

We just released a new image update with important registry fixes. Check it out:

Image tag: https://github.com/kubeflow/spark-operator/tree/v1beta2-1.4.5-3.5.0
Helm chart: https://github.com/kubeflow/spark-operator/releases/tag/spark-operator-chart-1.2.14

Please give it a try and let us know if you encounter any issues. We're working on a new Kubeflow Spark Operator release, and your testing will help make it stable! Feel free to share feedback on the Kubeflow Spark Operator channel.

vara-bonthu, Apr 26 '24 17:04

@vara-bonthu I just tested docker.io/kubeflow/spark-operator:v1beta2-1.4.5-3.5.0 and am seeing the exact same issue:

++ id -u
+ myuid=0 ++ id -g + mygid=0 + set +e ++ getent passwd 0 + uidentry=root:x:0:0:root:/root:/bin/bash + set -e + echo 0 0 0 + echo 0 + echo root:x:0:0:root:/root:/bin/bash root:x:0:0:root:/root:/bin/bash + [[ -z root:x:0:0:root:/root:/bin/bash ]] + exec /usr/bin/tini -s -- /usr/bin/spark-operator -v=2 -logtostderr -namespace=spark-apps -enable-ui-service=true -ingress-url-format= -controller-threads=10 -resync-interval=30 -enable-batch-scheduler=false -label-selector-filter= -enable-metrics=true -metrics-labels=app_type -metrics-port=10254 -metrics-endpoint=/metrics -metrics-prefix= -enable-webhook=true -webhook-svc-namespace=operator-spark -webhook-port=8080 -webhook-timeout=30 -webhook-svc-name=spark-operator-devops-playground-webhook -webhook-config-name=spark-operator-devops-playground-webhook-config -webhook-namespace-selector= -enable-resource-quota-enforcement=false -leader-election=true -leader-election-lock-namespace=operator-spark -leader-election-lock-name=spark-operator-lock F0428 09:50:15.367621 10 main.go:146] Lock identity is empty goroutine 1 [running]: github.com/golang/glog.Fatal(...) /go/pkg/mod/github.com/golang/[email protected]/glog.go:664 main.main() /workspace/main.go:146 +0x1418 SIGABRT: abort PC=0x40708e m=3 sigcode=18446744073709551610 goroutine 1 gp=0xc0000061c0 m=3 mp=0xc00007f008 [running, locked to thread]: runtime/internal/syscall.Syscall6() /usr/local/go/src/runtime/internal/syscall/asm_linux_amd64.s:36 +0xe fp=0xc000023a88 sp=0xc000023a80 pc=0x40708e syscall.RawSyscall6(0xc000012048?, 0xc000000030?, 0xc0001c0060?, 0x2be5440?, 0x548220?, 0x2be54d8?, 0xc000023af0?) /usr/local/go/src/runtime/internal/syscall/syscall_linux.go:38 +0xd fp=0xc000023ad0 sp=0xc000023a88 pc=0x40706d syscall.RawSyscall(0x2be54d8?, 0x0?, 0xc000023b70?, 0xc000023b50?) /usr/local/go/src/syscall/syscall_linux.go:62 +0x15 fp=0xc000023b18 sp=0xc000023ad0 pc=0x48a8f5 syscall.Tgkill(0xba?, 0x0?, 0x0?) 
/usr/local/go/src/syscall/zsyscall_linux_amd64.go:894 +0x25 fp=0xc000023b48 sp=0xc000023b18 pc=0x488aa5 github.com/golang/glog.abortProcess() /go/pkg/mod/github.com/golang/[email protected]/glog_file_linux.go:35 +0x87 fp=0xc000023b90 sp=0xc000023b48 pc=0x548387 github.com/golang/glog.ctxfatalf({0x0?, 0x0?}, 0xc0004460a0?, {0x1b8f1eb?, 0x411d65?}, {0xc0004460a0?, 0x185ca80?, 0xc00016a001?}) /go/pkg/mod/github.com/golang/[email protected]/glog.go:647 +0x6a fp=0xc000023bf8 sp=0xc000023b90 pc=0x54606a github.com/golang/glog.fatalf(...) /go/pkg/mod/github.com/golang/[email protected]/glog.go:657 github.com/golang/glog.FatalDepth(0x1, {0xc0004460a0, 0x1, 0x1}) /go/pkg/mod/github.com/golang/[email protected]/glog.go:670 +0x57 fp=0xc000023c48 sp=0xc000023bf8 pc=0x5461f7 github.com/golang/glog.Fatal(...) /go/pkg/mod/github.com/golang/[email protected]/glog.go:664 main.main() /workspace/main.go:146 +0x1418 fp=0xc000023f50 sp=0xc000023c48 pc=0x172f418 runtime.main() /usr/local/go/src/runtime/proc.go:271 +0x29d fp=0xc000023fe0 sp=0xc000023f50 pc=0x4404fd runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000023fe8 sp=0xc000023fe0 pc=0x473721 goroutine 2 gp=0xc000006700 m=nil [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000084fa8 sp=0xc000084f88 pc=0x44094e runtime.goparkunlock(...) /usr/local/go/src/runtime/proc.go:408 runtime.forcegchelper() /usr/local/go/src/runtime/proc.go:326 +0xb3 fp=0xc000084fe0 sp=0xc000084fa8 pc=0x4407b3 runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000084fe8 sp=0xc000084fe0 pc=0x473721 created by runtime.init.6 in goroutine 1 /usr/local/go/src/runtime/proc.go:314 +0x1a goroutine 3 gp=0xc000006c40 m=nil [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000085780 sp=0xc000085760 pc=0x44094e runtime.goparkunlock(...) 
/usr/local/go/src/runtime/proc.go:408 runtime.bgsweep(0xc000058070) /usr/local/go/src/runtime/mgcsweep.go:318 +0xdf fp=0xc0000857c8 sp=0xc000085780 pc=0x42a2bf runtime.gcenable.gowrap1() /usr/local/go/src/runtime/mgc.go:203 +0x25 fp=0xc0000857e0 sp=0xc0000857c8 pc=0x41ebc5 runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000857e8 sp=0xc0000857e0 pc=0x473721 created by runtime.gcenable in goroutine 1 /usr/local/go/src/runtime/mgc.go:203 +0x66 goroutine 4 gp=0xc000006e00 m=nil [GC scavenge wait]: runtime.gopark(0x10000?, 0x1e0bc58?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000085f78 sp=0xc000085f58 pc=0x44094e runtime.goparkunlock(...) /usr/local/go/src/runtime/proc.go:408 runtime.(*scavengerState).park(0x2be58a0) /usr/local/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc000085fa8 sp=0xc000085f78 pc=0x427c69 runtime.bgscavenge(0xc000058070) /usr/local/go/src/runtime/mgcscavenge.go:658 +0x59 fp=0xc000085fc8 sp=0xc000085fa8 pc=0x428219 runtime.gcenable.gowrap2() /usr/local/go/src/runtime/mgc.go:204 +0x25 fp=0xc000085fe0 sp=0xc000085fc8 pc=0x41eb65 runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000085fe8 sp=0xc000085fe0 pc=0x473721 created by runtime.gcenable in goroutine 1 /usr/local/go/src/runtime/mgc.go:204 +0xa5 goroutine 17 gp=0xc0001b0000 m=nil [finalizer wait]: runtime.gopark(0xc000084660?, 0x42713c?, 0x80?, 0x8f?, 0x550011?) /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000084620 sp=0xc000084600 pc=0x44094e runtime.runfinq() /usr/local/go/src/runtime/mfinal.go:194 +0x107 fp=0xc0000847e0 sp=0xc000084620 pc=0x41dc07 runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000847e8 sp=0xc0000847e0 pc=0x473721 created by runtime.createfing in goroutine 1 /usr/local/go/src/runtime/mfinal.go:164 +0x3d goroutine 18 gp=0xc0001b1500 m=nil [select]: runtime.gopark(0xc000080780?, 0x2?, 0x40?, 0x6?, 0xc000080774?) 
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000080618 sp=0xc0000805f8 pc=0x44094e runtime.selectgo(0xc000080780, 0xc000080770, 0x0?, 0x0, 0x0?, 0x1) /usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc000080738 sp=0xc000080618 pc=0x451e65 github.com/golang/glog.(*fileSink).flushDaemon(0x2be54d8) /go/pkg/mod/github.com/golang/[email protected]/glog_file.go:351 +0xb9 fp=0xc0000807c8 sp=0xc000080738 pc=0x547df9 github.com/golang/glog.init.1.gowrap1() /go/pkg/mod/github.com/golang/[email protected]/glog_file.go:166 +0x25 fp=0xc0000807e0 sp=0xc0000807c8 pc=0x546e85 runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000807e8 sp=0xc0000807e0 pc=0x473721 created by github.com/golang/glog.init.1 in goroutine 1 /go/pkg/mod/github.com/golang/[email protected]/glog_file.go:166 +0x126 goroutine 33 gp=0xc0002ea8c0 m=nil [GC worker (idle)]: runtime.gopark(0x1c89a40?, 0xc00015ec20?, 0x1a?, 0xa?, 0x0?) /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000080f50 sp=0xc000080f30 pc=0x44094e runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc000080fe0 sp=0xc000080f50 pc=0x420ca5 runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000080fe8 sp=0xc000080fe0 pc=0x473721 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/local/go/src/runtime/mgc.go:1234 +0x1c goroutine 21 gp=0xc0002eae00 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000081750 sp=0xc000081730 pc=0x44094e runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0000817e0 sp=0xc000081750 pc=0x420ca5 runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000817e8 sp=0xc0000817e0 pc=0x473721 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/local/go/src/runtime/mgc.go:1234 +0x1c goroutine 5 gp=0xc0000076c0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) 
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000086750 sp=0xc000086730 pc=0x44094e runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0000867e0 sp=0xc000086750 pc=0x420ca5 runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000867e8 sp=0xc0000867e0 pc=0x473721 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/local/go/src/runtime/mgc.go:1234 +0x1c goroutine 22 gp=0xc0002eafc0 m=nil [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000081f50 sp=0xc000081f30 pc=0x44094e runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc000081fe0 sp=0xc000081f50 pc=0x420ca5 runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000081fe8 sp=0xc000081fe0 pc=0x473721 created by runtime.gcBgMarkStartWorkers in goroutine 1 /usr/local/go/src/runtime/mgc.go:1234 +0x1c goroutine 23 gp=0xc0002eb880 m=nil [select, locked to thread]: runtime.gopark(0xc000557fa8?, 0x2?, 0xe9?, 0xb?, 0xc000557f94?) 
/usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000557e38 sp=0xc000557e18 pc=0x44094e runtime.selectgo(0xc000557fa8, 0xc000557f90, 0x0?, 0x0, 0x0?, 0x1) /usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc000557f58 sp=0xc000557e38 pc=0x451e65 runtime.ensureSigM.func1() /usr/local/go/src/runtime/signal_unix.go:1034 +0x19f fp=0xc000557fe0 sp=0xc000557f58 pc=0x46aadf runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000557fe8 sp=0xc000557fe0 pc=0x473721 created by runtime.ensureSigM in goroutine 1 /usr/local/go/src/runtime/signal_unix.go:1017 +0xc8 goroutine 6 gp=0xc000007880 m=6 mp=0xc00007f808 [syscall]: runtime.notetsleepg(0x2c482a0, 0xffffffffffffffff) /usr/local/go/src/runtime/lock_futex.go:246 +0x29 fp=0xc000553fa0 sp=0xc000553f78 pc=0x410389 os/signal.signal_recv() /usr/local/go/src/runtime/sigqueue.go:152 +0x29 fp=0xc000553fc0 sp=0xc000553fa0 pc=0x46ffe9 os/signal.loop() /usr/local/go/src/os/signal/signal_unix.go:23 +0x13 fp=0xc000553fe0 sp=0xc000553fc0 pc=0x515d73 runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000553fe8 sp=0xc000553fe0 pc=0x473721 created by os/signal.Notify.func1.1 in goroutine 1 /usr/local/go/src/os/signal/signal.go:151 +0x1f goroutine 7 gp=0xc000007a40 m=nil [runnable]: k8s.io/apimachinery/pkg/watch.NewLongQueueBroadcaster.gowrap1() /go/pkg/mod/k8s.io/[email protected]/pkg/watch/mux.go:93 fp=0xc0005547e0 sp=0xc0005547d8 pc=0x8f5cc0 runtime.goexit({}) /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0005547e8 sp=0xc0005547e0 pc=0x473721 created by k8s.io/apimachinery/pkg/watch.NewLongQueueBroadcaster in goroutine 1 /go/pkg/mod/k8s.io/[email protected]/pkg/watch/mux.go:93 +0x125 rax 0x0 rbx 0xa rcx 0x40708e rdx 0x6 rdi 0xa rsi 0xd rbp 0xc000023ac0 rsp 0xc000023a80 r8 0x0 r9 0x0 r10 0x0 r11 0x216 r12 0x0 r13 0x0 r14 0xc0000061c0 r15 0xffffffff800074 rip 0x40708e rflags 0x216 cs 0x33 fs 0x0 gs 0x0

Aransh, Apr 28 '24 09:04

I recently moved from Spark 3.4 to 3.5 and it worked for me.

spark-operator: v1beta2-1.4.2-3.5.0
helm: 1.2.5
k8s: 1.27

I faced the same issue as above; I had to make sure my internal library and Docker were pointing to the same Spark 3.5.0 version.

MJFND, May 02 '24 23:05

@MJFND The problem we encountered is with operator version 1.4.3. You used operator version 1.4.2, which also works for us. In addition, the Spark version is not relevant here, since it is the operator container itself that fails to start.
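
To make the distinction between operator version and Spark version concrete, here is a small Python sketch that splits an image tag into its parts. It assumes the tag layout `<API version>-<operator version>-<Spark version>`, which is inferred from the tags quoted in this thread rather than from any documented scheme; `OperatorTag` and `parse_tag` are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Tuple


class OperatorTag(NamedTuple):
    api_version: str                # e.g. "v1beta2"
    operator_version: Tuple[int, ...]  # e.g. (1, 4, 3)
    spark_version: Tuple[int, ...]     # e.g. (3, 5, 0)


def _to_tuple(version):
    """Turn "1.4.3" into (1, 4, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def parse_tag(tag):
    """Split a tag such as "v1beta2-1.4.3-3.5.0" into its three components.

    Assumes exactly three dash-separated fields; any other shape raises ValueError.
    """
    api_version, operator, spark = tag.split("-")
    return OperatorTag(api_version, _to_tuple(operator), _to_tuple(spark))


if __name__ == "__main__":
    working = parse_tag("v1beta2-1.4.2-3.5.0")
    crashing = parse_tag("v1beta2-1.4.3-3.5.0")
    # Same Spark version, different operator version.
    print(working.spark_version == crashing.spark_version)
    print(working.operator_version < crashing.operator_version)
```

Under that reading, the working and crashing images carry the same Spark version and differ only in the operator version, which matches the point made above.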

YanivKunda, May 05 '24 08:05

Update: this issue also occurs with image tag v1beta2-1.4.6-3.5.0.

Aransh, May 09 '24 11:05

Revisiting this to see what actually went wrong. From the crash logs, the issue seems to originate in the leader election process:

F0428 09:50:15.367621 10 main.go:146] Lock identity is empty

Could this be related to the K8s changes made in #1983? I see it was done after #1968, which aligns the APIs to 1.29.3. Maybe this is a K8s API backward-compatibility problem? (We are using K8s 1.28.9.)
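
The message "Lock identity is empty" reads like a guard that refuses to start leader election when the identity string for this replica is blank. How the operator actually derives that identity is not shown in this thread, so the following Python sketch only illustrates the general shape of such a guard. The `POD_NAME` environment variable, the hostname fallback, and the names `resolve_lock_identity` and `EmptyIdentityError` are all assumptions made for the example, not the operator's real code.

```python
import os
import socket


class EmptyIdentityError(ValueError):
    """Raised when no non-empty identity can be determined for the lock."""


def resolve_lock_identity(env=None, env_var="POD_NAME", fallback_to_hostname=True):
    """Pick an identity for a leader-election lock and reject an empty one.

    env: mapping to read from (defaults to os.environ); env_var is hypothetical.
    """
    env = os.environ if env is None else env
    identity = env.get(env_var, "").strip()
    if not identity and fallback_to_hostname:
        # Inside a pod the hostname is usually the pod name, so it is a plausible fallback.
        identity = socket.gethostname().strip()
    if not identity:
        # Mirrors the wording of the fatal log line for easier comparison.
        raise EmptyIdentityError("Lock identity is empty")
    return identity


if __name__ == "__main__":
    print(resolve_lock_identity({"POD_NAME": "spark-operator-0"}))
    try:
        resolve_lock_identity({}, fallback_to_hostname=False)
    except EmptyIdentityError as exc:
        print("would abort:", exc)
```

If the real cause is along these lines, a useful check would be whether the value the operator uses as its identity is actually populated in the pod spec rendered by chart 1.2.7, since all replicas crash identically before any API call appears in the trace.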

YanivKunda, May 21 '24 07:05