[Bug]: Upgrade to v2.4.0-rc.1, querycoord can't start

Open yunurs opened this issue 2 years ago • 5 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version:
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

After upgrading to v2.4.0-rc.1, querycoord can't start.

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

[channel=by-online-rootcoord-dml_15_446170277212890085v0] [from=-1] [to=1910]
[2024/03/27 10:41:53.148 +08:00] [INFO] [balance/utils.go:70] ["create segment task"] [collection=446170277212890085] [segmentID=448191748096577109] [replica=446170315782881503] [channel=by-online-rootcoord-dml_15_446170277212890085v0] [from=-1] [to=1929]
[2024/03/27 10:41:53.148 +08:00] [INFO] [balance/utils.go:70] ["create segment task"] [collection=446170277212890085] [segmentID=448191747786612370] [replica=446170315782881503] [channel=by-online-rootcoord-dml_15_446170277212890085v0] [from=-1] [to=1927]
[2024/03/27 10:41:53.148 +08:00] [INFO] [balance/utils.go:70] ["create segment task"] [collection=446170277212890085] [segmentID=448191748096602365] [replica=446170315782881503] [channel=by-online-rootcoord-dml_15_446170277212890085v0] [from=-1] [to=1912]
[2024/03/27 10:41:53.148 +08:00] [INFO] [balance/utils.go:70] ["create segment task"] [collection=446170277212890085] [segmentID=448191748096601268] [replica=446170315782881503] [channel=by-online-rootcoord-dml_15_446170277212890085v0] [from=-1] [to=1930]
[2024/03/27 10:41:53.148 +08:00] [INFO] [balance/utils.go:70] ["create segment task"] [collection=446170277212890085] [segmentID=448191748096592763] [replica=446170315782881503] [channel=by-online-rootcoord-dml_15_446170277212890085v0] [from=-1] [to=1923]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x46e9f6f]

goroutine 742 [running]:
panic({0x4d8b0c0, 0x7871930})
    /usr/local/go/src/runtime/panic.go:987 +0x3bb fp=0xc001aa1a98 sp=0xc001aa19d8 pc=0x1b96ddb
runtime.panicmem(...)
    /usr/local/go/src/runtime/panic.go:260
runtime.sigpanic()
    /usr/local/go/src/runtime/signal_unix.go:841 +0x37d fp=0xc001aa1af8 sp=0xc001aa1a98 pc=0x1baf0dd
github.com/milvus-io/milvus/internal/querycoordv2/checkers.(*SegmentChecker).createSegmentLoadTasks(0xc001a97940, {0x59b6ab0, 0xc0020da370}, {0xc002840830?, 0x2, 0x2}, 0xc0009134a0)
    /go/src/github.com/milvus-io/milvus/internal/querycoordv2/checkers/segment_checker.go:380 +0x38f fp=0xc001aa1c50 sp=0xc001aa1af8 pc=0x46e9f6f
github.com/milvus-io/milvus/internal/querycoordv2/checkers.(*SegmentChecker).checkReplica(0xc0001d0e80?, {0x59b6ab0, 0xc0020da370}, 0xc0009134a0)
    /go/src/github.com/milvus-io/milvus/internal/querycoordv2/checkers/segment_checker.go:113 +0x117 fp=0xc001aa1d58 sp=0xc001aa1c50 pc=0x46e73f7
github.com/milvus-io/milvus/internal/querycoordv2/checkers.(*SegmentChecker).Check(0xc001a97940, {0x59b6ab0, 0xc0020da370})
    /go/src/github.com/milvus-io/milvus/internal/querycoordv2/checkers/segment_checker.go:92 +0x40a fp=0xc001aa1e58 sp=0xc001aa1d58 pc=0x46e722a
github.com/milvus-io/milvus/internal/querycoordv2/checkers.(*CheckerController).check(0xc0015d6f00, {0x59b6ab0, 0xc0020da370}, 0x0?)
    /go/src/github.com/milvus-io/milvus/internal/querycoordv2/checkers/controller.go:159 +0x53 fp=0xc001aa1ea8 sp=0xc001aa1e58 pc=0x46e2dd3
github.com/milvus-io/milvus/internal/querycoordv2/checkers.(*CheckerController).startChecker(0xc0015d6f00, {0x59b6ab0, 0xc0020da370}, 0x0?)
    /go/src/github.com/milvus-io/milvus/internal/querycoordv2/checkers/controller.go:129 +0x16c fp=0xc001aa1fb0 sp=0xc001aa1ea8 pc=0x46e29ec
github.com/milvus-io/milvus/internal/querycoordv2/checkers.(*CheckerController).Start.func1()
    /go/src/github.com/milvus-io/milvus/internal/querycoordv2/checkers/controller.go:95 +0x31 fp=0xc001aa1fe0 sp=0xc001aa1fb0 pc=0x46e2711
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc001aa1fe8 sp=0xc001aa1fe0 pc=0x1bd0c21
created by github.com/milvus-io/milvus/internal/querycoordv2/checkers.(*CheckerController).Start
    /go/src/github.com/milvus-io/milvus/internal/querycoordv2/checkers/controller.go:95 +0xb3
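
When triaging a dump like the one above, it can help to pick out the first frame that lies outside the Go runtime, since that is usually where the nil dereference happened. The following is only a rough sketch in Python: the helper name `first_non_runtime_location` and the runtime path prefix are assumptions made for illustration, and the sample string is abbreviated from the trace in this report.

```python
import re

# Matches "<path>.go:<line>" source locations as they appear in Go stack traces.
FRAME_LOCATION = re.compile(r"(\S+\.go):(\d+)")


def first_non_runtime_location(trace, runtime_prefix="/usr/local/go/src/runtime/"):
    """Return (path, line) of the first frame outside the Go runtime, or None.

    Hypothetical helper: the default prefix is the runtime path seen in this
    report and may differ on other builds.
    """
    for path, line in FRAME_LOCATION.findall(trace):
        if not path.startswith(runtime_prefix):
            return path, int(line)
    return None


# Abbreviated from the goroutine 742 trace in this issue.
sample = (
    "panic({0x4d8b0c0, 0x7871930}) /usr/local/go/src/runtime/panic.go:987 +0x3bb "
    "runtime.sigpanic() /usr/local/go/src/runtime/signal_unix.go:841 +0x37d "
    "github.com/milvus-io/milvus/internal/querycoordv2/checkers."
    "(*SegmentChecker).createSegmentLoadTasks(...) "
    "/go/src/github.com/milvus-io/milvus/internal/querycoordv2/checkers/"
    "segment_checker.go:380 +0x38f"
)

print(first_non_runtime_location(sample))
```

For this trace the helper points at segment_checker.go, line 380, which is the frame inside createSegmentLoadTasks where the panic was raised.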

goroutine 1 [chan receive]: runtime.gopark(0x7a07110?, 0xc001663770?, 0xa?, 0x9b?, 0xc001fb06f0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00219b700 sp=0xc00219b6e0 pc=0x1b9a356 runtime.chanrecv(0xc00052c540, 0x0, 0x1) /usr/local/go/src/runtime/chan.go:583 +0x49d fp=0xc00219b790 sp=0xc00219b700 pc=0x1b626fd runtime.chanrecv1(0x79efe08?, 0x52eb340?) /usr/local/go/src/runtime/chan.go:442 +0x18 fp=0xc00219b7b8 sp=0xc00219b790 pc=0x1b621f8 github.com/milvus-io/milvus/cmd/roles.(*MilvusRoles).Run(0xc000268f00) /go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:399 +0xafa fp=0xc00219bc78 sp=0xc00219b7b8 pc=0x4880f9a github.com/milvus-io/milvus/cmd/milvus.(*run).execute(0x0?, {0xc000052180?, 0x3, 0x3}, 0xc0002624e0) /go/src/github.com/milvus-io/milvus/cmd/milvus/run.go:47 +0x2e5 fp=0xc00219bd48 sp=0xc00219bc78 pc=0x488be05 github.com/milvus-io/milvus/cmd/milvus.RunMilvus({0xc000052180?, 0x3, 0x3}) /go/src/github.com/milvus-io/milvus/cmd/milvus/milvus.go:60 +0x20e fp=0xc00219bdc0 sp=0xc00219bd48 pc=0x488ba8e main.main() /go/src/github.com/milvus-io/milvus/cmd/main.go:95 +0x3e5 fp=0xc00219bf80 sp=0xc00219bdc0 pc=0x48907c5 runtime.main() /usr/local/go/src/runtime/proc.go:250 +0x207 fp=0xc00219bfe0 sp=0xc00219bf80 pc=0x1b99f27 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00219bfe8 sp=0xc00219bfe0 pc=0x1bd0c21

goroutine 2 [force gc (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00019efb0 sp=0xc00019ef90 pc=0x1b9a356 runtime.goparkunlock(...) /usr/local/go/src/runtime/proc.go:387 runtime.forcegchelper() /usr/local/go/src/runtime/proc.go:305 +0xb0 fp=0xc00019efe0 sp=0xc00019efb0 pc=0x1b9a190 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00019efe8 sp=0xc00019efe0 pc=0x1bd0c21 created by runtime.init.6 /usr/local/go/src/runtime/proc.go:293 +0x25

goroutine 3 [GC sweep wait]: runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00019f780 sp=0xc00019f760 pc=0x1b9a356 runtime.goparkunlock(...) /usr/local/go/src/runtime/proc.go:387 runtime.bgsweep(0x0?) /usr/local/go/src/runtime/mgcsweep.go:319 +0xde fp=0xc00019f7c8 sp=0xc00019f780 pc=0x1b82a1e runtime.gcenable.func1() /usr/local/go/src/runtime/mgc.go:178 +0x26 fp=0xc00019f7e0 sp=0xc00019f7c8 pc=0x1b77ae6 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00019f7e8 sp=0xc00019f7e0 pc=0x1bd0c21 created by runtime.gcenable /usr/local/go/src/runtime/mgc.go:178 +0x6b

goroutine 4 [GC scavenge wait]: runtime.gopark(0xc0001ce000?, 0x59659f0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00019ff70 sp=0xc00019ff50 pc=0x1b9a356 runtime.goparkunlock(...) /usr/local/go/src/runtime/proc.go:387 runtime.(*scavengerState).park(0x79ba540) /usr/local/go/src/runtime/mgcscavenge.go:400 +0x53 fp=0xc00019ffa0 sp=0xc00019ff70 pc=0x1b80853 runtime.bgscavenge(0x0?) /usr/local/go/src/runtime/mgcscavenge.go:633 +0x65 fp=0xc00019ffc8 sp=0xc00019ffa0 pc=0x1b80e65 runtime.gcenable.func2() /usr/local/go/src/runtime/mgc.go:179 +0x26 fp=0xc00019ffe0 sp=0xc00019ffc8 pc=0x1b77a86 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00019ffe8 sp=0xc00019ffe0 pc=0x1bd0c21 created by runtime.gcenable /usr/local/go/src/runtime/mgc.go:179 +0xaa

goroutine 5 [finalizer wait]: runtime.gopark(0x0?, 0x5513308?, 0x80?, 0xa5?, 0x2000000020?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00019e628 sp=0xc00019e608 pc=0x1b9a356 runtime.runfinq() /usr/local/go/src/runtime/mfinal.go:193 +0x107 fp=0xc00019e7e0 sp=0xc00019e628 pc=0x1b76aa7 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00019e7e8 sp=0xc00019e7e0 pc=0x1bd0c21 created by runtime.createfing /usr/local/go/src/runtime/mfinal.go:163 +0x45

goroutine 6 [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc0001a0750 sp=0xc0001a0730 pc=0x1b9a356 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc0001a07e0 sp=0xc0001a0750 pc=0x1b79851 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0001a07e8 sp=0xc0001a07e0 pc=0x1bd0c21 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25

goroutine 18 [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00019a750 sp=0xc00019a730 pc=0x1b9a356 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc00019a7e0 sp=0xc00019a750 pc=0x1b79851 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00019a7e8 sp=0xc00019a7e0 pc=0x1bd0c21 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25

goroutine 19 [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00019af50 sp=0xc00019af30 pc=0x1b9a356 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc00019afe0 sp=0xc00019af50 pc=0x1b79851 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00019afe8 sp=0xc00019afe0 pc=0x1bd0c21 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25

goroutine 20 [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00019b750 sp=0xc00019b730 pc=0x1b9a356 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc00019b7e0 sp=0xc00019b750 pc=0x1b79851 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00019b7e8 sp=0xc00019b7e0 pc=0x1bd0c21 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25

goroutine 34 [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc000506750 sp=0xc000506730 pc=0x1b9a356 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc0005067e0 sp=0xc000506750 pc=0x1b79851 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0005067e8 sp=0xc0005067e0 pc=0x1bd0c21 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25

goroutine 21 [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00019bf50 sp=0xc00019bf30 pc=0x1b9a356 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc00019bfe0 sp=0xc00019bf50 pc=0x1b79851 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00019bfe8 sp=0xc00019bfe0 pc=0x1bd0c21 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25

goroutine 22 [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00019c750 sp=0xc00019c730 pc=0x1b9a356 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc00019c7e0 sp=0xc00019c750 pc=0x1b79851 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00019c7e8 sp=0xc00019c7e0 pc=0x1bd0c21 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25

goroutine 7 [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc0001a0f50 sp=0xc0001a0f30 pc=0x1b9a356 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc0001a0fe0 sp=0xc0001a0f50 pc=0x1b79851 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0001a0fe8 sp=0xc0001a0fe0 pc=0x1bd0c21 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25

goroutine 23 [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00019cf50 sp=0xc00019cf30 pc=0x1b9a356 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc00019cfe0 sp=0xc00019cf50 pc=0x1b79851 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00019cfe8 sp=0xc00019cfe0 pc=0x1bd0c21 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25

goroutine 8 [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc0001a1750 sp=0xc0001a1730 pc=0x1b9a356 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc0001a17e0 sp=0xc0001a1750 pc=0x1b79851 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0001a17e8 sp=0xc0001a17e0 pc=0x1bd0c21 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25

goroutine 24 [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc00019d750 sp=0xc00019d730 pc=0x1b9a356 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc00019d7e0 sp=0xc00019d750 pc=0x1b79851 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc00019d7e8 sp=0xc00019d7e0 pc=0x1bd0c21 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25

goroutine 9 [GC worker (idle)]: runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?) /usr/local/go/src/runtime/proc.go:381 +0xd6 fp=0xc0001a1f50 sp=0xc0001a1f30 pc=0x1b9a356 runtime.gcBgMarkWorker() /usr/local/go/src/runtime/mgc.go:1275 +0xf1 fp=0xc0001a1fe0 sp=0xc0001a1f50 pc=0x1b79851 runtime.goexit() /usr/local/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0001a1fe8 sp=0xc0001a1fe0 pc=0x1bd0c21 created by runtime.gcBgMarkStartWorkers /usr/local/go/src/runtime/mgc.go:1199 +0x25

Anything else?

No response

yunurs avatar Mar 27 '24 03:03 yunurs

@yunurs quick questions:

  1. Which version did you upgrade from?
  2. Could you please refer to this doc to export the whole Milvus logs for investigation? For Milvus installed with docker-compose, you can use docker-compose logs > milvus.log to export the logs.

/assign @yunurs

yanliang567 avatar Mar 27 '24 07:03 yanliang567

I upgraded from v2.2.3. Because the online service is really important, I rolled back immediately and didn't export the logs.

yunurs avatar Mar 27 '24 08:03 yunurs

@yunurs

I don't think you can upgrade to 2.4 directly from 2.2.3 (you seem to be a very long-time user).

Try upgrading in steps:

  1. From 2.2.3 to 2.2.16 (please do that; 2.2.3 is not stable enough for production).
  2. From 2.2.16 to 2.3.12.
  3. From 2.3.12 to 2.4.
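
The stepwise path suggested above can be sketched as a small planning helper. This is an illustration only: the function names `parse_version` and `upgrade_path` are hypothetical, the intermediate versions are just the ones mentioned in this comment (not an official compatibility matrix), and the parser assumes plain numeric versions without suffixes such as `-rc.1`.

```python
# Intermediate versions taken from the advice in this thread (assumption:
# they are not drawn from any official Milvus upgrade matrix).
CHECKPOINTS = [(2, 2, 16), (2, 3, 12)]


def parse_version(version):
    """Turn "v2.2.3" or "2.4" into a tuple of ints; numeric parts only."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def upgrade_path(current, target):
    """List the versions to install, in order, to get from current to target."""
    cur, tgt = parse_version(current), parse_version(target)
    # Keep only checkpoints strictly between the current and target versions.
    hops = [cp for cp in CHECKPOINTS if cur < cp < tgt]
    return [".".join(str(n) for n in hop) for hop in hops] + [target]


print(upgrade_path("2.2.3", "2.4"))  # ['2.2.16', '2.3.12', '2.4']
```

Each hop would be a separate upgrade, ideally verified before moving on to the next one.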

xiaofan-luan avatar Mar 27 '24 20:03 xiaofan-luan

I'm sorry, I was wrong: the upgrade was from v2.3.3 to v2.4, not from v2.2.3.

yunurs avatar Mar 28 '24 00:03 yunurs

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar May 01 '24 12:05 stale[bot]