[Bug]: Querynode terminated with log: failed to Deserialize index, cardinal inner error
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: cardinal-milvus-io-2.3-ef086dc-20240222
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
- collection laion_stable_4 has 58m-768d+ data, and its schema is:
{'auto_id': False, 'description': '', 'fields': [{'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float_vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 768}}, {'name': 'int64_pk_5b', 'description': '', 'type': <DataType.INT64: 5>, 'is_partition_key': True}, {'name': 'varchar_caption', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 8192}}, {'name': 'varchar_NSFW', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 8192}}, {'name': 'float64_similarity', 'description': '', 'type': <DataType.FLOAT: 10>}, {'name': 'int64_width', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'int64_height', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'int64_original_width', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'int64_original_height', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'varchar_md5', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 8192}}], 'enable_dynamic_field': True}
- reload collection (64 segments) -> concurrent requests: insert + delete + search + query
- One of the 4 querynodes terminated with exit code 134 and the following error logs (since Cardinal is private, please get in touch with me for more detailed querynode termination logs):
E20240226 16:13:52.003870 607 FileIo.cpp:25] [CARDINAL][FileReader][milvus] Failed to open file : /var/lib/milvus/data/querynode/index_files/447990444064979058/1/_mem.index.bin
E20240226 16:13:52.005385 607 cardinal.cc:368] [KNOWHERE][Deserialize][milvus] Cardinal Inner Exception: std::exception
I20240226 16:13:52.005625 607 time_recorder.cc:49] [KNOWHERE][PrintTimeRecord][milvus] Load index: done (2.135270 ms)
=> failed to Deserialize index, cardinal inner error
non-Go function
pc=0x7f58fbc2003b
non-Go function
pc=0x7f58fbbff858
non-Go function
pc=0x7f58fba998d0
non-Go function
pc=0x7f58fbaa537b
non-Go function
pc=0x7f58fbaa4358
non-Go function
pc=0x7f58fbaa4d10
non-Go function
pc=0x7f58fbde1bfe
runtime.cgocall(0x4749090, 0xc001774cd0)
/usr/local/go/src/runtime/cgocall.go:157 +0x5c fp=0xc001774ca8 sp=0xc001774c70 pc=0x1a627bc
github.com/milvus-io/milvus/internal/querynodev2/segments._Cfunc_DeleteSegment(0x7f58f68d1700)
_cgo_gotypes.go:475 +0x45 fp=0xc001774cd0 sp=0xc001774ca8 pc=0x4522a45
github.com/milvus-io/milvus/internal/querynodev2/segments.(*LocalSegment).Release.func1(0xc00175b2d8?)
/go/src/github.com/milvus-io/milvus/internal/querynodev2/segments/segment.go:1037 +0x3a fp=0xc001774d08 sp=0xc001774cd0 pc=0x453d07a
github.com/milvus-io/milvus/internal/querynodev2/segments.(*LocalSegment).Release(0xc00175b290)
/go/src/github.com/milvus-io/milvus/internal/querynodev2/segments/segment.go:1037 +0xa6 fp=0xc001774f48 sp=0xc001774d08 pc=0x453c826
github.com/milvus-io/milvus/internal/querynodev2/segments.remove({0x5744620, 0xc00175b290})
/go/src/github.com/milvus-io/milvus/internal/querynodev2/segments/manager.go:542 +0x42 fp=0xc001775010 sp=0xc001774f48 pc=0x452e5c2
github.com/milvus-io/milvus/internal/querynodev2/segments.(*segmentManager).Remove(0xc001620a80, 0x5158d57?, 0x3)
/go/src/github.com/milvus-io/milvus/internal/querynodev2/segments/manager.go:447 +0x2d5 fp=0xc0017750b0 sp=0xc001775010 pc=0x452d5b5
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
- argo: laion1b-test-new-1
- pods:
laion1b-test-2-etcd-0 1/1 Running 1 (46d ago) 74d 10.104.25.31 4am-node30 <none> <none>
laion1b-test-2-etcd-1 1/1 Running 0 74d 10.104.30.94 4am-node38 <none> <none>
laion1b-test-2-etcd-2 1/1 Running 0 74d 10.104.34.225 4am-node37 <none> <none>
laion1b-test-2-milvus-datanode-7b7f99b8d4-g8v8q 1/1 Running 0 20h 10.104.16.187 4am-node21 <none> <none>
laion1b-test-2-milvus-datanode-7b7f99b8d4-t7lfp 1/1 Running 0 20h 10.104.30.131 4am-node38 <none> <none>
laion1b-test-2-milvus-indexnode-c8c8f4584-2kbqd 1/1 Running 0 15h 10.104.14.112 4am-node18 <none> <none>
laion1b-test-2-milvus-indexnode-c8c8f4584-d6q6m 1/1 Running 0 15h 10.104.9.46 4am-node14 <none> <none>
laion1b-test-2-milvus-indexnode-c8c8f4584-lg4k7 1/1 Running 0 15h 10.104.34.47 4am-node37 <none> <none>
laion1b-test-2-milvus-indexnode-c8c8f4584-q9hcx 1/1 Running 0 15h 10.104.17.50 4am-node23 <none> <none>
laion1b-test-2-milvus-indexnode-c8c8f4584-vtvx5 1/1 Running 0 15h 10.104.29.240 4am-node35 <none> <none>
laion1b-test-2-milvus-mixcoord-74b896d49d-ljz4l 1/1 Running 0 20h 10.104.18.222 4am-node25 <none> <none>
laion1b-test-2-milvus-proxy-5cdb5b7d6b-w5h29 1/1 Running 0 20h 10.104.19.6 4am-node28 <none> <none>
laion1b-test-2-milvus-querynode-0-7977c8fdbf-8pfz2 1/1 Running 0 15h 10.104.17.49 4am-node23 <none> <none>
laion1b-test-2-milvus-querynode-0-7977c8fdbf-cb9mn 1/1 Running 0 15h 10.104.28.101 4am-node33 <none> <none>
laion1b-test-2-milvus-querynode-0-7977c8fdbf-dd9hn 1/1 Running 0 15h 10.104.33.74 4am-node36 <none> <none>
laion1b-test-2-milvus-querynode-0-7977c8fdbf-zfhhz 1/1 Running 1 (15h ago) 15h 10.104.32.10 4am-node39 <none> <none>
laion1b-test-2-pulsar-bookie-0 1/1 Running 0 74d 10.104.33.107 4am-node36 <none> <none>
laion1b-test-2-pulsar-bookie-1 1/1 Running 0 50d 10.104.18.240 4am-node25 <none> <none>
laion1b-test-2-pulsar-bookie-2 1/1 Running 0 74d 10.104.25.32 4am-node30 <none> <none>
laion1b-test-2-pulsar-broker-0 1/1 Running 0 68d 10.104.1.69 4am-node10 <none> <none>
laion1b-test-2-pulsar-proxy-0 1/1 Running 0 74d 10.104.4.218 4am-node11 <none> <none>
laion1b-test-2-pulsar-recovery-0 1/1 Running 0 74d 10.104.14.151 4am-node18 <none> <none>
laion1b-test-2-pulsar-zookeeper-0 1/1 Running 0 74d 10.104.29.87 4am-node35 <none> <none>
laion1b-test-2-pulsar-zookeeper-1 1/1 Running 0 74d 10.104.21.124 4am-node24 <none> <none>
laion1b-test-2-pulsar-zookeeper-2 1/1 Running 0 74d 10.104.34.229 4am-node37 <none> <none>
- core dump file: /tmp/cores/core-laion1b-test-2-milvus-querynode-0-7977c8fdbf-zfhhz-milvus-8-1708964037 on 4am-node39
Anything else?
No response
Perhaps it is because the dataCoord.channel.watchTimeoutInterval configuration was modified and Milvus was restarted. I mean, when the querynode restarted, it looks like the tests had not started yet.
/assign @liliu-z /unassign
/assign @foxspy /unassign @liliu-z
The root cause seems to be a concurrency bug between release and load. The querynode released a segment while the index engine was concurrently loading its index from file. The index engine throwing an exception because the file does not exist is expected and does not by itself crash the querynode; the actual cause of the coredump is the release operation.
/assign @yanliang567 /unassign
/assign @chyezh
Index 447990444064979058 belongs to Segment 447990444064723266.
The node started to release the segment while a new load request was incoming.
[2024/02/26 16:13:34.948 +00:00] [INFO] [querynodev2/services.go:595] ["start to release segments"] [traceID=06cd8e504609e669a530c57535e33631] [collectionID=447902879639453431] [shard=laion1b-test-2-rootcoord-dml_5_447902879639453431v1] [segmentIDs="[447990444064723266]"] [currentNodeID=1681]
[2024/02/26 16:13:38.996 +00:00] [INFO] [querynodev2/services.go:433] ["received load segments request"] [traceID=7c9ecfdda5b9a070dc760feb81f2bf64] [collectionID=447902879639453431] [partitionID=447902879639453437] [shard=laion1b-test-2-rootcoord-dml_5_447902879639453431v1] [segmentID=447990444064723266] [currentNodeID=1681] [version=1708964018906870595] [needTransfer=false] [loadScope=Full]
Loading a repeated segment is checked against the SegmentManager:
...
if len(loader.manager.Segment.GetBy(WithType(segmentType), WithID(segment.GetSegmentID()))) == 0 &&
!loader.loadingSegments.Contain(segment.GetSegmentID()) {
...
Releasing a segment removes the segment from the SegmentManager and then releases the memory:
case querypb.DataScope_Historical:
sealed = mgr.removeSegmentWithType(SegmentTypeSealed, segmentID)
if sealed != nil {
removeSealed = 1
}
mgr.updateMetric()
mgr.mu.Unlock()
if sealed != nil {
remove(sealed)
}
So concurrent load and release can happen.
Short-term fix: implement mutual exclusion between Release and Load on the QN. Long-term, it is necessary to implement segment lifecycle controls (Loading, Loaded, Released states) on QueryCoord.
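A minimal sketch of what that mutual exclusion could look like, using hypothetical names (segmentGuard, forSegment) rather than the actual Milvus types; it simply serializes Load and Release per segment ID:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// segmentGuard serializes Load and Release per segment ID.
// Hypothetical names, not the actual Milvus types.
type segmentGuard struct {
	mu    sync.Mutex
	locks map[int64]*sync.Mutex
}

func newSegmentGuard() *segmentGuard {
	return &segmentGuard{locks: make(map[int64]*sync.Mutex)}
}

// forSegment lazily creates and returns the mutex dedicated to one segment.
func (g *segmentGuard) forSegment(segmentID int64) *sync.Mutex {
	g.mu.Lock()
	defer g.mu.Unlock()
	if _, ok := g.locks[segmentID]; !ok {
		g.locks[segmentID] = &sync.Mutex{}
	}
	return g.locks[segmentID]
}

func main() {
	guard := newSegmentGuard()
	const segmentID = int64(447990444064723266)

	var wg sync.WaitGroup
	wg.Add(2)

	// Release: removing the segment from the manager and freeing its
	// files/memory become one critical section.
	go func() {
		defer wg.Done()
		l := guard.forSegment(segmentID)
		l.Lock()
		defer l.Unlock()
		fmt.Println("release: remove from manager, delete index files")
		time.Sleep(50 * time.Millisecond) // simulate the release work
	}()

	// Load: the duplicate check and the index file reads run under the same
	// per-segment lock, so they can never observe a half-released segment.
	go func() {
		defer wg.Done()
		l := guard.forSegment(segmentID)
		l.Lock()
		defer l.Unlock()
		fmt.Println("load: check manager, read index files")
	}()

	wg.Wait()
}
```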
/unassign
@chyezh A loading segment will not be released in the segment manager. In my opinion, concurrent load & release should not happen for the same segment. Could you please explain in detail how it happened?
Load is triggered while the segment is being released; it is not that release is triggered while the segment is loading.
The release segment operation is divided into two steps on the query node (see the sketch after this list):
- Removing the segment from the segmentManager (after this, the SegmentLoader is allowed to reload this segment),
- Releasing the actual segment.
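A self-contained sketch of that window, with hypothetical stand-ins for the segmentManager, the loader's duplicate check, and the actual release (none of these are the real Milvus types); the point is only that between step 1 and step 2 the loader's duplicate check can pass while the index files are about to disappear:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

const segID = int64(447990444064723266)

// fakeManager stands in for the querynode's segmentManager: it only tracks
// which segment IDs are currently registered.
type fakeManager struct {
	mu       sync.Mutex
	segments map[int64]bool
}

func (m *fakeManager) remove(id int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.segments, id)
}

func (m *fakeManager) contains(id int64) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.segments[id]
}

// indexFileOnDisk stands in for _mem.index.bin under index_files/.
var indexFileOnDisk atomic.Bool

// loadIndex stands in for the index engine deserializing from disk while the
// concurrent release deletes the files underneath it.
func loadIndex() error {
	time.Sleep(20 * time.Millisecond)
	if !indexFileOnDisk.Load() {
		return errors.New("failed to Deserialize index: file not found")
	}
	return nil
}

func main() {
	indexFileOnDisk.Store(true)
	mgr := &fakeManager{segments: map[int64]bool{segID: true}}

	var wg sync.WaitGroup
	wg.Add(2)

	// Release, split into the two steps described above.
	go func() {
		defer wg.Done()
		mgr.remove(segID)                 // step 1: drop the segment from the manager
		time.Sleep(20 * time.Millisecond) // window in which a reload is already allowed
		indexFileOnDisk.Store(false)      // step 2: release memory, delete index files
		fmt.Println("release: segment freed")
	}()

	// A load request for the same segment arriving inside that window.
	go func() {
		defer wg.Done()
		time.Sleep(10 * time.Millisecond)
		if !mgr.contains(segID) { // duplicate check passes: segment is gone from the manager
			fmt.Println("load:", loadIndex())
		}
	}()

	wg.Wait()
}
```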
@chyezh got it, thanks!
After some offline discussion, the final solution shall be to separate the disk resources for the different segment life-cycle stages.
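The thread does not spell that design out; one way to read "separating the disk resource per segment life-cycle" is to key the on-disk index path by a load version in addition to the segment ID, so that releasing an older instance can never delete the files a concurrent load is using. A purely illustrative sketch with hypothetical paths, not the actual Milvus layout:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// indexDir builds a per-load directory for a segment's index files, keyed by
// segment ID plus a monotonically increasing load version. With such a layout,
// releasing load version N only ever removes .../<segmentID>/v<N> and cannot
// touch the files a concurrent load of version N+1 reads or writes.
func indexDir(root string, segmentID, loadVersion int64) string {
	return filepath.Join(root,
		fmt.Sprintf("%d", segmentID),
		fmt.Sprintf("v%d", loadVersion))
}

func main() {
	// Stand-in for the querynode's index_files root; hypothetical path.
	root := filepath.Join(os.TempDir(), "querynode", "index_files")

	oldDir := indexDir(root, 447990444064723266, 1)
	newDir := indexDir(root, 447990444064723266, 2)

	if err := os.MkdirAll(newDir, 0o755); err != nil { // the new load owns its own directory
		fmt.Println("mkdir:", err)
	}
	if err := os.RemoveAll(oldDir); err != nil { // releasing the old instance
		fmt.Println("remove:", err)
	}
	fmt.Println("released:", oldDir)
	fmt.Println("loaded:  ", newDir)
}
```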
One more thing: it looks weird that a segment is released and then loaded back. Maybe the segment was bouncing between querynodes?
- The segment was released on the QN because the collection was released.
- The segment was reloaded by the segment checker (lack of segment), updated by Distribution?
[2024/02/26 16:13:34.498 +00:00] [INFO] [task/scheduler.go:269] ["task added"] [task="[id=1708948067586] [type=Reduce] [source=segment_checker] [reason=collection released] [collectionID=447902879639453431] [replicaID=-1] [priority=Normal] [actionsCount=1] [actions={[type=Reduce][node=1681][streaming=false]}] [segmentID=447990444064723266]"]
[2024/02/26 16:13:38.501 +00:00] [INFO] [task/scheduler.go:269] ["task added"] [task="[id=1708948067608] [type=Grow] [source=segment_checker] [reason=lacks of segment] [collectionID=447902879639453431] [replicaID=447990457955778562] [priority=Normal] [actionsCount=1] [actions={[type=Grow][node=1681][streaming=false]}] [segmentID=447990444064723266]"]
Releasing and then loading the collection, or a concurrent release and load of the collection, can reproduce it.
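A minimal reproduction sketch, assuming the milvus-sdk-go v2 client, a local deployment at localhost:19530, and an already-built collection named laion_stable_4 (these are assumptions, not the actual test code used here); it simply races ReleaseCollection against LoadCollection. The querycoord logs below show the resulting Reduce task followed by a Grow task for the same segment:

```go
package main

import (
	"context"
	"log"
	"sync"

	"github.com/milvus-io/milvus-sdk-go/v2/client"
)

func main() {
	ctx := context.Background()

	// Address and collection name are assumptions for illustration only.
	c, err := client.NewGrpcClient(ctx, "localhost:19530")
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	const collection = "laion_stable_4"

	for i := 0; i < 10; i++ {
		var wg sync.WaitGroup
		wg.Add(2)

		// Release and load the same collection concurrently, so QueryCoord
		// schedules a Reduce task immediately followed by a Grow task for
		// the same segments on the same querynode.
		go func() {
			defer wg.Done()
			if err := c.ReleaseCollection(ctx, collection); err != nil {
				log.Println("release:", err)
			}
		}()
		go func() {
			defer wg.Done()
			if err := c.LoadCollection(ctx, collection, false); err != nil {
				log.Println("load:", err)
			}
		}()
		wg.Wait()
	}
}
```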
2024-02-27 00:13:34.487 [2024/02/26 16:13:34.487 +00:00] [INFO] [querycoordv2/services.go:254] ["release collection request received"] [traceID=458948ca161f98b29f6d8118b6001ae5] [collectionID=447902879639453431]
2024-02-27 00:13:34.498 [2024/02/26 16:13:34.498 +00:00] [INFO] [task/scheduler.go:269] ["task added"] [task="[id=1708948067586] [type=Reduce] [source=segment_checker] [reason=collection released] [collectionID=447902879639453431] [replicaID=-1] [priority=Normal] [actionsCount=1] [actions={[type=Reduce][node=1681][streaming=false]}] [segmentID=447990444064723266]"]
2024-02-27 00:13:34.976 [2024/02/26 16:13:34.976 +00:00] [INFO] [task/executor.go:104] ["execute the action of task"] [taskID=1708948067586] [collectionID=447902879639453431] [replicaID=-1] [step=0] [source=segment_checker]
2024-02-27 00:13:34.977 [2024/02/26 16:13:34.976 +00:00] [INFO] [task/executor.go:298] ["release segment..."] [taskID=1708948067586] [collectionID=447902879639453431] [replicaID=-1] [segmentID=447990444064723266] [node=1681] [source=segment_checker]
2024-02-27 00:13:35.469 [2024/02/26 16:13:35.469 +00:00] [INFO] [task/scheduler.go:768] ["task removed"] [taskID=1708948067586] [collectionID=447902879639453431] [replicaID=-1] [status=succeeded] [segmentID=447990444064723266]
2024-02-27 00:13:35.470 [2024/02/26 16:13:35.470 +00:00] [WARN] [task/executor.go:301] ["failed to release segment, it may be a false failure"] [taskID=1708948067586] [collectionID=447902879639453431] [replicaID=-1] [segmentID=447990444064723266] [node=1681] [source=segment_checker] [error="stack trace: /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:550 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call\n/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:564 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:87 github.com/milvus-io/milvus/internal/distributed/querynode/client.wrapGrpcCall[...]\n/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:192 github.com/milvus-io/milvus/internal/distributed/querynode/client.(*Client).ReleaseSegments\n/go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:164 github.com/milvus-io/milvus/internal/querycoordv2/session.(*QueryCluster).ReleaseSegments.func1\n/go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:271 github.com/milvus-io/milvus/internal/querycoordv2/session.(*QueryCluster).send\n/go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:161 github.com/milvus-io/milvus/internal/querycoordv2/session.(*QueryCluster).ReleaseSegments\n/go/src/github.com/milvus-io/milvus/internal/querycoordv2/task/executor.go:299 github.com/milvus-io/milvus/internal/querycoordv2/task.(*Executor).releaseSegment\n/go/src/github.com/milvus-io/milvus/internal/querycoordv2/task/executor.go:135 github.com/milvus-io/milvus/internal/querycoordv2/task.(*Executor).executeSegmentAction: attempt #0: rpc error: code = Canceled desc = context canceled: context canceled"] [errorVerbose="stack trace: /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace: attempt #0: rpc error: code = Canceled desc = context canceled: context canceled\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call\n | \t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:550\n | github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n | \t/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:564\n | github.com/milvus-io/milvus/internal/distributed/querynode/client.wrapGrpcCall[...]\n | \t/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:87\n | github.com/milvus-io/milvus/internal/distributed/querynode/client.(*Client).ReleaseSegments\n | \t/go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:192\n | github.com/milvus-io/milvus/internal/querycoordv2/session.(*QueryCluster).ReleaseSegments.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:164\n | github.com/milvus-io/milvus/internal/querycoordv2/session.(*QueryCluster).send\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:271\n | github.com/milvus-io/milvus/internal/querycoordv2/session.(*QueryCluster).ReleaseSegments\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:161\n | 
github.com/milvus-io/milvus/internal/querycoordv2/task.(*Executor).releaseSegment\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/task/executor.go:299\n | github.com/milvus-io/milvus/internal/querycoordv2/task.(*Executor).executeSegmentAction\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/task/executor.go:135\n | github.com/milvus-io/milvus/internal/querycoordv2/task.(*Executor).Execute.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/querycoordv2/task/executor.go:107\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (2) stack trace: /go/src/github.com/milvus-io/milvus/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/tracer.StackTrace\n | /go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:550 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call\n | /go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:564 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall\n | /go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:87 github.com/milvus-io/milvus/internal/distributed/querynode/client.wrapGrpcCall[...]\n | /go/src/github.com/milvus-io/milvus/internal/distributed/querynode/client/client.go:192 github.com/milvus-io/milvus/internal/distributed/querynode/client.(*Client).ReleaseSegments\n | /go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:164 github.com/milvus-io/milvus/internal/querycoordv2/session.(*QueryCluster).ReleaseSegments.func1\n | /go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:271 github.com/milvus-io/milvus/internal/querycoordv2/session.(*QueryCluster).send\n | /go/src/github.com/milvus-io/milvus/internal/querycoordv2/session/cluster.go:161 github.com/milvus-io/milvus/internal/querycoordv2/session.(*QueryCluster).ReleaseSegments\n | /go/src/github.com/milvus-io/milvus/internal/querycoordv2/task/executor.go:299 github.com/milvus-io/milvus/internal/querycoordv2/task.(*Executor).releaseSegment\n | /go/src/github.com/milvus-io/milvus/internal/querycoordv2/task/executor.go:135 github.com/milvus-io/milvus/internal/querycoordv2/task.(*Executor).executeSegmentAction\nWraps: (3) attempt #0: rpc error: code = Canceled desc = context canceled\nWraps: (4) context canceled\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) merr.multiErrors (4) *errors.errorString"]
2024-02-27 00:13:35.790 [2024/02/26 16:13:35.790 +00:00] [INFO] [querycoordv2/services.go:197] ["load collection request received"] [traceID=6019913291c1f2be024e5909f5edd21f] [collectionID=447902879639453431] [replicaNumber=1] [resourceGroups="[]"] [refreshMode=false] [schema="name:\"laion_stable_4\" fields:<fieldID:100 name:\"id\" is_primary_key:true data_type:Int64 > fields:<fieldID:101 name:\"float_vector\" data_type:FloatVector type_params:<key:\"dim\" value:\"768\" > > fields:<fieldID:102 name:\"int64_pk_5b\" data_type:Int64 is_partition_key:true > fields:<fieldID:103 name:\"varchar_caption\" data_type:VarChar type_params:<key:\"max_length\" value:\"8192\" > > fields:<fieldID:104 name:\"varchar_NSFW\" data_type:VarChar type_params:<key:\"max_length\" value:\"8192\" > > fields:<fieldID:105 name:\"float64_similarity\" data_type:Float > fields:<fieldID:106 name:\"int64_width\" data_type:Int64 > fields:<fieldID:107 name:\"int64_height\" data_type:Int64 > fields:<fieldID:108 name:\"int64_original_width\" data_type:Int64 > fields:<fieldID:109 name:\"int64_original_height\" data_type:Int64 > fields:<fieldID:110 name:\"varchar_md5\" data_type:VarChar type_params:<key:\"max_length\" value:\"8192\" > > fields:<fieldID:111 name:\"$meta\" description:\"dynamic schema\" data_type:JSON is_dynamic:true > enable_dynamic_field:true "] [fieldIndexes="[447902879639453513,447902879639453519,447902879639453502,447902879639453508]"]
2024-02-27 00:13:38.501 [2024/02/26 16:13:38.501 +00:00] [INFO] [task/scheduler.go:269] ["task added"] [task="[id=1708948067608] [type=Grow] [source=segment_checker] [reason=lacks of segment] [collectionID=447902879639453431] [replicaID=447990457955778562] [priority=Normal] [actionsCount=1] [actions={[type=Grow][node=1681][streaming=false]}] [segmentID=447990444064723266]"]
2024-02-27 00:13:38.608 [2024/02/26 16:13:38.608 +00:00] [INFO] [task/executor.go:104] ["execute the action of task"] [taskID=1708948067608] [collectionID=447902879639453431] [replicaID=447990457955778562] [step=0] [source=segment_checker]
2024-02-27 00:13:38.906 [2024/02/26 16:13:38.906 +00:00] [INFO] [task/executor.go:230] ["load segments..."] [taskID=1708948067608] [collectionID=447902879639453431] [replicaID=447990457955778562] [segmentID=447990444064723266] [node=1681] [source=segment_checker] [shardLeader=1679]
2024-02-27 00:14:02.610 [2024/02/26 16:14:02.609 +00:00] [WARN] [task/executor.go:238] ["failed to load segment"] [taskID=1708948067608] [collectionID=447902879639453431] [replicaID=447990457955778562] [segmentID=447990444064723266] [node=1681] [source=segment_checker] [shardLeader=1679] [error="unrecoverable error"]
2024-02-27 00:14:02.610 [2024/02/26 16:14:02.609 +00:00] [INFO] [task/executor.go:119] ["execute action done, remove it"] [taskID=1708948067608] [step=0] [error="unrecoverable error"]
2024-02-27 00:14:02.623 [2024/02/26 16:14:02.623 +00:00] [WARN] [task/scheduler.go:727] ["task scheduler recordSegmentTaskError"] [taskID=1708948067608] [collectionID=447902879639453431] [replicaID=447990457955778562] [segmentID=447990444064723266] [status=failed] [error="unrecoverable error"]
2024-02-27 00:14:02.623 [2024/02/26 16:14:02.623 +00:00] [INFO] [task/scheduler.go:768] ["task removed"] [taskID=1708948067608] [collectionID=447902879639453431] [replicaID=447990457955778562] [status=failed] [segmentID=447990444064723266]
@chyezh
- image: cardinal-milvus-io-2.3-3c90475-20240311
- queryNode laion1b-test-2-milvus-querynode-1-86cfff6f5d-7b2lv terminated with exit code 134 at 2024-03-12 16:02:40.814 (UTC)
Short-term fix (mutual exclusion between Release and Load on QN): this should be fixed in 2.4.5, please verify it.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.