[Bug]: panic when ack of broadcaster
Is there an existing issue for this?
- [x] I have searched the existing issues
Environment
- Milvus version: f8c972a102d82878fdfadbbacf23f2127fb29d20
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
panic: close of closed channel

goroutine 1180 gp=0xc00337f880 m=8 mp=0xc000a80008 [running]:
panic({0x6f776e0?, 0x81e65a0?})
	/go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:811 +0x168 fp=0xc002dcf2e0 sp=0xc002dcf230 pc=0x28589e8
runtime.closechan(0xc000f4df10)
	/go/pkg/mod/golang.org/[email protected]/src/runtime/chan.go:422 +0x3cf fp=0xc002dcf338 sp=0xc002dcf2e0 pc=0x27ec48f
github.com/milvus-io/milvus/internal/streamingcoord/server/broadcaster.(*broadcastTask).ack(0xc000fa7080, {0x8254818, 0xc003f81c50}, {0xc0029a54f0?, 0xffffffffffffffff?, 0x100c0029a53e8?})
	/workspace/source/internal/streamingcoord/server/broadcaster/broadcast_task.go:251 +0xf7 fp=0xc002dcf380 sp=0xc002dcf338 pc=0x5d9e157
github.com/milvus-io/milvus/internal/streamingcoord/server/broadcaster.(*broadcastTask).Ack(0xc000feadc0?, {0x8254818?, 0xc003f81c50?}, {0xc0029a54f0?, 0x831f900?, 0xc0029a5458?})
	/workspace/source/internal/streamingcoord/server/broadcaster/broadcast_task.go:228 +0xe5 fp=0xc002dcf3f8 sp=0xc002dcf380 pc=0x5d9df65
github.com/milvus-io/milvus/internal/streamingcoord/server/broadcaster.(*broadcastTaskManager).Ack(0xc000feadc0, {0x8254818, 0xc003f81c50}, {0x82d9280, 0xc003fab5c0})
	/workspace/source/internal/streamingcoord/server/broadcaster/broadcast_manager.go:217 +0x330 fp=0xc002dcf530 sp=0xc002dcf3f8 pc=0x5d9a310
github.com/milvus-io/milvus/internal/streamingcoord/server/service.(*broadcastServceImpl).Ack(0xb51b590?, {0x82547a8, 0xc003fab530}, 0xc0040960a0)
	/workspace/source/internal/streamingcoord/server/service/broadcast.go:67 +0x18a fp=0xc002dcf5c8 sp=0xc002dcf530 pc=0x6184aaa
github.com/milvus-io/milvus/pkg/v2/proto/streamingpb._StreamingCoordBroadcastService_Ack_Handler.func1({0x82547a8?, 0xc003fab530?}, {0x74c3b20?, 0xc0040960a0?})
	/workspace/source/pkg/proto/streamingpb/streaming_grpc.pb.go:216 +0xcb fp=0xc002dcf600 sp=0xc002dcf5c8 pc=0x37b9b8b
github.com/milvus-io/milvus/internal/distributed/mixcoord.(*Server).startGrpcLoop.ServerIDValidationUnaryServerInterceptor.func8({0x82547a8, 0xc003fab530}, {0x74c3b20, 0xc0040960a0}, 0x38b8c54?, 0xc003f9e648)
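For context, the trace ends in `runtime.closechan` called from `(*broadcastTask).ack`, which means a channel that was already closed got closed a second time. The sketch below is not the Milvus code. It only reproduces that failure mode in isolation and shows one common guard, `sync.Once`. The names `ackNotifier`, `markDone`, and `closeTwiceUnguarded` are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// ackNotifier is a hypothetical stand-in for a task that signals
// "all acked" by closing a channel. It is NOT the Milvus broadcastTask.
type ackNotifier struct {
	once sync.Once
	done chan struct{}
}

func newAckNotifier() *ackNotifier {
	return &ackNotifier{done: make(chan struct{})}
}

// markDone closes done at most once, so a duplicate ack cannot panic.
// It reports whether this particular call performed the close.
func (n *ackNotifier) markDone() bool {
	closed := false
	n.once.Do(func() {
		close(n.done)
		closed = true
	})
	return closed
}

// closeTwiceUnguarded reproduces the failure mode: the second close
// panics, and the recovered message is returned to the caller.
func closeTwiceUnguarded() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	ch := make(chan struct{})
	close(ch)
	close(ch) // panics here
	return ""
}

func main() {
	fmt.Println("unguarded:", closeTwiceUnguarded())

	n := newAckNotifier()
	fmt.Println("first markDone closed:", n.markDone())
	fmt.Println("second markDone closed:", n.markDone())
}
```

A guard like this only stops the crash. It does not explain why the same task was acked twice, which is a separate question.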
Expected Behavior
No response
Steps To Reproduce
Milvus Log
No response
Anything else?
No response
/assign @chyezh
Meanwhile, the downstream keeps too few tombstones. We need to retain more tombstones to avoid double acks.
2025-11-18 11:14:49.063 [2025/11/18 03:14:49.063 +00:00] [INFO] [broadcaster/broadcast_task.go:425] ["save broadcast task done"] [module=streamingcoord] [component=broadcaster] [message="{type=CreateCollection,vchannel=cdc-test-downstream-390-rootcoord-dml_12_462274027807775233v0,timetick=462274085208981522,broadcastID=462274027807573005,broadcastVChannels=cdc-test-downstream-390-rootcoord-dml_0_vcchan,cdc-test-downstream-390-rootcoord-dml_12_462274027807775233v0,cdc-test-downstream-390-rootcoord-dml_13_462274027807775233v1,rClusterID=cdc-test-upstream-390,rMessageID=4,rLastConfirmedMessageID=2,rTimeTick=462274085172019219,rVchannel=cdc-test-upstream-390-rootcoord-dml_12_462274027807775233v0,size=1084,collectionID=462274027807775233}"] [state=BROADCAST_TASK_STATE_REPLICATED] [ackedVChannelCount=1]
2025-11-18 11:14:49.069 [2025/11/18 03:14:49.069 +00:00] [INFO] [broadcaster/broadcast_task.go:425] ["save broadcast task done"] [module=streamingcoord] [component=broadcaster] [message="{type=CreateCollection,vchannel=cdc-test-downstream-390-rootcoord-dml_12_462274027807775233v0,timetick=462274085208981522,broadcastID=462274027807573005,broadcastVChannels=cdc-test-downstream-390-rootcoord-dml_0_vcchan,cdc-test-downstream-390-rootcoord-dml_12_462274027807775233v0,cdc-test-downstream-390-rootcoord-dml_13_462274027807775233v1,rClusterID=cdc-test-upstream-390,rMessageID=4,rLastConfirmedMessageID=2,rTimeTick=462274085172019219,rVchannel=cdc-test-upstream-390-rootcoord-dml_12_462274027807775233v0,size=1084,collectionID=462274027807775233}"] [state=BROADCAST_TASK_STATE_REPLICATED] [ackedVChannelCount=2]
2025-11-18 11:14:49.301 [2025/11/18 03:14:49.301 +00:00] [INFO] [broadcaster/broadcast_task.go:425] ["save broadcast task done"] [module=streamingcoord] [component=broadcaster] [message="{type=CreateCollection,vchannel=cdc-test-downstream-390-rootcoord-dml_12_462274027807775233v0,timetick=462274085208981522,broadcastID=462274027807573005,broadcastVChannels=cdc-test-downstream-390-rootcoord-dml_0_vcchan,cdc-test-downstream-390-rootcoord-dml_12_462274027807775233v0,cdc-test-downstream-390-rootcoord-dml_13_462274027807775233v1,rClusterID=cdc-test-upstream-390,rMessageID=4,rLastConfirmedMessageID=2,rTimeTick=462274085172019219,rVchannel=cdc-test-upstream-390-rootcoord-dml_12_462274027807775233v0,size=1084,collectionID=462274027807775233}"] [state=BROADCAST_TASK_STATE_REPLICATED] [ackedVChannelCount=3]
2025-11-18 11:14:49.301 [2025/11/18 03:14:49.301 +00:00] [INFO] [broadcaster/ack_callback_scheduler.go:145] ["start to execute ack callback"] [module=streamingcoord] [component=broadcaster] [broadcastID=462274027807573005]
2025-11-18 11:14:49.301 [2025/11/18 03:14:49.301 +00:00] [DEBUG] [broadcaster/ack_callback_scheduler.go:149] ["all vchannels are acked"] [module=streamingcoord] [component=broadcaster] [broadcastID=462274027807573005]
2025-11-18 11:14:49.329 [2025/11/18 03:14:49.329 +00:00] [DEBUG] [broadcaster/ack_callback_scheduler.go:167] ["ack callback done"] [module=streamingcoord] [component=broadcaster] [broadcastID=462274027807573005]
2025-11-18 11:14:49.331 [2025/11/18 03:14:49.330 +00:00] [INFO] [broadcaster/broadcast_task.go:425] ["save broadcast task done"] [module=streamingcoord] [component=broadcaster] [message="{type=CreateCollection,vchannel=cdc-test-downstream-390-rootcoord-dml_12_462274027807775233v0,timetick=462274085208981522,broadcastID=462274027807573005,broadcastVChannels=cdc-test-downstream-390-rootcoord-dml_0_vcchan,cdc-test-downstream-390-rootcoord-dml_12_462274027807775233v0,cdc-test-downstream-390-rootcoord-dml_13_462274027807775233v1,rClusterID=cdc-test-upstream-390,rMessageID=4,rLastConfirmedMessageID=2,rTimeTick=462274085172019219,rVchannel=cdc-test-upstream-390-rootcoord-dml_12_462274027807775233v0,size=1084,collectionID=462274027807775233}"] [state=BROADCAST_TASK_STATE_TOMBSTONE] [ackedVChannelCount=3]
2025-11-18 11:14:49.331 [2025/11/18 03:14:49.330 +00:00] [INFO] [broadcaster/ack_callback_scheduler.go:140] ["execute ack callback done"] [module=streamingcoord] [component=broadcaster] [broadcastID=462274027807573005]
2025-11-18 11:17:02.551 [2025/11/18 03:17:02.551 +00:00] [DEBUG] [broadcaster/broadcast_manager.go:211] ["task is tombstone, ignored the ack request"] [module=streamingcoord] [component=broadcaster] [broadcastID=462274027807573005] [vchannel=cdc-test-downstream-390-rootcoord-dml_0_vcchan]
2025-11-18 11:21:36.902 [2025/11/18 03:21:36.902 +00:00] [INFO] [broadcaster/broadcast_task.go:425] ["save broadcast task done"] [module=streamingcoord] [component=broadcaster] [message="{type=CreateCollection,vchannel=cdc-test-downstream-390-rootcoord-dml_12_462274027807775233v0,timetick=462274085208981522,broadcastID=462274027807573005,broadcastVChannels=cdc-test-downstream-390-rootcoord-dml_0_vcchan,cdc-test-downstream-390-rootcoord-dml_12_462274027807775233v0,cdc-test-downstream-390-rootcoord-dml_13_462274027807775233v1,rClusterID=cdc-test-upstream-390,rMessageID=4,rLastConfirmedMessageID=2,rTimeTick=462274085172019219,rVchannel=cdc-test-upstream-390-rootcoord-dml_12_462274027807775233v0,size=1084,collectionID=462274027807775233}"] [state=BROADCAST_TASK_STATE_DONE] [ackedVChannelCount=3]
2025-11-18 11:24:46.391 [2025/11/18 03:24:46.391 +00:00] [INFO] [broadcaster/broadcast_task.go:425] ["save broadcast task done"] [module=streamingcoord] [component=broadcaster] [message="{type=CreateCollection,vchannel=cdc-test-downstream-390-rootcoord-dml_0_vcchan,timetick=462274085261410334,broadcastID=462274027807573005,broadcastVChannels=cdc-test-downstream-390-rootcoord-dml_0_vcchan,cdc-test-downstream-390-rootcoord-dml_12_462274027807775233v0,cdc-test-downstream-390-rootcoord-dml_13_462274027807775233v1,rClusterID=cdc-test-upstream-390,rMessageID=67,rLastConfirmedMessageID=65,rTimeTick=462274085172019220,rVchannel=cdc-test-upstream-390-rootcoord-dml_0_vcchan,size=1084,collectionID=462274027807775233}"] [state=BROADCAST_TASK_STATE_REPLICATED] [ackedVChannelCount=1]
2025-11-18 11:24:46.391 [2025/11/18 03:24:46.391 +00:00] [INFO] [broadcaster/ack_callback_scheduler.go:145] ["start to execute ack callback"] [module=streamingcoord] [component=broadcaster] [broadcastID=462274027807573005]
2025-11-18 11:26:35.280 [2025/11/18 03:26:35.280 +00:00] [WARN] [broadcaster/ack_callback_scheduler.go:142] ["execute ack callback failed"] [module=streamingcoord] [component=broadcaster] [broadcastID=462274027807573005] [error="context canceled"]
2025-11-18 11:26:39.551 [2025/11/18 03:26:39.536 +00:00] [INFO] [broadcaster/ack_callback_scheduler.go:145] ["start to execute ack callback"] [module=streamingcoord] [component=broadcaster] [broadcastID=462274027807573005]
verification passed.
https://qa-jenkins.milvus.io/job/milvus_cdc_chaos_test/392/
should be fixed.