[Bug]: service exited with a WARN [datanode/flush_manager.go:853] ["failed to SaveBinlogPaths"]
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: 2.2.5
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
I found that milvus-standalone had exited in Docker Compose:
# docker compose ps
NAME COMMAND SERVICE STATUS PORTS
milvus-etcd "etcd -advertise-cli…" etcd running 2379-2380/tcp
milvus-minio "/usr/bin/docker-ent…" minio running (healthy) 0.0.0.0:9000->9000/tcp, :::9000->9000/tcp
milvus-standalone "/tini -- milvus run…" standalone exited (2)
It ran for nearly three days from startup until it exited. I don't think I did anything special other than use the SDK's manual compaction method. I followed the official documentation and used milvus-sdk-java 2.2.4 to operate Milvus: managing collections, managing partitions, and managing data. Three days ago I dropped a partition containing 2000 rows of data. Before the service exited, I bulk-inserted several batches of data from MinIO and used the SDK to delete some rows by primary key, leaving about 2000 rows. Then I tried the compaction method provided by the SDK because I wanted to see its effect:
// Trigger a manual compaction on the whole collection and return its ID.
final ManualCompactParam param = ManualCompactParam.newBuilder()
        .withCollectionName(collectionName)
        .build();
final R<ManualCompactionResponse> responseR = milvusClient.manualCompact(param);
return responseR.getData().getCompactionID();
Then I found that the service had exited from Docker Compose, and I'm not sure whether the compaction is what caused it.
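For reference, the delete-by-primary-key step mentioned above looked roughly like this in milvus-sdk-java 2.2.x (the primary-key field name and IDs here are illustrative, not my exact expression):

import io.milvus.grpc.MutationResult;
import io.milvus.param.R;
import io.milvus.param.dml.DeleteParam;

// Delete a handful of rows by primary key before compacting.
final DeleteParam deleteParam = DeleteParam.newBuilder()
        .withCollectionName(collectionName)
        .withExpr("pk in [1, 2, 3]")
        .build();
final R<MutationResult> deleteR = milvusClient.delete(deleteParam);
if (deleteR.getStatus() != R.Status.Success.getCode()) {
    throw new RuntimeException("delete failed: " + deleteR.getMessage());
}

One thing the compaction snippet above skips is that same status check; calling responseR.getData() on a failed R would hide the error, so checking getStatus() first is safer.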
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
[2023/04/06 03:22:21.589 +00:00] [DEBUG] [rootcoord/task.go:68] ["tr/ddl request"] [msg="enqueue done"] [duration=6.672µs]
[2023/04/06 03:22:21.589 +00:00] [DEBUG] [rootcoord/task.go:70] ["tr/ddl request"] [msg="start to process"] [duration=92.53µs]
[2023/04/06 03:22:21.590 +00:00] [DEBUG] [etcd/etcd_kv.go:769] [tr/getEtcdMeta] [msg="getEtcdMeta done"] [duration=1.336416ms]
[2023/04/06 03:22:21.593 +00:00] [DEBUG] [proxy/impl.go:3918] [Proxy.GetProxyMetrics] [traceID=7345293fd9c6c8d2] [nodeID=8] [req="{"metric_type":"system_info"}"] [metricType=system_info]
[2023/04/06 03:22:21.598 +00:00] [INFO] [proxy/impl.go:4748] ["current rates in proxy"] [proxyNodeID=8] [rates="[{"rt":7,"r":1.7976931348623157e+308},{"rt":9,"r":1.7976931348623157e+308},{"rt":5,"r":1.7976931348623157e+308},{"rt":6,"r":1.7976931348623157e+308},{"rt":8,"r":1.7976931348623157e+308}]"]
[2023/04/06 03:22:21.645 +00:00] [DEBUG] [rmq/rmq_producer.go:47] ["tr/send msg to stream"] [msg="send msg to stream done"] [duration=174.646µs]
[2023/04/06 03:22:21.645 +00:00] [DEBUG] [rmq/rmq_producer.go:47] ["tr/send msg to stream"] [msg="send msg to stream done"] [duration=261.492µs]
[2023/04/06 03:22:21.645 +00:00] [DEBUG] [rmq/rmq_producer.go:47] ["tr/send msg to stream"] [msg="send msg to stream done"] [duration=226.429µs]
[2023/04/06 03:22:21.645 +00:00] [DEBUG] [rmq/rmq_producer.go:47] ["tr/send msg to stream"] [msg="send msg to stream done"] [duration=413.232µs]
[2023/04/06 03:22:21.645 +00:00] [DEBUG] [rmq/rmq_producer.go:47] ["tr/send msg to stream"] [msg="send msg to stream done"] [duration=136.745µs]
[2023/04/06 03:22:21.646 +00:00] [DEBUG] [rmq/rmq_producer.go:47] ["tr/send msg to stream"] [msg="send msg to stream done"] [duration=304.659µs]
[2023/04/06 03:22:21.646 +00:00] [DEBUG] [rmq/rmq_producer.go:47] ["tr/send msg to stream"] [msg="send msg to stream done"] [duration=729.799µs]
[2023/04/06 03:22:21.646 +00:00] [DEBUG] [rmq/rmq_producer.go:47] ["tr/send msg to stream"] [msg="send msg to stream done"] [duration=682.875µs]
[2023/04/06 03:22:21.646 +00:00] [DEBUG] [rmq/rmq_producer.go:47] ["tr/send msg to stream"] [msg="send msg to stream done"] [duration=107.259µs]
[2023/04/06 03:22:21.646 +00:00] [DEBUG] [rmq/rmq_producer.go:47] ["tr/send msg to stream"] [msg="send msg to stream done"] [duration=67.294µs]
[2023/04/06 03:22:21.646 +00:00] [DEBUG] [rmq/rmq_producer.go:47] ["tr/send msg to stream"] [msg="send msg to stream done"] [duration=133.503µs]
[2023/04/06 03:22:21.646 +00:00] [DEBUG] [rmq/rmq_producer.go:47] ["tr/send msg to stream"] [msg="send msg to stream done"] [duration=1.418087ms]
[2023/04/06 03:22:21.647 +00:00] [DEBUG] [rmq/rmq_producer.go:47] ["tr/send msg to stream"] [msg="send msg to stream done"] [duration=411.959µs]
[2023/04/06 03:22:21.647 +00:00] [DEBUG] [rmq/rmq_producer.go:47] ["tr/send msg to stream"] [msg="send msg to stream done"] [duration=564.66µs]
[2023/04/06 03:22:21.647 +00:00] [DEBUG] [rmq/rmq_producer.go:47] ["tr/send msg to stream"] [msg="send msg to stream done"] [duration=435.996µs]
[2023/04/06 03:22:21.647 +00:00] [DEBUG] [rmq/rmq_producer.go:47] ["tr/send msg to stream"] [msg="send msg to stream done"] [duration=873.59µs]
[2023/04/06 03:22:21.745 +00:00] [DEBUG] [rmq/rmq_producer.go:47] ["tr/send msg to stream"] [msg="send msg to stream done"] [duration=234.509µs]
[2023/04/06 03:22:21.745 +00:00] [DEBUG] [rmq/rmq_producer.go:47] ["tr/send msg to stream"] [msg="send msg to stream done"] [duration=379.266µs]
[2023/04/06 03:22:21.788 +00:00] [DEBUG] [rootcoord/task.go:68] ["tr/ddl request"] [msg="enqueue done"] [duration=3.573µs]
[2023/04/06 03:22:21.788 +00:00] [DEBUG] [rootcoord/task.go:70] ["tr/ddl request"] [msg="start to process"] [duration=81.48µs]
[2023/04/06 03:22:21.831 +00:00] [WARN] [datanode/flush_manager.go:853] ["failed to SaveBinlogPaths"] ["segment ID"=440556259765922844] [error="All attempts results:\nattempt #1:segment 440556259765922844 not found\nattempt #2:segment 440556259765922844 not found\nattempt #3:segment 440556259765922844 not found\nattempt #4:segment 440556259765922844 not found\nattempt #5:segment 440556259765922844 not found\nattempt #6:segment 440556259765922844 not found\nattempt #7:segment 440556259765922844 not found\nattempt #8:segment 440556259765922844 not found\nattempt #9:segment 440556259765922844 not found\nattempt #10:segment 440556259765922844 not found\nattempt #11:segment 440556259765922844 not found\n"]
milvus-standalone | panic: All attempts results:
milvus-standalone | attempt #1:segment 440556259765922844 not found
milvus-standalone | attempt #2:segment 440556259765922844 not found
milvus-standalone | attempt #3:segment 440556259765922844 not found
milvus-standalone | attempt #4:segment 440556259765922844 not found
milvus-standalone | attempt #5:segment 440556259765922844 not found
milvus-standalone | attempt #6:segment 440556259765922844 not found
milvus-standalone | attempt #7:segment 440556259765922844 not found
milvus-standalone | attempt #8:segment 440556259765922844 not found
milvus-standalone | attempt #9:segment 440556259765922844 not found
milvus-standalone | attempt #10:segment 440556259765922844 not found
milvus-standalone | attempt #11:segment 440556259765922844 not found
milvus-standalone |
milvus-standalone |
milvus-standalone | goroutine 19679713 [running]:
milvus-standalone | github.com/milvus-io/milvus/internal/datanode.flushNotifyFunc.func1(0xc005c259f0)
milvus-standalone | /go/src/github.com/milvus-io/milvus/internal/datanode/flush_manager.go:857 +0x13a8
milvus-standalone | github.com/milvus-io/milvus/internal/datanode.(*flushTaskRunner).waitFinish(0xc008b07200, 0xc006fe5e60, 0xc005a40eb0)
milvus-standalone | /go/src/github.com/milvus-io/milvus/internal/datanode/flush_task.go:204 +0xdb
milvus-standalone | created by github.com/milvus-io/milvus/internal/datanode.(*flushTaskRunner).init.func1
milvus-standalone | /go/src/github.com/milvus-io/milvus/internal/datanode/flush_task.go:121 +0xf1
Anything else?
No response
Drop partition only drops the segment on DataCoord, but the DataNode still holds this segment. A delete or insert may then trigger a flush of the dropped segment, which panics.
/assign @aoiasd /unassign
Dropping a partition doesn't require releasing the collection first? WTF
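For what it's worth, the release-before-drop ordering this alludes to looks like the following in milvus-sdk-java. Whether it actually sidesteps this panic is my assumption, not something confirmed in this thread:

import java.util.Collections;
import io.milvus.param.R;
import io.milvus.param.RpcStatus;
import io.milvus.param.partition.DropPartitionParam;
import io.milvus.param.partition.ReleasePartitionsParam;

// Release the partition first so it is no longer served, then drop it.
milvusClient.releasePartitions(ReleasePartitionsParam.newBuilder()
        .withCollectionName(collectionName)
        .withPartitionNames(Collections.singletonList(partitionName))
        .build());
final R<RpcStatus> dropR = milvusClient.dropPartition(DropPartitionParam.newBuilder()
        .withCollectionName(collectionName)
        .withPartitionName(partitionName)
        .build());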
@bigsheeper will this be solved in 2.3?
2.3 will not panic, because SaveBinlogPaths will return success if the segment was dropped; but it will still use more resources, because the DataNode still keeps these dropped segments.
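Based on that explanation, the client-side sequence that can hit the 2.2.x panic is presumably drop partition, then more DML on the same collection, then a flush. A hypothetical sketch with milvus-sdk-java (untested as a reproducer; collection, partition, and field names are made up):

import java.util.Collections;
import io.milvus.grpc.FlushResponse;
import io.milvus.param.R;
import io.milvus.param.collection.FlushParam;
import io.milvus.param.dml.DeleteParam;
import io.milvus.param.partition.DropPartitionParam;

// 1. Drop a partition: DataCoord drops its segments, but per the comment
//    above a 2.2.x DataNode may still hold them.
milvusClient.dropPartition(DropPartitionParam.newBuilder()
        .withCollectionName(collectionName)
        .withPartitionName(partitionName)
        .build());

// 2. Keep mutating the collection, which can schedule a flush that still
//    references one of the dropped segments.
milvusClient.delete(DeleteParam.newBuilder()
        .withCollectionName(collectionName)
        .withExpr("pk in [42]")
        .build());

// 3. Force a flush; on 2.2.x SaveBinlogPaths then retries "segment not
//    found" eleven times and the DataNode panics.
final R<FlushResponse> flushR = milvusClient.flush(FlushParam.newBuilder()
        .withCollectionNames(Collections.singletonList(collectionName))
        .build());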
Fixed.
Has this issue been resolved? I still hit this error on Milvus 2.2.10. Can you help me resolve it?
dataNode:
[2023/07/29 15:43:27.374 +00:00] [INFO] [datanode/flow_graph_insert_buffer_node.go:469] ["insertBufferNode syncing BufferData"] [segmentID=442776935130016392] [flushed=false] [dropped=false] [auto=false] [position="channel_name:\"by-dev-rootcoord-dml_444_442776935130016343v0\" msgID:\"\\010\\225\\227\\202\\001\\020\\214\\230\\001\\030\\000 \\000\" msgGroup:\"by-dev-dataNode-55-by-dev-rootcoord-dml_444_442776935130016343v0\" timestamp:443192549629755393 "] [channel=by-dev-rootcoord-dml_444_442776935130016343v0]
[2023/07/29 15:43:27.374 +00:00] [INFO] [datanode/flush_manager.go:287] ["handling insert task"] ["segment ID"=442776935130016392] [flushed=false] [dropped=false] [position="channel_name:\"by-dev-rootcoord-dml_444_442776935130016343v0\" msgID:\"\\010\\225\\227\\202\\001\\020\\214\\230\\001\\030\\000 \\000\" msgGroup:\"by-dev-dataNode-55-by-dev-rootcoord-dml_444_442776935130016343v0\" timestamp:443192549629755393 "]
[2023/07/29 15:43:27.374 +00:00] [INFO] [datanode/flush_manager.go:139] ["new flush task runner created and initialized"] ["segment ID"=442776935130016392] ["pos message ID"="\u0008\ufffd\ufffd\ufffd\u0001\u0010\ufffd\ufffd\u0001\u0018\u0000 \u0000"]
[2023/07/29 15:43:27.374 +00:00] [INFO] [datanode/flush_task.go:134] ["running flush insert task"] ["segment ID"=442776935130016392] [flushed=false] [dropped=false] [position="channel_name:\"by-dev-rootcoord-dml_444_442776935130016343v0\" msgID:\"\\010\\225\\227\\202\\001\\020\\214\\230\\001\\030\\000 \\000\" msgGroup:\"by-dev-dataNode-55-by-dev-rootcoord-dml_444_442776935130016343v0\" timestamp:443192549629755393 "] [PosTime=2023/07/29 15:43:27.218 +00:00]
[2023/07/29 15:43:27.374 +00:00] [INFO] [datanode/channel_meta.go:453] ["roll pk stats"] ["segment id"=442776935130016392]
[2023/07/29 15:43:27.374 +00:00] [INFO] [datanode/flow_graph_delete_node.go:118] ["DeleteNode receives flush message"] [segIDs="[442776935130016392]"] [vChannelName=by-dev-rootcoord-dml_444_442776935130016343v0] [posTime=2023/07/29 15:43:27.218 +00:00]
[2023/07/29 15:43:27.374 +00:00] [INFO] [datanode/flush_manager.go:314] ["handling delete task"] ["segment ID"=442776935130016392]
[2023/07/29 15:43:27.451 +00:00] [WARN] [datanode/flush_manager.go:853] ["failed to SaveBinlogPaths"] ["segment ID"=442776935130016392] [error="All attempts results:\nattempt #1:segment 442776935130016392 not found\nattempt #2:segment 442776935130016392 not found\nattempt #3:segment 442776935130016392 not found\nattempt #4:segment 442776935130016392 not found\nattempt #5:segment 442776935130016392 not found\nattempt #6:segment 442776935130016392 not found\nattempt #7:segment 442776935130016392 not found\nattempt #8:segment 442776935130016392 not found\nattempt #9:segment 442776935130016392 not found\nattempt #10:segment 442776935130016392 not found\nattempt #11:segment 442776935130016392 not found\n"]
panic: All attempts results:
attempt #1:segment 442776935130016392 not found
attempt #2:segment 442776935130016392 not found
attempt #3:segment 442776935130016392 not found
attempt #4:segment 442776935130016392 not found
attempt #5:segment 442776935130016392 not found
attempt #6:segment 442776935130016392 not found
attempt #7:segment 442776935130016392 not found
attempt #8:segment 442776935130016392 not found
attempt #9:segment 442776935130016392 not found
attempt #10:segment 442776935130016392 not found
attempt #11:segment 442776935130016392 not found
goroutine 6092910 [running]:
github.com/milvus-io/milvus/internal/datanode.flushNotifyFunc.func1(0xc09a773ae0)
/go/src/github.com/milvus-io/milvus/internal/datanode/flush_manager.go:857 +0x13f6
github.com/milvus-io/milvus/internal/datanode.(*flushTaskRunner).waitFinish(0xc0c102e180, 0xc011be0a80, 0xc0a69764d0)
/go/src/github.com/milvus-io/milvus/internal/datanode/flush_task.go:204 +0xdb
created by github.com/milvus-io/milvus/internal/datanode.(*flushTaskRunner).init.func1
/go/src/github.com/milvus-io/milvus/internal/datanode/flush_task.go:121 +0xf1
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.