
[Bug]: Flush hang after kafka pod kill chaos test

Open • zhuwenxing opened this issue 1 year ago • 2 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version: 2.2.0-20230427-a7c44b29
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): kafka
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

[2023/04/27 23:05:51.231 +00:00] [INFO] [proxy/impl.go:4116] ["received get flush state response"] [response="status:<> "]
[2023/04/27 23:05:51.301 +00:00] [DEBUG] [proxy/impl.go:3956] [Proxy.GetProxyMetrics] [traceID=47c919f2ff367a93] [nodeID=11] [req="{\"metric_type\":\"system_info\"}"] [metricType=system_info]
[2023/04/27 23:05:51.305 +00:00] [INFO] [proxy/impl.go:4798] ["current rates in proxy"] [proxyNodeID=11] [rates="[{\"rt\":5,\"r\":1.7976931348623157e+308},{\"rt\":7,\"r\":1.7976931348623157e+308},{\"rt\":9,\"r\":1.7976931348623157e+308},{\"rt\":6,\"r\":1.7976931348623157e+308},{\"rt\":8,\"r\":1.7976931348623157e+308}]"]
[2023/04/27 23:05:51.733 +00:00] [INFO] [proxy/impl.go:4102] ["received get flush state request"] [request="segmentIDs:441092752585726580 segmentIDs:441092752585726582 segmentIDs:441092752585727007 segmentIDs:441092752585727030 "]
[2023/04/27 23:05:51.733 +00:00] [INFO] [proxy/impl.go:4116] ["received get flush state response"] [response="status:<> "]
[2023/04/27 23:05:52.235 +00:00] [INFO] [proxy/impl.go:4102] ["received get flush state request"] [request="segmentIDs:441092752585726580 segmentIDs:441092752585726582 segmentIDs:441092752585727007 segmentIDs:441092752585727030 "]
[2023/04/27 23:05:52.235 +00:00] [INFO] [proxy/impl.go:4116] ["received get flush state response"] [response="status:<> "]
[2023/04/27 23:05:52.737 +00:00] [INFO] [proxy/impl.go:4102] ["received get flush state request"] [request="segmentIDs:441092752585726580 segmentIDs:441092752585726582 segmentIDs:441092752585727007 segmentIDs:441092752585727030 "]
[2023/04/27 23:05:52.737 +00:00] [INFO] [proxy/impl.go:4116] ["received get flush state response"] [response="status:<> "]
[2023/04/27 23:05:53.238 +00:00] [INFO] [proxy/impl.go:4102] ["received get flush state request"] [request="segmentIDs:441092752585726580 segmentIDs:441092752585726582 segmentIDs:441092752585727007 segmentIDs:441092752585727030 "]
[2023/04/27 23:05:53.238 +00:00] [INFO] [proxy/impl.go:4116] ["received get flush state response"] [response="status:<> "]
[2023/04/27 23:05:53.740 +00:00] [INFO] [proxy/impl.go:4102] ["received get flush state request"] [request="segmentIDs:441092752585726580 segmentIDs:441092752585726582 segmentIDs:441092752585727007 segmentIDs:441092752585727030 "]
[2023/04/27 23:05:53.740 +00:00] [INFO] [proxy/impl.go:4116] ["received get flush state response"] [response="status:<> "]
[2023/04/27 23:05:54.257 +00:00] [INFO] [proxy/impl.go:4102] ["received get flush state request"] [request="segmentIDs:441092752585726580 segmentIDs:441092752585726582 segmentIDs:441092752585727007 segmentIDs:441092752585727030 "]
[2023/04/27 23:05:54.258 +00:00] [INFO] [proxy/impl.go:4116] ["received get flush state response"] [response="status:<> "]
[2023/04/27 23:05:54.302 +00:00] [DEBUG] [proxy/impl.go:3956] [Proxy.GetProxyMetrics] [traceID=b1f8afe7092c883] [nodeID=11] [req="{\"metric_type\":\"system_info\"}"] [metricType=system_info]
[2023/04/27 23:05:54.306 +00:00] [INFO] [proxy/impl.go:4798] ["current rates in proxy"] [proxyNodeID=11] [rates="[{\"rt\":8,\"r\":1.7976931348623157e+308},{\"rt\":5,\"r\":1.7976931348623157e+308},{\"rt\":7,\"r\":1.7976931348623157e+308},{\"rt\":9,\"r\":1.7976931348623157e+308},{\"rt\":6,\"r\":1.7976931348623157e+308}]"]
[2023/04/27 23:05:54.759 +00:00] [INFO] [proxy/impl.go:4102] ["received get flush state request"] [request="segmentIDs:441092752585726580 segmentIDs:441092752585726582 segmentIDs:441092752585727007 segmentIDs:441092752585727030 "]
[2023/04/27 23:05:54.760 +00:00] [INFO] [proxy/impl.go:4116] ["received get flush state response"] [response="status:<> "]
[2023/04/27 23:05:55.261 +00:00] [INFO] [proxy/impl.go:4102] ["received get flush state request"] [request="segmentIDs:441092752585726580 segmentIDs:441092752585726582 segmentIDs:441092752585727007 segmentIDs:441092752585727030 "]
[2023/04/27 23:05:55.261 +00:00] [INFO] [proxy/impl.go:4116] ["received get flush state response"] [response="status:<> "]
[2023/04/27 23:05:55.763 +00:00] [INFO] [proxy/impl.go:4102] ["received get flush state request"] [request="segmentIDs:441092752585726580 segmentIDs:441092752585726582 segmentIDs:441092752585727007 segmentIDs:441092752585727030 "]
[2023/04/27 23:05:55.763 +00:00] [INFO] [proxy/impl.go:4116] ["received get flush state response"] [response="status:<> "]
[2023/04/27 23:05:56.265 +00:00] [INFO] [proxy/impl.go:4102] ["received get flush state request"] [request="segmentIDs:441092752585726580 segmentIDs:441092752585726582 segmentIDs:441092752585727007 segmentIDs:441092752585727030 "]
[2023/04/27 23:05:56.265 +00:00] [INFO] [proxy/impl.go:4116] ["received get flush state response"] [response="status:<> "]
[2023/04/27 23:05:56.767 +00:00] [INFO] [proxy/impl.go:4102] ["received get flush state request"] [request="segmentIDs:441092752585726580 segmentIDs:441092752585726582 segmentIDs:441092752585727007 segmentIDs:441092752585727030 "]
[2023/04/27 23:05:56.767 +00:00] [INFO] [proxy/impl.go:4116] ["received get flush state response"] [response="status:<> "]
[2023/04/27 23:05:57.268 +00:00] [INFO] [proxy/impl.go:4102] ["received get flush state request"] [request="segmentIDs:441092752585726580 segmentIDs:441092752585726582 segmentIDs:441092752585727007 segmentIDs:441092752585727030 "]
[2023/04/27 23:05:57.268 +00:00] [INFO] [proxy/impl.go:4116] ["received get flush state response"] [response="status:<> "]
[2023/04/27 23:05:57.301 +00:00] [DEBUG] [proxy/impl.go:3956] [Proxy.GetProxyMetrics] [traceID=3adfc9bba99d939b] [nodeID=11] [req="{\"metric_type\":\"system_info\"}"] [metricType=system_info]
[2023/04/27 23:05:57.319 +00:00] [INFO] [proxy/impl.go:4798] ["current rates in proxy"] [proxyNodeID=11] [rates="[{\"rt\":5,\"r\":1.7976931348623157e+308},{\"rt\":7,\"r\":1.7976931348623157e+308},{\"rt\":9,\"r\":1.7976931348623157e+308},{\"rt\":6,\"r\":1.7976931348623157e+308},{\"rt\":8,\"r\":1.7976931348623157e+308}]"]
[2023/04/27 23:05:57.770 +00:00] [INFO] [proxy/impl.go:4102] ["received get flush state request"] [request="segmentIDs:441092752585726580 segmentIDs:441092752585726582 segmentIDs:441092752585727007 segmentIDs:441092752585727030 "]
[2023/04/27 23:05:57.770 +00:00] [INFO] [proxy/impl.go:4116] ["received get flush state response"] [response="status:<> "]
[2023/04/27 23:05:58.272 +00:00] [INFO] [proxy/impl.go:4102] ["received get flush state request"] [request="segmentIDs:441092752585726580 segmentIDs:441092752585726582 segmentIDs:441092752585727007 segmentIDs:441092752585727030 "]
[2023/04/27 23:05:58.272 +00:00] [INFO] [proxy/impl.go:4116] ["received get flush state response"] [response="status:<> "]
[2023/04/27 23:05:58.454 +00:00] [INFO] [proxy/impl.go:4082] ["received ManualCompaction response"] [collectionID=441092752585725582] [resp="status:<> compactionID:441092752585740867 "] []
[2023/04/27 23:05:58.774 +00:00] [INFO] [proxy/impl.go:4102] ["received get flush state request"] [request="segmentIDs:441092752585726580 segmentIDs:441092752585726582 segmentIDs:441092752585727007 segmentIDs:441092752585727030 "]
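
For triage, here is a minimal sketch (not part of Milvus or the chaos test suite; the function name `summarize_flush_polling` is hypothetical) that checks whether the proxy keeps polling the same segments at a fixed interval. It assumes the proxy log line format shown in the excerpt above.

```python
import re
from datetime import datetime

# Matches the proxy "get flush state request" lines in the format of the
# excerpt above. The pattern is an assumption based on that excerpt only.
REQUEST_RE = re.compile(
    r'^\[(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) [+-]\d{2}:\d{2}\] '
    r'\[INFO\] \[proxy/impl\.go:\d+\] \["received get flush state request"\] '
    r'\[request="(?P<req>[^"]*)"\]'
)
SEGMENT_RE = re.compile(r"segmentIDs:(\d+)")


def summarize_flush_polling(lines):
    """Return (poll_count, distinct_segment_sets, mean_interval_seconds).

    mean_interval_seconds is None when fewer than two requests are found.
    """
    timestamps = []
    segment_sets = set()
    for line in lines:
        match = REQUEST_RE.match(line)
        if not match:
            continue  # skip responses, metrics and any other log lines
        timestamps.append(
            datetime.strptime(match.group("ts"), "%Y/%m/%d %H:%M:%S.%f")
        )
        segment_sets.add(frozenset(SEGMENT_RE.findall(match.group("req"))))
    if len(timestamps) < 2:
        return len(timestamps), segment_sets, None
    span = (timestamps[-1] - timestamps[0]).total_seconds()
    return len(timestamps), segment_sets, span / (len(timestamps) - 1)


# Hypothetical usage on a saved proxy log:
#     with open("proxy.log") as f:
#         count, sets, mean = summarize_flush_polling(f)
```

Run on the excerpt above, this should report a single distinct segment set polled roughly every 0.5 s, which is consistent with a flush that never completes for those four segments.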

Expected Behavior

All test cases pass.

Steps To Reproduce

No response

Milvus Log

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-kafka-for-release-cron/detail/chaos-test-kafka-for-release-cron/3791/pipeline

log: artifacts-kafka-pod-kill-3791-server-logs (1).tar.gz

artifacts-kafka-pod-kill-3791-pytest-logs.tar.gz

Anything else?

It works well in 2.2.0-20230426-8745ee25.

zhuwenxing • Apr 28 '23 03:04

Not a stable issue: the test succeeded on retry.

zhuwenxing • Apr 28 '23 03:04

/assign @sunby
/unassign

yanliang567 • Apr 28 '23 06:04

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] • Aug 03 '23 03:08