Search times out for several minutes during peak periods
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: 2.2.2
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): CentOS
- CPU/Memory: 6c32gb
- GPU:
- Others:
Current Behavior
Searches time out for about 3 minutes when my cluster reaches its business peak period.
- Here is the Search Latency dashboard:
- Here are some of the querynode metrics; the Search Segment Latency looks fine:
- Here is the Search Group NQ dashboard, which fluctuates at the same time:
This cluster is my production environment, so I hope you can provide a cause and a solution as soon as possible. Thanks a lot~
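For reference, below is a minimal sketch of how the timeout is observed from the client side with pymilvus; the host, collection name, vector field, and dimension are placeholders rather than values from this cluster:

```python
# Hypothetical client-side check: issue a search with an explicit timeout and
# measure how long it takes. Host, collection name, vector field, and dim are
# placeholders for this sketch.
import time

from pymilvus import Collection, connections

connections.connect(host="127.0.0.1", port="19530")
coll = Collection("my_collection")          # placeholder collection name

query_vectors = [[0.0] * 128]               # dim must match the collection schema

start = time.time()
try:
    coll.search(
        data=query_vectors,
        anns_field="embedding",             # placeholder vector field name
        param={"metric_type": "L2", "params": {"nprobe": 16}},
        limit=10,
        timeout=10,                         # seconds; fail fast instead of hanging for minutes
    )
    print(f"search returned in {time.time() - start:.2f}s")
except Exception as exc:
    print(f"search failed after {time.time() - start:.2f}s: {exc}")
```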
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
Uploading milvus.log.tar.gz…
Anything else?
No response
- Here are parts of the runtime metrics:
Looking at your metrics, it seems the queue time is very large, which triggered the search merge logic.
Did you have a chance to try 2.2.8? I remember we made some fixes to the search merge logic.
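For readers unfamiliar with the merge logic being discussed, here is a rough sketch of the general idea, under the assumption that merging simply groups queued small-NQ requests into larger-NQ groups; this is illustrative only, not Milvus's actual implementation, and the names and thresholds are invented:

```python
# Rough illustration only, NOT Milvus's actual code: when requests sit in the
# queue too long, several small-NQ searches are folded into one larger-NQ
# group so a segment is scanned once per group instead of once per request.
from dataclasses import dataclass
from typing import List

@dataclass
class SearchReq:
    nq: int            # number of query vectors in the request
    queued_ms: float   # how long the request has waited in the queue

def group_queued(requests: List[SearchReq],
                 queue_ms_threshold: float = 100.0,
                 max_group_nq: int = 1024) -> List[List[SearchReq]]:
    """Merge queued requests into larger groups once queue time grows."""
    groups: List[List[SearchReq]] = []
    current: List[SearchReq] = []
    current_nq = 0
    for req in requests:
        if req.queued_ms < queue_ms_threshold:
            groups.append([req])            # queue is healthy: run it alone
            continue
        if current and current_nq + req.nq > max_group_nq:
            groups.append(current)          # close the group at the NQ cap
            current, current_nq = [], 0
        current.append(req)
        current_nq += req.nq
    if current:
        groups.append(current)
    return groups
```

Larger groups amortize per-segment overhead, but each merged group carries a bigger NQ, which would be consistent with the higher "Search Group NQ" and longer reduce times reported in this thread.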
Can I know which specific merge logic is causing it?
/assign @liliu-z
I encountered this problem again today .><.
Search timed out for 2 minutes, and at the same time the "search reduce latency" metric was quite high, as you can see below:
I would like to know whether this problem has been resolved in the latest versions (2.2.8 or 2.2.9).
Looking forward to your reply~
I also found that at the same time the goroutine count of this querynode was pretty high, as you can see:
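One way to spot-check the goroutine count and queue-related metrics outside Grafana is to scrape the querynode's Prometheus endpoint directly; the host and port below are assumptions about the deployment, so adjust them to your own setup:

```python
# Small spot check of a querynode's Prometheus endpoint. go_goroutines is a
# standard Go-runtime metric; the host and port are assumptions (Milvus
# usually exposes /metrics on 9091).
import requests

def dump_metrics(host: str = "querynode-host", port: int = 9091,
                 keywords=("go_goroutines", "queue_latency")) -> None:
    text = requests.get(f"http://{host}:{port}/metrics", timeout=5).text
    for line in text.splitlines():
        if line.startswith("#"):
            continue                        # skip HELP/TYPE comment lines
        if any(key in line for key in keywords):
            print(line)

dump_metrics()
```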
I encountered this issue again when I upgraded to version 2.2.9...
Were there any insert or delete requests at that time? Could you please share the CPU usage of the querynodes?
There were no insert or delete requests at that time. The CPU usage is shown in the picture below.
"Search Group NQ" at the same time.
seems that the nq group size increased during the business peak period. a known issue that we are trying to fix? @liliu-z
Hi @dzqoo, it looks like there is no QPS spike during this time, and from the info we have right now I don't have many ideas. Can you screenshot all the graphs that look abnormal during that time? Also, can you check whether there were any growing segments at that time? Thanks
More info is needed, especially for the querynodes. Also, in the resource usage graphs you provided (CPU, memory), I didn't see the querynodes included. Can you provide some info about this? Thanks
Here are all the metrics. FYI~
@dzqoo Can I get more info, like:
- How much data do we have, and at what dimension?
- How many querynodes do we have, and what type are they?
- How many collections do we have?
- How many segments are there for each collection and each querynode? On the Grafana page all the data is stacked together, which makes it hard to tell. Thanks
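As a hedged sketch, the numbers asked for above could also be collected with pymilvus rather than read off the Grafana page; the collection name is a placeholder and the attribute names on the segment-info objects may differ slightly between pymilvus versions:

```python
# Collect dimension, row count, and per-querynode segment counts for one
# collection. "my_collection" is a placeholder name for this sketch.
from collections import Counter

from pymilvus import Collection, connections, utility

connections.connect(host="127.0.0.1", port="19530")
name = "my_collection"                      # placeholder collection name
coll = Collection(name)

# Dimensions of the vector fields, taken from the collection schema.
dims = {f.name: f.params.get("dim") for f in coll.schema.fields if f.params.get("dim")}
print("vector field dims:", dims)
print("row count:", coll.num_entities)

# Sealed segments currently loaded on querynodes, counted per node.
segments = utility.get_query_segment_info(name)
per_node = Counter(getattr(seg, "nodeID", -1) for seg in segments)
for node, count in sorted(per_node.items()):
    print(f"querynode {node}: {count} segments")
```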
- We have ~1 billion vectors in about 10 collections, with dimensions ranging from 128 to 768;
- We have 15 querynodes, each with 6 CPUs / 32 GB;
- The segments on each querynode are shown below:
@liliu-z
@dzqoo Can I have the segment distribution of the specific collection you queried? Thanks! Also, what is the dim of that collection and how many rows does it have?
Searches on all collections have timed out... @liliu-z
Could you try the latest version? I have fixed many performance issues in the last 6 months. We'd like to set up a meeting if the newest version is still not stable, but we probably don't want to spend time investigating stale issues.
2.2.11 should be at least 50% faster than 2.2.2, and we have solved many balance and stability issues.
Thank you for your reply~ When will version 2.3 come out? I want to go directly to version 2.3.
It will be released next week ~
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen