[Bug]: After upgrading Milvus from v2.0.1 to v2.1.0, I got worse search performance (rt) in my scenario.
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version:2.1.0
- Deployment mode(standalone or cluster):cluster
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus v2.1.0
- OS(Ubuntu or CentOS): centos
- CPU/Memory: 32 cores/756GBi
- GPU:
- Others:
Current Behavior
(1) When I tested Milvus search performance in my scenario, I got the following response times (rt):
- Client performance:

| nq | topk | rt (s) |
|----|------|--------|
| 10 | 10 | 0.5071010589599609 |
| 10 | 10 | 0.5250134468078613 |
| 10 | 10 | 0.5025763511657715 |
| 10 | 10 | 0.5275528430938721 |
| 10 | 10 | 0.5605242252349854 |
| 10 | 10 | 0.3852427005767822 |
| 10 | 10 | 0.44965672492980957 |
| 10 | 10 | 0.18056321144104004 |
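Per-request rt figures like the ones above can be collected with a simple client-side timing loop. A minimal sketch (the Milvus search call itself is stubbed out here, since it needs a live cluster; `measure_rt` is a hypothetical helper, not part of pymilvus):

```python
import time

def measure_rt(search_fn, rounds=8):
    """Call search_fn `rounds` times and return the per-call latency in seconds."""
    latencies = []
    for _ in range(rounds):
        start = time.perf_counter()
        search_fn()  # e.g. collection.search(...) against a live cluster
        latencies.append(time.perf_counter() - start)
    return latencies

# Usage with a no-op stub standing in for the real search call:
rts = measure_rt(lambda: None, rounds=3)
print(len(rts))  # 3
```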
- Server performance (from Prometheus/Grafana) — screenshots of the following panels: Search Latency, Search in Queue Latency, Search Segment Latency, CPU Usage, Segment Loaded Num.
(2) Here are the Milvus deployment topology and node layout (screenshots):
(3) Here is the HNSW index configuration:
HNSW (M=32, efConstruction=512)
(4) Here is the search configuration:
ef=512, nq=10, topk=10, concurrency=20
(5) Here is the data configuration:
10M float16 embeddings, 400 dimensions
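For reference, the index and search configurations described above would look roughly like this in pymilvus. This is a sketch: the metric type and field name are assumptions, since the issue does not state them.

```python
# Sketch of the index and search parameters from the report (pymilvus v2.1 style).
# The metric type ("L2") and field name ("embedding") are hypothetical.

# HNSW index: M=32, efConstruction=512
index_params = {
    "index_type": "HNSW",
    "metric_type": "L2",
    "params": {"M": 32, "efConstruction": 512},
}

# Search: ef=512
search_params = {
    "metric_type": "L2",
    "params": {"ef": 512},
}

# Against a live cluster the calls would be roughly:
# collection.create_index(field_name="embedding", index_params=index_params)
# results = collection.search(data=query_vectors,   # nq = 10 query vectors
#                             anns_field="embedding",
#                             param=search_params,
#                             limit=10)             # topk = 10
```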
Expected Behavior
In this test I expected rt to be no higher than 100 ms in my scenario, but it is now above 500 ms.
Steps To Reproduce
No response
Milvus Log
No response
Anything else?
No response
@dzqoo thank you for the issue. Two quick questions:
- How is the performance when running v2.0.1?
- How did you deploy the Milvus v2.1.0 cluster? Also, could you please refer to this script to export the whole Milvus logs for investigation?
/assign @dzqoo /unassign
@yanliang567 thanks for replying.
- v2.0.1 performs better than v2.1.0, but not by much;
- Yes, it's a cluster deployed with Helm. Here are the full Milvus logs: logs.tar.gz
/assign @czs007 could you please help to take a look?
/unassign @dzqoo
- There seems to be an issue with the load balancer; only 3 of the 5 nodes are working. @sunby
- Each node uses only 6 CPU cores, while from Attu each machine seems to have more than 20 cores. Does that make sense?
@dzqoo please do not use multiple replicas. Use 1 replica instead and try again.
@dzqoo I saw you have 5 query nodes; do they share the same machines? From Attu, 2 of the nodes have 20 cores and the other three have 36 cores. Do you know the reason?
@xiaofan-luan Yes, my 5 query nodes share three machines: two have 32 cores and the other has 20 cores.
1 replica will cause QPS to decrease. My understanding is that increasing replicas increases throughput.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen