
[Bug]: After upgrading Milvus from v2.0.1 to v2.1.0, search performance (rt) got worse in my scenario.

Open dzqoo opened this issue 2 years ago • 9 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version:2.1.0 
- Deployment mode(standalone or cluster):cluster
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus v2.1.0
- OS(Ubuntu or CentOS): centos
- CPU/Memory: 32 cores/756GBi 
- GPU: 
- Others:

Current Behavior

(1) When I tested Milvus search performance in my scenario, I got the following response times (rt):

  • Client performance (rt in seconds):

    nq   topk   rt
    10   10     0.5071010589599609
    10   10     0.5250134468078613
    10   10     0.5025763511657715
    10   10     0.5275528430938721
    10   10     0.5605242252349854
    10   10     0.3852427005767822
    10   10     0.44965672492980957
    10   10     0.18056321144104004

  • Server performance (from Prometheus/Grafana): Search Latency [image], Search in Queue Latency [image], Search Segment Latency [image], CPU Usage [image], Segment Loaded Num [image]
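To compare the client-side rt samples above against the 100 ms target stated under Expected Behavior, a small summary can help. This is only a sketch: the sample values are copied from the report, the unit is assumed to be seconds, and the `summarize` helper is hypothetical, not part of Milvus or pymilvus.

```python
import statistics

# rt samples copied from the client performance results (assumed to be seconds).
RT_SECONDS = [
    0.5071010589599609,
    0.5250134468078613,
    0.5025763511657715,
    0.5275528430938721,
    0.5605242252349854,
    0.3852427005767822,
    0.44965672492980957,
    0.18056321144104004,
]

# Latency target from the Expected Behavior section (100 ms).
TARGET_SECONDS = 0.100


def summarize(samples, target):
    """Return basic latency statistics and how many samples exceed the target."""
    return {
        "count": len(samples),
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
        "over_target": sum(1 for s in samples if s > target),
    }


if __name__ == "__main__":
    for key, value in summarize(RT_SECONDS, TARGET_SECONDS).items():
        print(f"{key}: {value}")
```

With only eight samples, the median and the count above target are more informative than a percentile would be.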

(2) Here are the Milvus deployment topology and node configuration: [image: deployment topology]

(3) Here is the HNSW configuration: HNSW (m=32, efconstruction=512)

(4) Here is the search configuration: ef=512, nq=10, topk=10, concurrency=20

(5) Here is the data configuration: 10M float16 embeddings, 400 dimensions
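For anyone trying to reproduce this, the search settings above could be exercised with a harness along these lines. It is a minimal sketch under several assumptions: the parameter dicts follow the usual pymilvus layout, the metric type is not given in the report so `"L2"` is a placeholder, and `fake_search` is a hypothetical stand-in that should be replaced with a wrapper around a real `Collection.search` call.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Values from the report. metric_type is NOT stated in the issue; "L2" is a placeholder.
INDEX_PARAMS = {
    "index_type": "HNSW",
    "metric_type": "L2",
    "params": {"M": 32, "efConstruction": 512},
}
SEARCH_PARAMS = {"metric_type": "L2", "params": {"ef": 512}}
NQ, TOPK, CONCURRENCY, DIM = 10, 10, 20, 400


def timed_call(search_fn, queries):
    """Run one search request and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    search_fn(queries, SEARCH_PARAMS, TOPK)
    return time.perf_counter() - start


def run_benchmark(search_fn, rounds=5, concurrency=CONCURRENCY):
    """Issue rounds * concurrency requests through a pool of `concurrency` threads."""
    queries = [[0.0] * DIM for _ in range(NQ)]  # dummy query vectors
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [
            pool.submit(timed_call, search_fn, queries)
            for _ in range(rounds * concurrency)
        ]
        return [f.result() for f in futures]


def fake_search(queries, params, limit):
    """Hypothetical stand-in for a real search call; it only sleeps for 10 ms."""
    time.sleep(0.01)
    return [[None] * limit for _ in queries]


if __name__ == "__main__":
    latencies = run_benchmark(fake_search)
    print(f"requests: {len(latencies)}, max rt: {max(latencies):.4f}s")
```

Measuring each request inside the worker thread keeps time spent waiting for a free worker out of the reported rt, so the numbers reflect per-request latency at the chosen concurrency.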

Expected Behavior

For this test, I expected the rt to be no higher than 100 ms in my scenario, but it is currently above 500 ms.

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

dzqoo avatar Aug 15 '22 02:08 dzqoo

@dzqoo thank you for the issue. 2 quick questions:

  1. How is the performance when running v2.0.1?
  2. How did you deploy the Milvus cluster v2.1.0? Also, could you please refer to this script to export the full Milvus logs for investigation?

/assign @dzqoo /unassign

yanliang567 avatar Aug 15 '22 04:08 yanliang567

@yanliang567 thanks for replying.

  1. v2.0.1 performs better than v2.1.0, but not by much.
  2. It is a cluster deployed with Helm. Here are the full Milvus logs: logs.tar.gz

dzqoo avatar Aug 15 '22 05:08 dzqoo

/assign @czs007 could you please help to take a look?

/unassign @dzqoo

yanliang567 avatar Aug 15 '22 08:08 yanliang567

  1. There seems to be an issue with the load balancer: only 3 of the 5 nodes are working. @sunby
  2. Each node uses only 6 CPU cores, but according to Attu each machine seems to have more than 20 cores. Does that make sense?

xiaofan-luan avatar Aug 20 '22 08:08 xiaofan-luan

@dzqoo please do not use multiple replicas. Use 1 replica instead and try again.

xiaofan-luan avatar Aug 20 '22 08:08 xiaofan-luan

@dzqoo I see you have 5 querynodes. Do they share the same machines? In Attu, 2 of the nodes have 20 cores and the other three have 36 cores. Do you know the reason?

xiaofan-luan avatar Aug 20 '22 08:08 xiaofan-luan

@xiaofan-luan Yes, my 5 querynodes share three machines: two have 32 cores and the other has 20 cores.
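As a rough check of the CPU numbers discussed in this thread, the arithmetic can be sketched as follows. It assumes the 6-core figure applies to all five querynodes and uses the machine sizes reported in this comment. Other Milvus components share the same machines, so this is an upper-level estimate only, not a measurement.

```python
# Machine sizes as reported in this thread (cores per machine).
MACHINE_CORES = [32, 32, 20]

# Querynode count and per-node core figure mentioned earlier in the thread.
# Assumption: all five querynodes are limited to the same 6 cores.
QUERYNODES = 5
CORES_PER_QUERYNODE = 6

total_cores = sum(MACHINE_CORES)
querynode_cores = QUERYNODES * CORES_PER_QUERYNODE
share = querynode_cores / total_cores

if __name__ == "__main__":
    print(f"total cores: {total_cores}")
    print(f"cores available to querynodes: {querynode_cores}")
    print(f"share of cluster CPU: {share:.1%}")
```

If that assumption holds, the querynodes can use only a minority of the available cores, which is consistent with the question of whether the per-node limit makes sense.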

dzqoo avatar Aug 22 '22 01:08 dzqoo

Using 1 replica will decrease QPS. My understanding is that increasing the number of replicas increases throughput.

dzqoo avatar Aug 22 '22 03:08 dzqoo

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Sep 21 '22 14:09 stale[bot]