milvus [Bug]: After changing Milvus from v2.0.1 to v2.1.0, I got worse performance (rt) in my scenario.

Is there an existing issue for this?

[X] I have searched the existing issues

Environment

- Milvus version: 2.1.0
- Deployment mode (standalone or cluster): cluster
- SDK version (e.g. pymilvus v2.0.0rc2): pymilvus v2.1.0
- OS (Ubuntu or CentOS): CentOS
- CPU/Memory: 32 cores/756GBi
- GPU: 
- Others:

Current Behavior

(1) When I tested Milvus search performance in my scenario, I got the following response times (rt):

Client performance:

| nq | topk | rt (s) |
|----|------|--------|
| 10 | 10 | 0.5071010589599609 |
| 10 | 10 | 0.5250134468078613 |
| 10 | 10 | 0.5025763511657715 |
| 10 | 10 | 0.5275528430938721 |
| 10 | 10 | 0.5605242252349854 |
| 10 | 10 | 0.3852427005767822 |
| 10 | 10 | 0.44965672492980957 |
| 10 | 10 | 0.18056321144104004 |
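As a quick check on the client figures above, the eight samples can be summarised in a few lines of Python. This is only a sketch: it treats the rt column as seconds (consistent with the "500ms" figure mentioned under Expected Behavior), and the variable names are illustrative.

```python
import statistics

# Client-side rt samples copied from the report (assumed to be in seconds).
rt_seconds = [
    0.5071010589599609,
    0.5250134468078613,
    0.5025763511657715,
    0.5275528430938721,
    0.5605242252349854,
    0.3852427005767822,
    0.44965672492980957,
    0.18056321144104004,
]
TARGET_MS = 100  # upper bound the reporter expects

rt_ms = [s * 1000 for s in rt_seconds]
mean_ms = statistics.mean(rt_ms)
median_ms = statistics.median(rt_ms)
over_target = sum(1 for v in rt_ms if v > TARGET_MS)

print(f"mean={mean_ms:.1f} ms, median={median_ms:.1f} ms, "
      f"{over_target}/{len(rt_ms)} samples above {TARGET_MS} ms")
```

Every sample, including the fastest one, is above the 100 ms target, so the gap is not caused by a few outliers.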
Server performance (from Prometheus/Grafana), screenshots not preserved: Search Latency, Search in Queue Latency, Search Segment Latency, CPU Usage, Segment Loaded Num.

(2) Here are Milvus's deployment topology and node configuration: [image: deployment topology]
(3) Here is the HNSW configuration: M=32, efConstruction=512
(4) Here is the search configuration: ef=512, nq=10, topk=10, concurrency=20
(5) Here is the data configuration: 10M float16 embeddings, 400 dimensions
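For readers who want to reproduce the setup, the index and search settings above could be written as the parameter dictionaries that pymilvus takes. This is a minimal sketch: the report does not state the metric type or the vector field name, so `"L2"` and `"embedding"` below are placeholders, and `run_search` is a hypothetical helper.

```python
# HNSW settings from the report, in the dictionary shape pymilvus accepts.
# ASSUMPTIONS: the metric type and field name are not given in the report.
ASSUMED_METRIC = "L2"
ASSUMED_FIELD = "embedding"

index_params = {
    "index_type": "HNSW",
    "metric_type": ASSUMED_METRIC,
    "params": {"M": 32, "efConstruction": 512},
}
search_params = {"metric_type": ASSUMED_METRIC, "params": {"ef": 512}}


def run_search(collection, queries, topk=10):
    """Search `queries` (a list of nq vectors) in an already loaded collection."""
    return collection.search(
        data=queries,
        anns_field=ASSUMED_FIELD,
        param=search_params,
        limit=topk,
    )
```

With a real deployment, `collection` would be a `pymilvus.Collection`, the index would be built with `collection.create_index(ASSUMED_FIELD, index_params)`, and `queries` would hold 10 vectors of 400 floats each to match nq=10.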

Expected Behavior

In this test I expected the rt to stay below 100 ms in my scenario, but it is currently above 500 ms.
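A target like this is easier to check against percentiles than against individual samples. The sketch below shows one way to collect client-side latencies at a fixed concurrency. `measure_latencies` and `percentile` are hypothetical helpers, and `search_fn` stands in for whatever client call is being timed; none of this comes from the reporter's actual test script.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def measure_latencies(search_fn, requests_per_worker=5, concurrency=20):
    """Call search_fn() from `concurrency` threads and return every call's
    wall-clock latency in milliseconds, sorted ascending."""

    def worker(_worker_id):
        samples = []
        for _ in range(requests_per_worker):
            start = time.perf_counter()
            search_fn()
            samples.append((time.perf_counter() - start) * 1000.0)
        return samples

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        per_worker = list(pool.map(worker, range(concurrency)))
    return sorted(ms for samples in per_worker for ms in samples)


def percentile(sorted_ms, p):
    """Nearest-rank percentile of an ascending list (p between 0 and 100)."""
    rank = max(1, math.ceil(len(sorted_ms) * p / 100))
    return sorted_ms[rank - 1]


if __name__ == "__main__":
    # Stand-in for a real search call; replace with the actual client call.
    latencies = measure_latencies(lambda: time.sleep(0.01))
    print(f"p50={percentile(latencies, 50):.1f} ms  "
          f"p99={percentile(latencies, 99):.1f} ms")
```

Running the same harness against v2.0.1 and v2.1.0 with identical data and parameters would give directly comparable numbers.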

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

Aug 15 '22 02:08 dzqoo

@dzqoo thank you for the issue. 2 quick questions:

How is the performance when running v2.0.1?
How did you deploy the Milvus cluster v2.1.0?
Also, could you please refer to this script to export the whole Milvus logs for investigation?

/assign @dzqoo /unassign

Aug 15 '22 04:08 yanliang567

@yanliang567 thanks for replying.

v2.0.1 is better than v2.1.0, but not by much.
Yes, it's a cluster deployed with Helm. Here are the complete Milvus logs: logs.tar.gz

Aug 15 '22 05:08 dzqoo

/assign @czs007 could you please help to take a look?

/unassign @dzqoo

Aug 15 '22 08:08 yanliang567

There seems to be an issue with the load balancer: only 3 of the 5 nodes are working. @sunby
Each node takes only 6 CPU cores, but according to Attu each machine seems to have more than 20 cores. Does that make sense?

Aug 20 '22 08:08 xiaofan-luan

@dzqoo please do not use multiple replicas. Use 1 replica instead and try again.

Aug 20 '22 08:08 xiaofan-luan

@dzqoo I saw you have 5 querynodes. Do they share the same machines? According to Attu, 2 of the nodes have 20 cores and the other three have 36 cores. Do you know the reason?

Aug 20 '22 08:08 xiaofan-luan

@xiaofan-luan Yes, my 5 querynodes share three machines: two have 32 cores and the other has 20 cores.
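Taking these machine sizes together with the 6 cores per query node quoted earlier in the thread, a rough calculation shows how much of the hardware the query nodes can use. It is only an illustration: it assumes the 6-core figure is a per-node limit and ignores the other Milvus components running on the same machines.

```python
# Figures quoted in the thread; treating 6 cores as a per-node limit is an
# assumption made for this illustration.
machine_cores = [32, 32, 20]
query_nodes = 5
cores_per_node = 6

total_cores = sum(machine_cores)
used_by_query_nodes = query_nodes * cores_per_node
share = used_by_query_nodes / total_cores

print(f"{used_by_query_nodes} of {total_cores} cores "
      f"({share:.0%}) are available to query nodes")
```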

Aug 22 '22 01:08 dzqoo

Using 1 replica will decrease QPS. My understanding is that increasing the number of replicas increases throughput.
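The link between latency and throughput in this kind of test can be estimated with Little's law. The sketch below assumes a closed-loop test in which each of the 20 clients sends its next request as soon as the previous one returns, and it uses roughly 0.455 s as the mean of the reported rt samples. `closed_loop_throughput` is a hypothetical helper.

```python
def closed_loop_throughput(concurrency, mean_latency_s):
    """Requests per second sustained by `concurrency` clients that each send
    the next request as soon as the previous one returns (Little's law)."""
    return concurrency / mean_latency_s


CONCURRENCY = 20
NQ = 10
observed_s = 0.455  # approximate mean of the reported client rt samples
target_s = 0.100    # latency target from the report

for label, latency in (("observed", observed_s), ("target", target_s)):
    rps = closed_loop_throughput(CONCURRENCY, latency)
    print(f"{label}: {rps:.0f} requests/s, {rps * NQ:.0f} query vectors/s")
```

At fixed concurrency, lower latency and higher throughput go together, so comparing replica settings is more informative when both rt and QPS are recorded for each run.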

Aug 22 '22 03:08 dzqoo

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

Sep 21 '22 14:09 stale[bot]
