[Bug]: [benchmark][cluster]High initial query latency in Milvus multi-replica
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: master-20220601-63a31ccb
- Deployment mode (standalone or cluster): cluster
- SDK version (e.g. pymilvus v2.0.0rc2): pymilvus-2.1.0.dev67
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
argo: server-instance benchmark-backup-psdpt-1 server-configmap server-cluster-8c64m-querynode5 client-configmap client-random-locust-search-filter-100m-ddl-r8-w2-replica5-2h
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
benchmark-backup-psdpt-1-etcd-0 1/1 Running 0 6m3s 10.97.16.147 qa-node013.zilliz.local <none> <none>
benchmark-backup-psdpt-1-etcd-1 1/1 Running 0 6m2s 10.97.17.139 qa-node014.zilliz.local <none> <none>
benchmark-backup-psdpt-1-etcd-2 1/1 Running 0 6m1s 10.97.16.149 qa-node013.zilliz.local <none> <none>
benchmark-backup-psdpt-1-milvus-datacoord-56bdcf467b-nxk5k 1/1 Running 1 6m3s 10.97.5.249 qa-node003.zilliz.local <none> <none>
benchmark-backup-psdpt-1-milvus-datanode-55b8d7c849-k22sb 1/1 Running 1 6m3s 10.97.16.144 qa-node013.zilliz.local <none> <none>
benchmark-backup-psdpt-1-milvus-indexcoord-68d78bcccf-f2mtm 1/1 Running 1 6m3s 10.97.5.246 qa-node003.zilliz.local <none> <none>
benchmark-backup-psdpt-1-milvus-indexnode-6cd54dbb9d-p55pz 1/1 Running 0 6m3s 10.97.17.125 qa-node014.zilliz.local <none> <none>
benchmark-backup-psdpt-1-milvus-proxy-67979b957f-qtbkl 1/1 Running 1 6m3s 10.97.5.244 qa-node003.zilliz.local <none> <none>
benchmark-backup-psdpt-1-milvus-querycoord-7999dccb44-89rf9 1/1 Running 1 6m3s 10.97.5.245 qa-node003.zilliz.local <none> <none>
benchmark-backup-psdpt-1-milvus-querynode-846468b77b-d7w6l 1/1 Running 0 6m3s 10.97.17.133 qa-node014.zilliz.local <none> <none>
benchmark-backup-psdpt-1-milvus-querynode-846468b77b-g4szm 1/1 Running 0 6m3s 10.97.17.134 qa-node014.zilliz.local <none> <none>
benchmark-backup-psdpt-1-milvus-querynode-846468b77b-hpvxq 1/1 Running 0 6m3s 10.97.11.237 qa-node009.zilliz.local <none> <none>
benchmark-backup-psdpt-1-milvus-querynode-846468b77b-t594w 1/1 Running 0 6m3s 10.97.17.136 qa-node014.zilliz.local <none> <none>
benchmark-backup-psdpt-1-milvus-querynode-846468b77b-z8j9v 1/1 Running 0 6m3s 10.97.17.137 qa-node014.zilliz.local <none> <none>
benchmark-backup-psdpt-1-milvus-rootcoord-84cf758b76-lxqtn 1/1 Running 1 6m3s 10.97.5.250 qa-node003.zilliz.local <none> <none>
benchmark-backup-psdpt-1-minio-0 1/1 Running 0 6m3s 10.97.19.222 qa-node016.zilliz.local <none> <none>
benchmark-backup-psdpt-1-minio-1 1/1 Running 0 6m3s 10.97.19.224 qa-node016.zilliz.local <none> <none>
benchmark-backup-psdpt-1-minio-2 1/1 Running 0 6m3s 10.97.19.238 qa-node016.zilliz.local <none> <none>
benchmark-backup-psdpt-1-minio-3 1/1 Running 0 6m2s 10.97.19.239 qa-node016.zilliz.local <none> <none>
benchmark-backup-psdpt-1-pulsar-bookie-0 1/1 Running 0 6m3s 10.97.5.254 qa-node003.zilliz.local <none> <none>
benchmark-backup-psdpt-1-pulsar-bookie-1 1/1 Running 0 6m3s 10.97.19.237 qa-node016.zilliz.local <none> <none>
benchmark-backup-psdpt-1-pulsar-bookie-2 1/1 Running 0 6m1s 10.97.5.5 qa-node003.zilliz.local <none> <none>
benchmark-backup-psdpt-1-pulsar-bookie-init-5tmmd 0/1 Completed 0 6m3s 10.97.5.243 qa-node003.zilliz.local <none> <none>
benchmark-backup-psdpt-1-pulsar-broker-0 1/1 Running 0 6m3s 10.97.19.219 qa-node016.zilliz.local <none> <none>
benchmark-backup-psdpt-1-pulsar-proxy-0 1/1 Running 0 6m3s 10.97.19.220 qa-node016.zilliz.local <none> <none>
benchmark-backup-psdpt-1-pulsar-pulsar-init-2vnj7 0/1 Completed 0 6m3s 10.97.5.247 qa-node003.zilliz.local <none> <none>
benchmark-backup-psdpt-1-pulsar-recovery-0 1/1 Running 0 6m3s 10.97.5.248 qa-node003.zilliz.local <none> <none>
benchmark-backup-psdpt-1-pulsar-zookeeper-0 1/1 Running 0 6m3s 10.97.5.2 qa-node003.zilliz.local <none> <none>
benchmark-backup-psdpt-1-pulsar-zookeeper-1 1/1 Running 0 5m26s 10.97.9.5 qa-node007.zilliz.local <none> <none>
benchmark-backup-psdpt-1-pulsar-zookeeper-2 1/1 Running 0 4m27s 10.97.3.209 qa-node001.zilliz.local <none> <none>
argo2: server-instance benchmark-backup-lcjsc-1 server-configmap server-cluster-8c64m-querynode2 client-configmap client-random-locust-search-filter-100m-ddl-r8-w2-replica2-2h
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
benchmark-backup-lcjsc-1-etcd-0 1/1 Running 0 6m17s 10.97.17.142 qa-node014.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-etcd-1 1/1 Running 0 6m17s 10.97.16.151 qa-node013.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-etcd-2 1/1 Running 0 6m17s 10.97.17.144 qa-node014.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-milvus-datacoord-55f5dcbf89-8w8gw 1/1 Running 1 6m17s 10.97.4.182 qa-node002.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-milvus-datanode-6bb69cf4b-dqkd9 1/1 Running 1 6m17s 10.97.19.232 qa-node016.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-milvus-indexcoord-54f4898f56-x8vlh 1/1 Running 1 6m17s 10.97.4.184 qa-node002.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-milvus-indexnode-fcc4d8bc9-6bvbj 1/1 Running 0 6m17s 10.97.20.8 qa-node018.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-milvus-proxy-5c6f48644-h6q68 1/1 Running 1 6m17s 10.97.4.185 qa-node002.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-milvus-querycoord-5f45d9456c-8lmw5 1/1 Running 1 6m17s 10.97.4.186 qa-node002.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-milvus-querynode-748c994485-r27lm 1/1 Running 0 6m17s 10.97.16.145 qa-node013.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-milvus-querynode-748c994485-v2n4k 1/1 Running 0 6m17s 10.97.12.77 qa-node015.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-milvus-rootcoord-5c8f8b85-5xg6b 1/1 Running 1 6m17s 10.97.4.183 qa-node002.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-minio-0 1/1 Running 0 6m17s 10.97.19.242 qa-node016.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-minio-1 1/1 Running 0 6m17s 10.97.19.243 qa-node016.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-minio-2 1/1 Running 0 6m17s 10.97.19.246 qa-node016.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-minio-3 1/1 Running 0 6m17s 10.97.19.247 qa-node016.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-pulsar-bookie-0 1/1 Running 0 6m17s 10.97.3.205 qa-node001.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-pulsar-bookie-1 1/1 Running 0 6m17s 10.97.5.8 qa-node003.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-pulsar-bookie-2 1/1 Running 0 6m17s 10.97.18.201 qa-node017.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-pulsar-bookie-init-q7w66 0/1 Completed 0 6m17s 10.97.3.199 qa-node001.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-pulsar-broker-0 1/1 Running 0 6m17s 10.97.9.3 qa-node007.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-pulsar-proxy-0 1/1 Running 0 6m17s 10.97.18.196 qa-node017.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-pulsar-pulsar-init-jjsjw 0/1 Completed 0 6m17s 10.97.9.2 qa-node007.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-pulsar-recovery-0 1/1 Running 0 6m17s 10.97.19.226 qa-node016.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-pulsar-zookeeper-0 1/1 Running 0 6m17s 10.97.3.201 qa-node001.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-pulsar-zookeeper-1 1/1 Running 0 5m24s 10.97.3.207 qa-node001.zilliz.local <none> <none>
benchmark-backup-lcjsc-1-pulsar-zookeeper-2 1/1 Running 0 4m42s 10.97.9.7 qa-node007.zilliz.local <none> <none>

Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
No response
Anything else?
client-random-locust-search-filter-100m-ddl-r8-w2-replica5-2h:
```yaml
config.yaml: |
  locust_random_performance:
    collections:
      -
        collection_name: sift_100m_128_l2
        # collection_name: sift_10w_128_l2
        other_fields: float1
        ni_per: 50000
        build_index: true
        index_type: ivf_sq8
        index_param:
          nlist: 2048
        load_param:
          replica_number: 5
        task:
          types:
            -
              type: query
              weight: 20
              params:
                top_k: 10
                nq: 10
                search_param:
                  nprobe: 16
                filters:
                  -
                    range: "{'range': {'float1': {'GT': -1.0, 'LT': collection_size * 0.5}}}"
            -
              type: load
              weight: 1
              params:
                replica_number: 5
            -
              type: get
              weight: 10
              params:
                ids_length: 10
            -
              type: scene_test
              weight: 2
          connection_num: 1
          clients_num: 20
          spawn_rate: 2
          during_time: 2h
```
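For reference, a minimal pymilvus sketch of the load/search pattern this locust config drives; the endpoint, the vector field name, and the concrete filter expression are assumptions, not taken from the benchmark code:

```python
# Minimal sketch of what the config above exercises.
# Host, vector field name and the expr string are assumptions.
import random

from pymilvus import Collection, connections

connections.connect(host="127.0.0.1", port="19530")        # assumed endpoint

coll = Collection("sift_100m_128_l2")
coll.load(replica_number=5)                                 # load_param.replica_number: 5

# "query" task: nq=10, top_k=10, nprobe=16, plus a range filter on float1.
nq, dim = 10, 128
vectors = [[random.random() for _ in range(dim)] for _ in range(nq)]
res = coll.search(
    data=vectors,
    anns_field="float_vector",                              # assumed vector field name
    param={"metric_type": "L2", "params": {"nprobe": 16}},
    limit=10,
    expr="float1 > -1.0 && float1 < 50000000.0",            # collection_size * 0.5 for 100m
)
print(res[0].ids)
```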
@jingkl you won't get a performance boost if the collection is large and the data can be separated evenly across the 5 nodes. Not sure this is an issue, because the data was already spread evenly across the 5 nodes to begin with.
Multiple replicas only help in the small-dataset case, for example when you ingest 1m rows into a huge cluster and want to double your performance.
you won't get a performance boost if the collection is large and the data can be separated evenly across the 5 nodes
Why? Shouldn't the query time be faster with 5 replicas of the same-size dataset? @xiaofan-luan
Because even with 1 replica, all 5 querynodes load data. Say you have 10 segments: each querynode loads 2 segments. Once you switch to 5 replicas, each node loads all 10 segments. The case where multiple replicas would help is when you have only 1 segment but 5 querynodes: only one querynode can load the data, while the rest of the querynodes have no data to serve. A quick back-of-the-envelope sketch of that arithmetic is below.
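A hypothetical helper, just to make the segment counts in that comment concrete:

```python
# Per-node segment count when every query node serves replicas of the same
# collection (hypothetical helper illustrating the 10-segment / 5-node example).
def segments_per_node(total_segments: int, query_nodes: int, replicas: int) -> float:
    # Each replica holds the full set of sealed segments; the replicas'
    # segments are spread across the available query nodes.
    return total_segments * replicas / query_nodes

print(segments_per_node(10, 5, replicas=1))  # 2.0  -> each node searches 2 segments
print(segments_per_node(10, 5, replicas=5))  # 10.0 -> each node searches all 10 segments
```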
I understand. But the query latency is very high at first; what is the reason for that?
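If it helps to pin down that observation, a hypothetical repro snippet for timing the first search right after load() against a few warm searches; host and field names are assumptions, as in the sketch above:

```python
# Hypothetical repro helper: compare the first search after load()
# with a few subsequent warm searches.
import random
import time

from pymilvus import Collection, connections

connections.connect(host="127.0.0.1", port="19530")   # assumed endpoint
coll = Collection("sift_100m_128_l2")
coll.load(replica_number=5)

vectors = [[random.random() for _ in range(128)] for _ in range(10)]
search_kwargs = dict(
    anns_field="float_vector",                         # assumed vector field name
    param={"metric_type": "L2", "params": {"nprobe": 16}},
    limit=10,
)

t0 = time.perf_counter()
coll.search(data=vectors, **search_kwargs)
print(f"first search: {time.perf_counter() - t0:.3f}s")

for i in range(5):
    t0 = time.perf_counter()
    coll.search(data=vectors, **search_kwargs)
    print(f"warm search {i}: {time.perf_counter() - t0:.3f}s")
```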
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.