milvus
[Bug]: [perf][cluster] After inserting 1M rows and building an HNSW index, concurrent search fails with "fail to search on all shard leaders"
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: master-20230401-3b9716bb
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2): 2.3.0.dev45
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
release_name_prefix: perf-cluster-master-1680393600 deploy_config: fouramf-server-cluster-8c16m case_params: fouramf-client-sift1m-concurrent-hnsw other_params: --milvus_tag_prefix=master -s --deploy_mode=cluster case_name: test_concurrent_locust_custom_parameters
perf-cluster-ma93600-1-27-5970-etcd-0 1/1 Running 0 4h14m 10.104.9.60 4am-node14 <none> <none>
perf-cluster-ma93600-1-27-5970-etcd-1 1/1 Running 0 4h14m 10.104.4.111 4am-node11 <none> <none>
perf-cluster-ma93600-1-27-5970-etcd-2 1/1 Running 0 4h14m 10.104.5.101 4am-node12 <none> <none>
perf-cluster-ma93600-1-27-5970-milvus-datacoord-6659b5dcd67kglm 1/1 Running 2 (4h6m ago) 4h14m 10.104.14.172 4am-node18 <none> <none>
perf-cluster-ma93600-1-27-5970-milvus-datanode-5748cf4ddb-7xp8w 1/1 Running 2 (4h6m ago) 4h14m 10.104.14.173 4am-node18 <none> <none>
perf-cluster-ma93600-1-27-5970-milvus-indexcoord-7d44b5b4c5zgbc 1/1 Running 0 4h14m 10.104.14.171 4am-node18 <none> <none>
perf-cluster-ma93600-1-27-5970-milvus-indexnode-56d4965bf55tlg5 1/1 Running 1 (4h10m ago) 4h14m 10.104.12.233 4am-node17 <none> <none>
perf-cluster-ma93600-1-27-5970-milvus-proxy-849db6f44b-4b4k2 1/1 Running 3 (4h3m ago) 4h14m 10.104.12.234 4am-node17 <none> <none>
perf-cluster-ma93600-1-27-5970-milvus-querycoord-745b47f7b4szmj 1/1 Running 3 (4h3m ago) 4h14m 10.104.12.235 4am-node17 <none> <none>
perf-cluster-ma93600-1-27-5970-milvus-querynode-68bc4f58656fgp6 1/1 Running 2 (3h58m ago) 4h14m 10.104.13.210 4am-node16 <none> <none>
perf-cluster-ma93600-1-27-5970-milvus-rootcoord-96bb67675-wwtr5 1/1 Running 2 (4h6m ago) 4h14m 10.104.12.232 4am-node17 <none> <none>
perf-cluster-ma93600-1-27-5970-minio-0 1/1 Running 0 4h14m 10.104.4.110 4am-node11 <none> <none>
perf-cluster-ma93600-1-27-5970-minio-1 1/1 Running 0 4h14m 10.104.6.241 4am-node13 <none> <none>
perf-cluster-ma93600-1-27-5970-minio-2 1/1 Running 0 4h14m 10.104.5.102 4am-node12 <none> <none>
perf-cluster-ma93600-1-27-5970-minio-3 1/1 Running 0 4h14m 10.104.1.49 4am-node10 <none> <none>
perf-cluster-ma93600-1-27-5970-pulsar-bookie-0 1/1 Running 0 4h14m 10.104.5.110 4am-node12 <none> <none>
perf-cluster-ma93600-1-27-5970-pulsar-bookie-1 1/1 Running 0 4h14m 10.104.6.2 4am-node13 <none> <none>
perf-cluster-ma93600-1-27-5970-pulsar-bookie-2 1/1 Running 0 4h14m 10.104.1.70 4am-node10 <none> <none>
perf-cluster-ma93600-1-27-5970-pulsar-bookie-init-9kz4j 0/1 Completed 0 4h14m 10.104.1.27 4am-node10 <none> <none>
perf-cluster-ma93600-1-27-5970-pulsar-broker-0 1/1 Running 0 4h14m 10.104.6.5 4am-node13 <none> <none>
perf-cluster-ma93600-1-27-5970-pulsar-proxy-0 1/1 Running 0 4h14m 10.104.5.108 4am-node12 <none> <none>
perf-cluster-ma93600-1-27-5970-pulsar-pulsar-init-m5wh2 0/1 Completed 0 4h14m 10.104.5.76 4am-node12 <none> <none>
perf-cluster-ma93600-1-27-5970-pulsar-recovery-0 1/1 Running 0 4h14m 10.104.6.214 4am-node13 <none> <none>
perf-cluster-ma93600-1-27-5970-pulsar-zookeeper-0 1/1 Running 0 4h14m 10.104.4.107 4am-node11 <none> <none>
perf-cluster-ma93600-1-27-5970-pulsar-zookeeper-1 1/1 Running 0 4h11m 10.104.9.73 4am-node14 <none> <none>
perf-cluster-ma93600-1-27-5970-pulsar-zookeeper-2 1/1 Running 0 4h6m 10.104.1.74 4am-node10 <none> <none>
querynode:

client log:
[2023-04-02 00:17:07,100 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=5, reason=target node id not match target id = 5, node id = 11)>, <Time:{'RPC start': '2023-04-02 00:17:07.097074', 'RPC error': '2023-04-02 00:17:07.099947'}> (decorators.py:108)
[2023-04-02 00:17:07,100 - ERROR - fouram]: Traceback (most recent call last):
File "/src/fouram/client/util/api_request.py", line 33, in inner_wrapper
res = func(*args, **kwargs)
File "/src/fouram/client/util/api_request.py", line 70, in api_request
return func(*arg, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 660, in search
res = conn.search(self._name, data, anns_field, param, limit, expr,
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
raise e
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler
ret = func(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler
raise e
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler
return func(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 518, in search
return self._execute_search_requests(requests, timeout, round_decimal=round_decimal, auto_id=auto_id, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 487, in _execute_search_requests
raise pre_err
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 478, in _execute_search_requests
raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=5, reason=target node id not match target id = 5, node id = 11)>
(api_request.py:48)
[2023-04-02 00:17:07,100 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=5, reason=target node id not match target id = 5, node id = 11)> (api_request.py:49)
[2023-04-02 00:17:07,100 - ERROR - fouram]: [CheckFunc] search request check failed, response:<MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=5, reason=target node id not match target id = 5, node id = 11)> (func_check.py:43)
[2023-04-02 00:17:07,103 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=5, reason=target node id not match target id = 5, node id = 11)>, <Time:{'RPC start': '2023-04-02 00:17:07.101270', 'RPC error': '2023-04-02 00:17:07.103818'}> (decorators.py:108)
[2023-04-02 00:17:07,104 - ERROR - fouram]: Traceback (most recent call last):
File "/src/fouram/client/util/api_request.py", line 33, in inner_wrapper
res = func(*args, **kwargs)
File "/src/fouram/client/util/api_request.py", line 70, in api_request
return func(*arg, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 660, in search
res = conn.search(self._name, data, anns_field, param, limit, expr,
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
raise e
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler
ret = func(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler
raise e
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler
return func(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 518, in search
return self._execute_search_requests(requests, timeout, round_decimal=round_decimal, auto_id=auto_id, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 487, in _execute_search_requests
raise pre_err
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 478, in _execute_search_requests
raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=5, reason=target node id not match target id = 5, node id = 11)>
(api_request.py:48)
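For triage, the two IDs in the error text can be pulled out of the client log with a short script. This is a minimal Python sketch, assuming the message wording stays exactly as quoted in the log above; `parse_node_mismatch` is a hypothetical helper written for this report, not part of pymilvus.

```python
import re

# Pattern mirrors the wording of the error text quoted in this report.
# Assumption: the server keeps this exact phrasing.
MISMATCH = re.compile(r"target node id not match target id = (\d+), node id = (\d+)")


def parse_node_mismatch(message):
    """Return (target_id, node_id) if the message reports a mismatch, else None."""
    m = MISMATCH.search(message)
    return (int(m.group(1)), int(m.group(2))) if m else None


sample = (
    "fail to search on all shard leaders, err=fail to Search, QueryNode ID=5, "
    "reason=target node id not match target id = 5, node id = 11"
)
print(parse_node_mismatch(sample))  # (5, 11)
```

A request addressed to ID 5 that is answered by a node reporting ID 11 would be consistent with the querynode restarts visible in the pod listing, though the log alone does not prove that.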
Expected Behavior
No response
Steps To Reproduce
1. create a collection
2. build HNSW index on vector column
3. insert 1M vectors
4. flush collection
5. build index on vector column with the same parameters
6. count the total number of rows
7. load collection
8. perform concurrent operations
9. clean all collections or not
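The steps above can be sketched with pymilvus roughly as follows. This is an illustration, not the fouram test itself. The collection name, field names, host and port are assumptions. Random vectors stand in for the SIFT dataset, and a single search stands in for the hour-long locust run. Index and search values are taken from the test parameters listed under "Anything else?". Running `reproduce()` needs a reachable Milvus cluster.

```python
import random

DIM = 128                # 'dim' from the reported dataset_params
TOTAL_ROWS = 1_000_000   # 'dataset_size'
BATCH_ROWS = 50_000      # 'ni_per'
INDEX_PARAMS = {
    "index_type": "HNSW",
    "metric_type": "L2",
    "params": {"M": 8, "efConstruction": 200},
}
SEARCH_PARAMS = {"metric_type": "L2", "params": {"ef": 16}}


def batch_sizes(total, per_batch):
    """Yield the row count of each insert batch."""
    for start in range(0, total, per_batch):
        yield min(per_batch, total - start)


def random_vectors(n, dim=DIM):
    """Random floats stand in for the SIFT vectors used in the real test."""
    return [[random.random() for _ in range(dim)] for _ in range(n)]


def reproduce(host="localhost", port="19530", name="repro_hnsw"):
    # host, port and name are placeholders, not values from the report.
    # Imported lazily so the helpers above work without pymilvus installed.
    from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections

    connections.connect(host=host, port=port)
    schema = CollectionSchema([
        FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
        FieldSchema("vec", DataType.FLOAT_VECTOR, dim=DIM),
    ])
    coll = Collection(name, schema)                # step 1: create a collection
    coll.create_index("vec", INDEX_PARAMS)         # step 2: build HNSW index
    for n in batch_sizes(TOTAL_ROWS, BATCH_ROWS):  # step 3: insert 1M vectors
        coll.insert([random_vectors(n)])
    coll.flush()                                   # step 4: flush
    coll.create_index("vec", INDEX_PARAMS)         # step 5: same index parameters
    print(coll.num_entities)                       # step 6: count rows
    coll.load()                                    # step 7: load
    # step 8: one search (nq=1, top_k=1) as a stand-in for the concurrent run
    return coll.search(random_vectors(1), "vec", SEARCH_PARAMS, limit=1)
```

With `ni_per` at 50000, the insert loop issues 20 batches. Step 9 (cleanup) is left out because the report marks it as optional.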
Milvus Log
No response
Anything else?
'client': {'test_case_type': 'ConcurrentClientBase',
           'test_case_name': 'test_concurrent_locust_custom_parameters',
           'test_case_params': {'dataset_params': {'dim': 128,
                                                   'dataset_name': 'sift',
                                                   'dataset_size': 1000000,
                                                   'ni_per': 50000,
                                                   'metric_type': 'L2'},
                                'collection_params': {'other_fields': []},
                                'load_params': {},
                                'search_params': {},
                                'index_params': {'index_type': 'HNSW',
                                                 'index_param': {'M': 8,
                                                                 'efConstruction': 200}},
                                'concurrent_params': {'concurrent_number': 1,
                                                      'during_time': 3600,
                                                      'interval': 20,
                                                      'spawn_rate': None},
                                'concurrent_tasks': [{'type': 'search',
                                                      'weight': 1,
                                                      'params': {'nq': 1,
                                                                 'top_k': 1,
                                                                 'search_param': {'ef': 16},
                                                                 'random_data': True}}]},
/assign
/unassign @yanliang567

release_name_prefix: perf-cluster-master-1680566400 deploy_config: fouramf-server-cluster-8c16m case_params: fouramf-client-sift1m-concurrent-hnsw other_params: --milvus_tag_prefix=master -s --deploy_mode=cluster
case_name: test_concurrent_locust_custom_parameters
server:
perf-cluster-ma66400-1-71-9223-etcd-0 1/1 Running 0 4h16m 10.104.4.148 4am-node11 <none> <none>
perf-cluster-ma66400-1-71-9223-etcd-1 1/1 Running 0 4h16m 10.104.1.44 4am-node10 <none> <none>
perf-cluster-ma66400-1-71-9223-etcd-2 1/1 Running 0 4h16m 10.104.5.223 4am-node12 <none> <none>
perf-cluster-ma66400-1-71-9223-milvus-datacoord-85998c68f6jnbjl 1/1 Running 3 (4h4m ago) 4h16m 10.104.14.121 4am-node18 <none> <none>
perf-cluster-ma66400-1-71-9223-milvus-datanode-5cb67d48c8-xffwv 1/1 Running 3 (4h6m ago) 4h16m 10.104.13.229 4am-node16 <none> <none>
perf-cluster-ma66400-1-71-9223-milvus-indexcoord-f496cb49dfwh2n 1/1 Running 0 4h16m 10.104.14.120 4am-node18 <none> <none>
perf-cluster-ma66400-1-71-9223-milvus-indexnode-6858478959pqf5j 1/1 Running 0 4h16m 10.104.14.122 4am-node18 <none> <none>
perf-cluster-ma66400-1-71-9223-milvus-proxy-6fbd6645f4-xcndx 1/1 Running 3 (4h5m ago) 4h16m 10.104.12.201 4am-node17 <none> <none>
perf-cluster-ma66400-1-71-9223-milvus-querycoord-8677fc54dqt2rs 1/1 Running 3 (4h5m ago) 4h16m 10.104.14.119 4am-node18 <none> <none>
perf-cluster-ma66400-1-71-9223-milvus-querynode-876d8b94f-487v7 1/1 Running 2 (3h55m ago) 4h16m 10.104.12.202 4am-node17 <none> <none>
perf-cluster-ma66400-1-71-9223-milvus-rootcoord-bd46dd987-m6x4g 1/1 Running 3 (4h5m ago) 4h16m 10.104.14.118 4am-node18 <none> <none>
perf-cluster-ma66400-1-71-9223-minio-0 1/1 Running 0 4h16m 10.104.4.151 4am-node11 <none> <none>
perf-cluster-ma66400-1-71-9223-minio-1 1/1 Running 0 4h16m 10.104.9.213 4am-node14 <none> <none>
perf-cluster-ma66400-1-71-9223-minio-2 1/1 Running 0 4h16m 10.104.5.225 4am-node12 <none> <none>
perf-cluster-ma66400-1-71-9223-minio-3 1/1 Running 0 4h16m 10.104.1.46 4am-node10 <none> <none>
perf-cluster-ma66400-1-71-9223-pulsar-bookie-0 1/1 Running 0 4h16m 10.104.5.3 4am-node12 <none> <none>
perf-cluster-ma66400-1-71-9223-pulsar-bookie-1 1/1 Running 0 4h16m 10.104.6.49 4am-node13 <none> <none>
perf-cluster-ma66400-1-71-9223-pulsar-bookie-2 1/1 Running 0 4h16m 10.104.4.178 4am-node11 <none> <none>
perf-cluster-ma66400-1-71-9223-pulsar-bookie-init-5rp69 0/1 Completed 0 4h16m 10.104.1.23 4am-node10 <none> <none>
perf-cluster-ma66400-1-71-9223-pulsar-broker-0 1/1 Running 0 4h16m 10.104.1.64 4am-node10 <none> <none>
perf-cluster-ma66400-1-71-9223-pulsar-proxy-0 1/1 Running 0 4h16m 10.104.5.236 4am-node12 <none> <none>
perf-cluster-ma66400-1-71-9223-pulsar-pulsar-init-4pqbw 0/1 Completed 0 4h16m 10.104.9.191 4am-node14 <none> <none>
perf-cluster-ma66400-1-71-9223-pulsar-recovery-0 1/1 Running 0 4h16m 10.104.4.174 4am-node11 <none> <none>
perf-cluster-ma66400-1-71-9223-pulsar-zookeeper-0 1/1 Running 0 4h16m 10.104.4.149 4am-node11 <none> <none>
perf-cluster-ma66400-1-71-9223-pulsar-zookeeper-1 1/1 Running 0 4h14m 10.104.5.242 4am-node12 <none> <none>
perf-cluster-ma66400-1-71-9223-pulsar-zookeeper-2 1/1 Running 0 4h6m 10.104.1.92 4am-node10 <none> <none>
client log:
[2023-04-04 00:19:10,991 - ERROR - fouram]: RPC error: [search], <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=2, reason=target node id not match target id = 2, node id = 14)>, <Time:{'RPC start': '2023-04-04 00:19:04.369938', 'RPC error': '2023-04-04 00:19:10.990938'}> (decorators.py:108)
[2023-04-04 00:19:10,992 - ERROR - fouram]: Traceback (most recent call last):
File "/src/fouram/client/util/api_request.py", line 33, in inner_wrapper
res = func(*args, **kwargs)
File "/src/fouram/client/util/api_request.py", line 70, in api_request
return func(*arg, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 660, in search
res = conn.search(self._name, data, anns_field, param, limit, expr,
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
raise e
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler
ret = func(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler
raise e
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler
return func(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 518, in search
return self._execute_search_requests(requests, timeout, round_decimal=round_decimal, auto_id=auto_id, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 487, in _execute_search_requests
raise pre_err
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 478, in _execute_search_requests
raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=2, reason=target node id not match target id = 2, node id = 14)>
(api_request.py:48)
The querynode panicked, which caused it to restart. This is related to #23338, and the PR has been merged. Please help check it, @jingkl.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.