milvus
[Bug]: [benchmark][cluster] Milvus search failed, raising the error "Invalid shard leader"
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: master-20220921-24ec3547
- Deployment mode(standalone or cluster): cluster
- SDK version(e.g. pymilvus v2.0.0rc2): 2.2.0dev30
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
server-instance fouram-cron-1663776000-5 server-configmap server-cluster-8c16m client-configmap client-acc-glove-ivf-flat
fouram-cron-1663776000-5-minio-0 1/1 Running 0 5m54s 10.104.1.121 4am-node10 <none> <none>
fouram-cron-1663776000-5-minio-1 1/1 Running 0 5m53s 10.104.5.207 4am-node12 <none> <none>
fouram-cron-1663776000-5-minio-2 1/1 Running 0 5m53s 10.104.6.214 4am-node13 <none> <none>
fouram-cron-1663776000-5-minio-3 1/1 Running 0 5m53s 10.104.9.150 4am-node14 <none> <none>
fouram-cron-1663776000-5-pulsar-bookie-0 1/1 Running 0 5m53s 10.104.5.224 4am-node12 <none> <none>
fouram-cron-1663776000-5-pulsar-bookie-1 1/1 Running 0 5m52s 10.104.6.228 4am-node13 <none> <none>
fouram-cron-1663776000-5-pulsar-bookie-2 1/1 Running 0 5m52s 10.104.1.136 4am-node10 <none> <none>
fouram-cron-1663776000-5-pulsar-bookie-init-w65pp 0/1 Completed 0 5m57s 10.104.1.117 4am-node10 <none> <none>
fouram-cron-1663776000-5-pulsar-broker-0 1/1 Running 0 5m54s 10.104.1.122 4am-node10 <none> <none>
fouram-cron-1663776000-5-pulsar-proxy-0 1/1 Running 0 5m54s 10.104.9.148 4am-node14 <none> <none>
fouram-cron-1663776000-5-pulsar-pulsar-init-g7wd8 0/1 Completed 0 5m57s 10.104.1.116 4am-node10 <none> <none>
fouram-cron-1663776000-5-pulsar-recovery-0 1/1 Running 0 5m55s 10.104.9.147 4am-node14 <none> <none>
fouram-cron-1663776000-5-pulsar-zookeeper-0 1/1 Running 0 5m54s 10.104.9.163 4am-node14 <none> <none>
fouram-cron-1663776000-5-pulsar-zookeeper-1 1/1 Running 0 4m15s 10.104.5.226 4am-node12 <none> <none>
fouram-cron-1663776000-5-pulsar-zookeeper-2 1/1 Running 0 3m39s 10.104.1.138 4am-node10 <none> <none>
[2022-09-21 16:16:06,922] [ ERROR] - Traceback (most recent call last):
File "main.py", line 95, in run_suite
result = runner.run_case(case_metric, **case)
File "/src/milvus_benchmark/runners/accuracy.py", line 292, in run_case
self.milvus.query(case_param["vector_query"], filter_query=case_param["filter_query"],
File "/src/milvus_benchmark/client.py", line 53, in wrapper
result = func(*args, **kwargs)
File "/src/milvus_benchmark/client.py", line 346, in query
result = self._milvus.search(tmp_collection_name, **params)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/stub.py", line 844, in search
return handler.search(collection_name, data, anns_field, param, limit, expression,
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 113, in handler
raise e
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 139, in handler
ret = func(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 89, in handler
raise e
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 51, in handler
return func(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 472, in search
return self._execute_search_requests(requests, timeout, **_kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 436, in _execute_search_requests
raise pre_err
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 427, in _execute_search_requests
raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to Search, QueryNode ID=2, reason=query node 0 is not ready)>
(milvus_benchmark.main:98)
File "/src/milvus_benchmark/client.py", line 346, in query
result = self._milvus.search(tmp_collection_name, **params)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/stub.py", line 844, in search
return handler.search(collection_name, data, anns_field, param, limit, expression,
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 113, in handler
raise e
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 139, in handler
ret = func(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 89, in handler
raise e
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 51, in handler
return func(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 472, in search
return self._execute_search_requests(requests, timeout, **_kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 436, in _execute_search_requests
raise pre_err
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 427, in _execute_search_requests
raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=Invalid shard leader)>
(milvus_benchmark.main:98)
Expected Behavior
No response
Steps To Reproduce
1. Create a collection.
2. Insert 1M rows of GloVe data.
3. Create an IVF_FLAT index.
4. Run a search; it raises an error.
Milvus Log
No response
Anything else?
client-acc-glove-ivf-flat:
{
"config.yaml": "ann_accuracy:
  collections:
    -
      milvus:
        cache_config.cpu_cache_capacity: 16GB
        engine_config.use_blas_threshold: 1100
      server:
        cpus: 12
      source_file: /test/milvus/ann_hdf5/glove-200-angular.hdf5
      collection_name: glove_200_angular
      index_types: ['ivf_flat']
      index_params:
        nlist: [1024]
      top_ks: [10]
      nqs: [10000]
      search_params:
        nprobe: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
"
}
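For readers unfamiliar with this config, here is a rough sketch of how many search cases it describes. It assumes the benchmark runner takes the Cartesian product of the list-valued fields, which this issue does not confirm; `expand_cases` is a hypothetical helper, not part of milvus_benchmark.

```python
from itertools import product

# Values copied from the client-acc-glove-ivf-flat config above.
config = {
    "index_types": ["ivf_flat"],
    "index_params": {"nlist": [1024]},
    "top_ks": [10],
    "nqs": [10000],
    "search_params": {"nprobe": [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]},
}


def expand_cases(cfg):
    """Return one dict per parameter combination (hypothetical helper).

    Assumption: the runner iterates over the cross product of all
    list-valued fields; the real runner may order or group them differently.
    """
    cases = []
    for index_type, nlist, top_k, nq, nprobe in product(
        cfg["index_types"],
        cfg["index_params"]["nlist"],
        cfg["top_ks"],
        cfg["nqs"],
        cfg["search_params"]["nprobe"],
    ):
        cases.append(
            {
                "index_type": index_type,
                "nlist": nlist,
                "top_k": top_k,
                "nq": nq,
                "nprobe": nprobe,
            }
        )
    return cases


cases = expand_cases(config)
print(len(cases), "search cases; first:", cases[0])
```

Under that assumption only `nprobe` varies, so the suite would issue one search per `nprobe` value against the same index.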
/assign @sunby /unassign
server-instance fouram-cron-1664121600-5 server-configmap server-cluster-8c16m client-configmap client-acc-glove-ivf-flat
master-20220925-91df8f2d 2.2.0dev30
fouram-cron-1664121600-5-etcd-0 1/1 Running 0 5m53s 10.104.5.36 4am-node12 <none> <none>
fouram-cron-1664121600-5-etcd-1 1/1 Running 0 5m53s 10.104.4.88 4am-node11 <none> <none>
fouram-cron-1664121600-5-etcd-2 1/1 Running 0 5m52s 10.104.9.39 4am-node14 <none> <none>
fouram-cron-1664121600-5-milvus-datacoord-998b5c6b5-8l2w9 1/1 Running 1 (112s ago) 5m53s 10.104.5.15 4am-node12 <none> <none>
fouram-cron-1664121600-5-milvus-datanode-5fdc5b6b7f-2bdtk 1/1 Running 1 (111s ago) 5m53s 10.104.5.20 4am-node12 <none> <none>
fouram-cron-1664121600-5-milvus-indexcoord-7c6bfff8c5-9bmr7 1/1 Running 1 (111s ago) 5m52s 10.104.9.13 4am-node14 <none> <none>
fouram-cron-1664121600-5-milvus-indexnode-8447df45f8-sqb5p 1/1 Running 0 5m53s 10.104.1.94 4am-node10 <none> <none>
fouram-cron-1664121600-5-milvus-proxy-54447d7685-sdqgb 1/1 Running 1 (111s ago) 5m52s 10.104.9.10 4am-node14 <none> <none>
fouram-cron-1664121600-5-milvus-querycoord-57dfc86544-mftmr 1/1 Running 1 (112s ago) 5m53s 10.104.4.75 4am-node11 <none> <none>
fouram-cron-1664121600-5-milvus-querynode-57669979d8-vp52l 1/1 Running 0 5m52s 10.104.4.76 4am-node11 <none> <none>
fouram-cron-1664121600-5-milvus-rootcoord-6955dff79d-4qw2g 1/1 Running 1 (111s ago) 5m53s 10.104.5.19 4am-node12 <none> <none>
fouram-cron-1664121600-5-minio-0 1/1 Running 0 5m53s 10.104.5.17 4am-node12 <none> <none>
fouram-cron-1664121600-5-minio-1 1/1 Running 0 5m53s 10.104.4.74 4am-node11 <none> <none>
fouram-cron-1664121600-5-minio-2 1/1 Running 0 5m53s 10.104.9.14 4am-node14 <none> <none>
fouram-cron-1664121600-5-minio-3 1/1 Running 0 5m53s 10.104.1.95 4am-node10 <none> <none>
fouram-cron-1664121600-5-pulsar-bookie-0 1/1 Running 0 5m52s 10.104.9.40 4am-node14 <none> <none>
fouram-cron-1664121600-5-pulsar-bookie-1 1/1 Running 0 5m52s 10.104.5.39 4am-node12 <none> <none>
fouram-cron-1664121600-5-pulsar-bookie-2 1/1 Running 0 5m51s 10.104.1.120 4am-node10 <none> <none>
fouram-cron-1664121600-5-pulsar-bookie-init-zjvsx 0/1 Completed 0 5m54s 10.104.5.14 4am-node12 <none> <none>
fouram-cron-1664121600-5-pulsar-broker-0 1/1 Running 0 5m53s 10.104.1.93 4am-node10 <none> <none>
fouram-cron-1664121600-5-pulsar-proxy-0 1/1 Running 0 5m53s 10.104.5.16 4am-node12 <none> <none>
fouram-cron-1664121600-5-pulsar-pulsar-init-8mft6 0/1 Completed 0 5m54s 10.104.5.13 4am-node12 <none> <none>
fouram-cron-1664121600-5-pulsar-recovery-0 1/1 Running 0 5m53s 10.104.5.18 4am-node12 <none> <none>
fouram-cron-1664121600-5-pulsar-zookeeper-0 1/1 Running 0 5m52s 10.104.9.38 4am-node14 <none> <none>
fouram-cron-1664121600-5-pulsar-zookeeper-1 1/1 Running 0 4m19s 10.104.4.97 4am-node11 <none> <none>
fouram-cron-1664121600-5-pulsar-zookeeper-2 1/1 Running 0 3m39s 10.104.1.127 4am-node10 <none> <none>
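Several coordinator pods and the proxy in the listing above show one restart roughly two minutes before the listing was taken. Whether those restarts are related to the error is not established in this thread. As a small sketch for spotting them, the snippet below assumes the listing has the column order NAME, READY, STATUS, RESTARTS, ... (the header row is not shown above); `restarted_pods` is a hypothetical helper.

```python
def restarted_pods(listing):
    """Map pod name -> restart count for pods that restarted at least once.

    Assumption: whitespace-separated columns in the order
    NAME READY STATUS RESTARTS ..., where RESTARTS may be followed by
    a "(112s ago)" suffix that we ignore.
    """
    restarted = {}
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines and anything whose 4th column is not a number.
        if len(fields) < 4 or not fields[3].isdigit():
            continue
        restarts = int(fields[3])
        if restarts > 0:
            restarted[fields[0]] = restarts
    return restarted


# Four lines copied from the listing above.
sample = """\
fouram-cron-1664121600-5-milvus-indexnode-8447df45f8-sqb5p 1/1 Running 0 5m53s 10.104.1.94 4am-node10 <none> <none>
fouram-cron-1664121600-5-milvus-proxy-54447d7685-sdqgb 1/1 Running 1 (111s ago) 5m52s 10.104.9.10 4am-node14 <none> <none>
fouram-cron-1664121600-5-milvus-querycoord-57dfc86544-mftmr 1/1 Running 1 (112s ago) 5m53s 10.104.4.75 4am-node11 <none> <none>
fouram-cron-1664121600-5-milvus-querynode-57669979d8-vp52l 1/1 Running 0 5m52s 10.104.4.76 4am-node11 <none> <none>
"""

for name, count in restarted_pods(sample).items():
    print(name, count)
```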
[2022-09-25 16:17:55,577] [ ERROR] - RPC error: [search], <MilvusException: (code=1, message=Invalid shard leader)>, <Time:{'RPC start': '2022-09-25 16:17:55.353614', 'RPC error': '2022-09-25 16:17:55.577146'}> (pymilvus.decorators:112)
[2022-09-25 16:17:55,577] [ ERROR] - Traceback (most recent call last):
File "main.py", line 95, in run_suite
result = runner.run_case(case_metric, **case)
File "/src/milvus_benchmark/runners/accuracy.py", line 292, in run_case
self.milvus.query(case_param["vector_query"], filter_query=case_param["filter_query"],
File "/src/milvus_benchmark/client.py", line 53, in wrapper
result = func(*args, **kwargs)
File "/src/milvus_benchmark/client.py", line 346, in query
result = self._milvus.search(tmp_collection_name, **params)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/stub.py", line 844, in search
return handler.search(collection_name, data, anns_field, param, limit, expression,
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 113, in handler
raise e
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 139, in handler
ret = func(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 89, in handler
raise e
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 51, in handler
return func(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 472, in search
return self._execute_search_requests(requests, timeout, **_kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 436, in _execute_search_requests
raise pre_err
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 427, in _execute_search_requests
raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=Invalid shard leader)>
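The `RPC error` line at the top of this log carries both the RPC start time and the error time, so the time to failure can be read off directly. The sketch below parses that one line with a regular expression written for this specific log format (an assumption; pymilvus may format these lines differently in other versions) and computes the elapsed time. `parse_rpc_error` is a hypothetical helper.

```python
import re
from datetime import datetime

# The RPC error line copied from the log above.
LINE = (
    "[2022-09-25 16:17:55,577] [ ERROR] - RPC error: [search], "
    "<MilvusException: (code=1, message=Invalid shard leader)>, "
    "<Time:{'RPC start': '2022-09-25 16:17:55.353614', "
    "'RPC error': '2022-09-25 16:17:55.577146'}> (pymilvus.decorators:112)"
)

# Pattern tailored to the line above; not an official log format spec.
PATTERN = re.compile(
    r"RPC error: \[(?P<rpc>\w+)\], "
    r"<MilvusException: \(code=(?P<code>\d+), message=(?P<message>.*?)\)>, "
    r"<Time:\{'RPC start': '(?P<start>[^']+)', 'RPC error': '(?P<end>[^']+)'\}>"
)


def parse_rpc_error(line):
    """Extract RPC name, error code, message and elapsed seconds, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    start = datetime.strptime(match["start"], fmt)
    end = datetime.strptime(match["end"], fmt)
    return {
        "rpc": match["rpc"],
        "code": int(match["code"]),
        "message": match["message"],
        "elapsed_s": (end - start).total_seconds(),
    }


info = parse_rpc_error(LINE)
print(info)
```

For this line the search failed about 0.22 s after it was sent, which suggests a fast rejection rather than a timeout, though the server logs would be needed to confirm that.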
master-20220929-64662919 2.2.0.dev32
server-instance fouram-flvlb-1 server-configmap server-cluster-8c16m client-configmap client-acc-glove-ivf-flat
fouram-flvlb-1-etcd-0 1/1 Running 0 5m9s 10.104.1.75 4am-node10 <none> <none>
fouram-flvlb-1-etcd-1 1/1 Running 0 5m9s 10.104.5.141 4am-node12 <none> <none>
fouram-flvlb-1-etcd-2 1/1 Running 0 5m9s 10.104.9.202 4am-node14 <none> <none>
fouram-flvlb-1-milvus-datacoord-988c9fd87-kmmt8 1/1 Running 1 (98s ago) 5m9s 10.104.6.79 4am-node13 <none> <none>
fouram-flvlb-1-milvus-datanode-584df6479f-tlpw4 1/1 Running 0 5m9s 10.104.6.81 4am-node13 <none> <none>
fouram-flvlb-1-milvus-indexcoord-775d748cf9-gmmq6 1/1 Running 0 5m9s 10.104.4.131 4am-node11 <none> <none>
fouram-flvlb-1-milvus-indexnode-77994d956-xvhjp 1/1 Running 0 5m9s 10.104.4.129 4am-node11 <none> <none>
fouram-flvlb-1-milvus-proxy-6dffc68587-sl6sw 1/1 Running 1 (98s ago) 5m9s 10.104.5.138 4am-node12 <none> <none>
fouram-flvlb-1-milvus-querycoord-87669b49b-nmlwr 1/1 Running 0 5m9s 10.104.4.130 4am-node11 <none> <none>
fouram-flvlb-1-milvus-querynode-5f599886-zwn9t 1/1 Running 0 5m9s 10.104.4.132 4am-node11 <none> <none>
fouram-flvlb-1-milvus-rootcoord-757499fbfb-867vb 1/1 Running 0 5m9s 10.104.1.70 4am-node10 <none> <none>
fouram-flvlb-1-minio-0 1/1 Running 0 5m9s 10.104.1.77 4am-node10 <none> <none>
fouram-flvlb-1-minio-1 1/1 Running 0 5m9s 10.104.5.143 4am-node12 <none> <none>
fouram-flvlb-1-minio-2 1/1 Running 0 5m9s 10.104.4.134 4am-node11 <none> <none>
fouram-flvlb-1-minio-3 1/1 Running 0 5m8s 10.104.9.204 4am-node14 <none> <none>
fouram-flvlb-1-pulsar-bookie-0 1/1 Running 0 5m9s 10.104.1.80 4am-node10 <none> <none>
fouram-flvlb-1-pulsar-bookie-1 1/1 Running 0 5m8s 10.104.5.146 4am-node12 <none> <none>
fouram-flvlb-1-pulsar-bookie-2 1/1 Running 0 5m8s 10.104.4.137 4am-node11 <none> <none>
fouram-flvlb-1-pulsar-bookie-init-gzgvz 0/1 Completed 0 5m9s 10.104.6.82 4am-node13 <none> <none>
fouram-flvlb-1-pulsar-broker-0 1/1 Running 0 5m9s 10.104.5.139 4am-node12 <none> <none>
fouram-flvlb-1-pulsar-proxy-0 1/1 Running 0 5m9s 10.104.5.137 4am-node12 <none> <none>
fouram-flvlb-1-pulsar-pulsar-init-jvvlv 0/1 Completed 0 5m9s 10.104.1.71 4am-node10 <none> <none>
fouram-flvlb-1-pulsar-recovery-0 1/1 Running 0 5m9s 10.104.6.80 4am-node13 <none> <none>
fouram-flvlb-1-pulsar-zookeeper-0 1/1 Running 0 5m9s 10.104.1.76 4am-node10 <none> <none>
fouram-flvlb-1-pulsar-zookeeper-1 1/1 Running 0 4m28s 10.104.5.148 4am-node12 <none> <none>
fouram-flvlb-1-pulsar-zookeeper-2 1/1 Running 0 3m53s 10.104.9.206 4am-node14 <none> <none>
[2022-09-29 11:41:12,391] [ ERROR] - RPC error: [search], <MilvusException: (code=1, message=Invalid shard leader)>, <Time:{'RPC start': '2022-09-29 11:41:12.188213', 'RPC error': '2022-09-29 11:41:12.391508'}> (pymilvus.decorators:112)
[2022-09-29 11:41:12,391] [ ERROR] - Traceback (most recent call last):
File "main.py", line 95, in run_suite
result = runner.run_case(case_metric, **case)
File "/src/milvus_benchmark/runners/accuracy.py", line 292, in run_case
self.milvus.query(case_param["vector_query"], filter_query=case_param["filter_query"],
File "/src/milvus_benchmark/client.py", line 53, in wrapper
result = func(*args, **kwargs)
File "/src/milvus_benchmark/client.py", line 346, in query
result = self._milvus.search(tmp_collection_name, **params)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/stub.py", line 844, in search
return handler.search(collection_name, data, anns_field, param, limit, expression,
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 113, in handler
raise e
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 139, in handler
ret = func(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 89, in handler
raise e
File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 51, in handler
return func(self, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 472, in search
return self._execute_search_requests(requests, timeout, **_kwargs)
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 436, in _execute_search_requests
raise pre_err
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 427, in _execute_search_requests
raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=Invalid shard leader)>
This problem has still not been fixed @sunby
@sunby This reproduced on master-20221006-e1124765: https://argo-workflows.zilliz.cc/workflows/qa/fouramf-cron-1665158400?tab=workflow&nodeId=fouramf-cron-1665158400-1787986870&nodePanelView=summary
[2022-10-07 17:27:21,155 - INFO - fouram]: [Base] Start load collection fouram_MZDwezDW, replica_number:1 (base.py:95)
[2022-10-07 17:30:25,071 - INFO - fouram]: [Time] Collection.load run in 183.916s (api_request.py:29)
[2022-10-07 17:30:30,576 - INFO - fouram]: [PerfTemplate] Actual parameters used: {'collection_params': {'other_fields': ['int64_1', 'int64_2', 'float_1', 'double_1', 'varchar_1']}, 'load_params': {}, 'search_params': {'nq': 1, 'param': {'metric_type': 'L2', 'params': {'nprobe': 8}}, 'top_k': 1, 'expr': 'float_1 > -1.0 && float_1 < 5000000.0'}, 'dataset_params': {'dataset_name': 'sift', 'dim': 128, 'dataset_size': 50000000, 'ni_per': 50000, 'metric_type': 'L2', 'req_run_counts': 10}, 'index_params': {'index_type': 'IVF_FLAT', 'index_param': {'nlist': 2048}}} (performance_template.py:57)
[2022-10-07 17:30:30,576 - INFO - fouram]: [Base] Params of search: nq:1, anns_field:float_vector, param:{'metric_type': 'L2', 'params': {'nprobe': 8}}, limit:1, expr:"float_1 > -1.0 && float_1 < 5000000.0" (base.py:261)
[2022-10-07 17:30:33,015 - ERROR - fouram]: Traceback (most recent call last):
File "/src/fouram/client/util/api_request.py", line 21, in inner_wrapper
res = func(*args, **kwargs)
File "/src/fouram/client/util/api_request.py", line 57, in api_request
return func(*arg, **kwargs)
...
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 436, in _execute_search_requests
raise pre_err
File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 427, in _execute_search_requests
raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=Invalid shard leader)>
(api_request.py:35)
[2022-10-07 17:30:33,031 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=Invalid shard leader)> (api_request.py:36)
[2022-10-07 17:30:33,031 - ERROR - fouram]: [CheckFunc] search request check failed, response:<MilvusException: (code=1, message=Invalid shard leader)> (func_check.py:40)
This issue has not reproduced again. A new, different error has appeared instead, so I am closing this issue for now.