
[Bug]: [benchmark][cluster] Milvus search failed, raising an error "Invalid shard leader"

Open • jingkl opened this issue 2 years ago • 1 comment

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version: master-20220921-24ec3547
- Deployment mode (standalone or cluster): cluster
- SDK version (e.g. pymilvus v2.0.0rc2): 2.2.0dev30
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

server-instance fouram-cron-1663776000-5 server-configmap server-cluster-8c16m client-configmap client-acc-glove-ivf-flat

fouram-cron-1663776000-5-minio-0                                  1/1     Running     0               5m54s   10.104.1.121   4am-node10   <none>           <none>
fouram-cron-1663776000-5-minio-1                                  1/1     Running     0               5m53s   10.104.5.207   4am-node12   <none>           <none>
fouram-cron-1663776000-5-minio-2                                  1/1     Running     0               5m53s   10.104.6.214   4am-node13   <none>           <none>
fouram-cron-1663776000-5-minio-3                                  1/1     Running     0               5m53s   10.104.9.150   4am-node14   <none>           <none>
fouram-cron-1663776000-5-pulsar-bookie-0                          1/1     Running     0               5m53s   10.104.5.224   4am-node12   <none>           <none>
fouram-cron-1663776000-5-pulsar-bookie-1                          1/1     Running     0               5m52s   10.104.6.228   4am-node13   <none>           <none>
fouram-cron-1663776000-5-pulsar-bookie-2                          1/1     Running     0               5m52s   10.104.1.136   4am-node10   <none>           <none>
fouram-cron-1663776000-5-pulsar-bookie-init-w65pp                 0/1     Completed   0               5m57s   10.104.1.117   4am-node10   <none>           <none>
fouram-cron-1663776000-5-pulsar-broker-0                          1/1     Running     0               5m54s   10.104.1.122   4am-node10   <none>           <none>
fouram-cron-1663776000-5-pulsar-proxy-0                           1/1     Running     0               5m54s   10.104.9.148   4am-node14   <none>           <none>
fouram-cron-1663776000-5-pulsar-pulsar-init-g7wd8                 0/1     Completed   0               5m57s   10.104.1.116   4am-node10   <none>           <none>
fouram-cron-1663776000-5-pulsar-recovery-0                        1/1     Running     0               5m55s   10.104.9.147   4am-node14   <none>           <none>
fouram-cron-1663776000-5-pulsar-zookeeper-0                       1/1     Running     0               5m54s   10.104.9.163   4am-node14   <none>           <none>
fouram-cron-1663776000-5-pulsar-zookeeper-1                       1/1     Running     0               4m15s   10.104.5.226   4am-node12   <none>           <none>
fouram-cron-1663776000-5-pulsar-zookeeper-2                       1/1     Running     0               3m39s   10.104.1.138   4am-node10   <none>           <none>
[2022-09-21 16:16:06,922] [   ERROR] - Traceback (most recent call last):
  File "main.py", line 95, in run_suite
    result = runner.run_case(case_metric, **case)
  File "/src/milvus_benchmark/runners/accuracy.py", line 292, in run_case
    self.milvus.query(case_param["vector_query"], filter_query=case_param["filter_query"],
  File "/src/milvus_benchmark/client.py", line 53, in wrapper
    result = func(*args, **kwargs)
  File "/src/milvus_benchmark/client.py", line 346, in query
    result = self._milvus.search(tmp_collection_name, **params)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/stub.py", line 844, in search
    return handler.search(collection_name, data, anns_field, param, limit, expression,
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 113, in handler
    raise e
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 139, in handler
    ret = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 89, in handler
    raise e
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 51, in handler
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 472, in search
    return self._execute_search_requests(requests, timeout, **_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 436, in _execute_search_requests
    raise pre_err
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 427, in _execute_search_requests
    raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to Search, QueryNode ID=2, reason=query node 0 is not ready)>
 (milvus_benchmark.main:98)

  File "/src/milvus_benchmark/client.py", line 346, in query
    result = self._milvus.search(tmp_collection_name, **params)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/stub.py", line 844, in search
    return handler.search(collection_name, data, anns_field, param, limit, expression,
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 113, in handler
    raise e
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 139, in handler
    ret = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 89, in handler
    raise e
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 51, in handler
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 472, in search
    return self._execute_search_requests(requests, timeout, **_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 436, in _execute_search_requests
    raise pre_err
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 427, in _execute_search_requests
    raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=Invalid shard leader)>
 (milvus_benchmark.main:98)
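Both failures reach the client as a `MilvusException` with `code=1`. If one assumes these errors are transient (for example, a shard leader not yet being available right after load), a client could retry the search with exponential backoff. This is a minimal, hypothetical sketch: `retry_on_transient` and the matched message fragments are illustrative assumptions, not pymilvus APIs, and the later comments show the error recurring across runs, so a retry may only mask the underlying bug rather than fix it.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")

# Fragments copied from the errors in this report. Treating them as
# transient is an assumption for illustration, not documented Milvus behavior.
TRANSIENT_FRAGMENTS = ("Invalid shard leader", "is not ready")


def retry_on_transient(
    call: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    fragments: Iterable[str] = TRANSIENT_FRAGMENTS,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: run `call`, retrying with exponential backoff
    while the raised exception's text contains one of `fragments`.
    Non-matching exceptions, and the final matching one, are re-raised."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    fragments = tuple(fragments)
    for attempt in range(attempts):
        try:
            return call()
        except Exception as exc:  # pymilvus surfaces these as MilvusException
            transient = any(f in str(exc) for f in fragments)
            if not transient or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    raise AssertionError("unreachable")
```

In the benchmark client this would wrap the call from the traceback, e.g. `retry_on_transient(lambda: self._milvus.search(tmp_collection_name, **params))`. Injecting `sleep` keeps the helper testable without real delays.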

Expected Behavior

No response

Steps To Reproduce

1. Create a collection.
2. Insert 1M GloVe vectors.
3. Create an IVF_FLAT index.
4. Search; an error is raised.
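Step 2 inserts 1M GloVe vectors, which in practice is done in batches. A minimal sketch of that step, assuming a collection whose schema is an auto-ID primary key plus a single float-vector field, so each pymilvus `Collection.insert` call takes one column (a list of vectors). The helper name `insert_in_batches` is hypothetical; the default batch size of 50,000 mirrors the `ni_per` value logged in a later run in this thread.

```python
from typing import Any, List, Sequence


def insert_in_batches(collection: Any, vectors: Sequence[List[float]],
                      batch_size: int = 50_000) -> int:
    """Hypothetical helper: insert `vectors` into `collection` in fixed-size batches.

    `collection` is expected to behave like pymilvus's `Collection`, whose
    `insert` accepts column-based data. With an auto-ID primary key the only
    column to supply is the vector field (an assumption about the schema).
    Returns the number of insert calls made.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    calls = 0
    for start in range(0, len(vectors), batch_size):
        batch = list(vectors[start:start + batch_size])
        collection.insert([batch])  # one column: the float-vector field
        calls += 1
    return calls
```

Steps 3 and 4 would then call `collection.create_index(...)`, `collection.load()`, and `collection.search(...)` on the same collection; the error in this report is raised by that final search.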

Milvus Log

No response

Anything else?

client-acc-glove-ivf-flat:

{
	"config.yaml": "ann_accuracy:
		  collections:
		    -
		      milvus:
		        cache_config.cpu_cache_capacity: 16GB
		        engine_config.use_blas_threshold: 1100
		      server:
		        cpus: 12
		      source_file: /test/milvus/ann_hdf5/glove-200-angular.hdf5
		      collection_name: glove_200_angular
		      index_types: ['ivf_flat']
		      index_params:
		        nlist: [1024]
		      top_ks: [10]
		      nqs: [10000]
		      search_params:
		        nprobe: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
		"
}
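The `ann_accuracy` config above sweeps ten `nprobe` values against one IVF_FLAT index (`nlist: 1024`), with `top_k` 10 and `nq` 10000. A minimal sketch of how such a sweep could be expanded into pymilvus-style index and search parameter dicts. The function `expand_sweep` is hypothetical, and the metric type is passed in by the caller because the config does not state it; `"IP"` in the usage line is an assumption.

```python
from itertools import product
from typing import Dict, List


def expand_sweep(case: Dict, metric_type: str) -> List[Dict]:
    """Hypothetical helper: expand one `collections` entry into one run per
    (index type, nlist, top_k, nq, nprobe) combination."""
    runs = []
    for index_type, nlist, top_k, nq, nprobe in product(
        case["index_types"],
        case["index_params"]["nlist"],
        case["top_ks"],
        case["nqs"],
        case["search_params"]["nprobe"],
    ):
        runs.append({
            # Shape of the index_params dict passed to Collection.create_index
            "index_params": {
                "index_type": index_type.upper(),
                "metric_type": metric_type,
                "params": {"nlist": nlist},
            },
            # Shape of the `param` argument to Collection.search
            "search_param": {"metric_type": metric_type, "params": {"nprobe": nprobe}},
            "limit": top_k,
            "nq": nq,
        })
    return runs


case = {
    "index_types": ["ivf_flat"],
    "index_params": {"nlist": [1024]},
    "top_ks": [10],
    "nqs": [10000],
    "search_params": {"nprobe": [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]},
}
runs = expand_sweep(case, metric_type="IP")  # "IP" is an assumed metric
```

Because every list except `nprobe` has a single value, the sweep yields ten runs that share one index build, which is why the failure surfaces at the search stage.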

jingkl · Sep 22 '22 02:09

/assign @sunby /unassign

yanliang567 · Sep 22 '22 11:09

server-instance fouram-cron-1664121600-5 server-configmap server-cluster-8c16m client-configmap client-acc-glove-ivf-flat

Milvus master-20220925-91df8f2d, pymilvus 2.2.0dev30

fouram-cron-1664121600-5-etcd-0                                   1/1     Running     0               5m53s   10.104.5.36    4am-node12   <none>           <none>
fouram-cron-1664121600-5-etcd-1                                   1/1     Running     0               5m53s   10.104.4.88    4am-node11   <none>           <none>
fouram-cron-1664121600-5-etcd-2                                   1/1     Running     0               5m52s   10.104.9.39    4am-node14   <none>           <none>
fouram-cron-1664121600-5-milvus-datacoord-998b5c6b5-8l2w9         1/1     Running     1 (112s ago)    5m53s   10.104.5.15    4am-node12   <none>           <none>
fouram-cron-1664121600-5-milvus-datanode-5fdc5b6b7f-2bdtk         1/1     Running     1 (111s ago)    5m53s   10.104.5.20    4am-node12   <none>           <none>
fouram-cron-1664121600-5-milvus-indexcoord-7c6bfff8c5-9bmr7       1/1     Running     1 (111s ago)    5m52s   10.104.9.13    4am-node14   <none>           <none>
fouram-cron-1664121600-5-milvus-indexnode-8447df45f8-sqb5p        1/1     Running     0               5m53s   10.104.1.94    4am-node10   <none>           <none>
fouram-cron-1664121600-5-milvus-proxy-54447d7685-sdqgb            1/1     Running     1 (111s ago)    5m52s   10.104.9.10    4am-node14   <none>           <none>
fouram-cron-1664121600-5-milvus-querycoord-57dfc86544-mftmr       1/1     Running     1 (112s ago)    5m53s   10.104.4.75    4am-node11   <none>           <none>
fouram-cron-1664121600-5-milvus-querynode-57669979d8-vp52l        1/1     Running     0               5m52s   10.104.4.76    4am-node11   <none>           <none>
fouram-cron-1664121600-5-milvus-rootcoord-6955dff79d-4qw2g        1/1     Running     1 (111s ago)    5m53s   10.104.5.19    4am-node12   <none>           <none>
fouram-cron-1664121600-5-minio-0                                  1/1     Running     0               5m53s   10.104.5.17    4am-node12   <none>           <none>
fouram-cron-1664121600-5-minio-1                                  1/1     Running     0               5m53s   10.104.4.74    4am-node11   <none>           <none>
fouram-cron-1664121600-5-minio-2                                  1/1     Running     0               5m53s   10.104.9.14    4am-node14   <none>           <none>
fouram-cron-1664121600-5-minio-3                                  1/1     Running     0               5m53s   10.104.1.95    4am-node10   <none>           <none>
fouram-cron-1664121600-5-pulsar-bookie-0                          1/1     Running     0               5m52s   10.104.9.40    4am-node14   <none>           <none>
fouram-cron-1664121600-5-pulsar-bookie-1                          1/1     Running     0               5m52s   10.104.5.39    4am-node12   <none>           <none>
fouram-cron-1664121600-5-pulsar-bookie-2                          1/1     Running     0               5m51s   10.104.1.120   4am-node10   <none>           <none>
fouram-cron-1664121600-5-pulsar-bookie-init-zjvsx                 0/1     Completed   0               5m54s   10.104.5.14    4am-node12   <none>           <none>
fouram-cron-1664121600-5-pulsar-broker-0                          1/1     Running     0               5m53s   10.104.1.93    4am-node10   <none>           <none>
fouram-cron-1664121600-5-pulsar-proxy-0                           1/1     Running     0               5m53s   10.104.5.16    4am-node12   <none>           <none>
fouram-cron-1664121600-5-pulsar-pulsar-init-8mft6                 0/1     Completed   0               5m54s   10.104.5.13    4am-node12   <none>           <none>
fouram-cron-1664121600-5-pulsar-recovery-0                        1/1     Running     0               5m53s   10.104.5.18    4am-node12   <none>           <none>
fouram-cron-1664121600-5-pulsar-zookeeper-0                       1/1     Running     0               5m52s   10.104.9.38    4am-node14   <none>           <none>
fouram-cron-1664121600-5-pulsar-zookeeper-1                       1/1     Running     0               4m19s   10.104.4.97    4am-node11   <none>           <none>
fouram-cron-1664121600-5-pulsar-zookeeper-2                       1/1     Running     0               3m39s   10.104.1.127   4am-node10   <none>           <none>
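In this listing, the Milvus datacoord, datanode, indexcoord, proxy, querycoord, and rootcoord pods each show one restart 111 to 112 seconds earlier, while the indexnode and querynode show none. This thread does not establish whether those restarts relate to the shard-leader error, but they are quick to spot programmatically. A minimal sketch, assuming the `kubectl get pods -o wide` column layout shown above; `parse_pod_line` and `restarted_pods` are hypothetical helpers.

```python
import re
from typing import Dict, List, Optional

# NAME  READY  STATUS  RESTARTS [(LAST RESTART)]  AGE  IP  NODE  ...
POD_LINE = re.compile(
    r"^(?P<name>\S+)\s+(?P<ready>\d+/\d+)\s+(?P<status>\S+)\s+"
    r"(?P<restarts>\d+)(?:\s+\((?P<last>[^)]*)\))?\s+"
    r"(?P<age>\S+)\s+(?P<ip>\S+)\s+(?P<node>\S+)"
)


def parse_pod_line(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Hypothetical helper: parse one pod row; return None if it does not match."""
    m = POD_LINE.match(line.strip())
    return m.groupdict() if m else None


def restarted_pods(lines: List[str]) -> List[str]:
    """Hypothetical helper: names of pods whose RESTARTS count is above zero."""
    names = []
    for line in lines:
        pod = parse_pod_line(line)
        if pod and int(pod["restarts"]) > 0:
            names.append(pod["name"])
    return names
```

Non-pod lines such as the log entries that follow the listing simply fail to match and are skipped.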
[2022-09-25 16:17:55,577] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=Invalid shard leader)>, <Time:{'RPC start': '2022-09-25 16:17:55.353614', 'RPC error': '2022-09-25 16:17:55.577146'}> (pymilvus.decorators:112)
[2022-09-25 16:17:55,577] [   ERROR] - Traceback (most recent call last):
  File "main.py", line 95, in run_suite
    result = runner.run_case(case_metric, **case)
  File "/src/milvus_benchmark/runners/accuracy.py", line 292, in run_case
    self.milvus.query(case_param["vector_query"], filter_query=case_param["filter_query"],
  File "/src/milvus_benchmark/client.py", line 53, in wrapper
    result = func(*args, **kwargs)
  File "/src/milvus_benchmark/client.py", line 346, in query
    result = self._milvus.search(tmp_collection_name, **params)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/stub.py", line 844, in search
    return handler.search(collection_name, data, anns_field, param, limit, expression,
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 113, in handler
    raise e
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 139, in handler
    ret = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 89, in handler
    raise e
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 51, in handler
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 472, in search
    return self._execute_search_requests(requests, timeout, **_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 436, in _execute_search_requests
    raise pre_err
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 427, in _execute_search_requests
    raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=Invalid shard leader)>
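The pymilvus decorator log line records when the failing RPC started and when the error came back. A minimal sketch for extracting that interval, assuming the `<Time:{'RPC start': ..., 'RPC error': ...}>` format shown in this run; `rpc_error_latency` is a hypothetical helper. For the line above it gives roughly 0.22 s, so the search was rejected shortly after it was issued.

```python
import ast
import re
from datetime import datetime

TIME_FIELD = re.compile(r"<Time:(\{.*?\})>")
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def rpc_error_latency(log_line: str) -> float:
    """Hypothetical helper: seconds between 'RPC start' and 'RPC error'
    in a pymilvus RPC-error log line."""
    match = TIME_FIELD.search(log_line)
    if match is None:
        raise ValueError("no <Time:{...}> field in line")
    times = ast.literal_eval(match.group(1))  # the field is a Python dict literal
    start = datetime.strptime(times["RPC start"], TS_FORMAT)
    end = datetime.strptime(times["RPC error"], TS_FORMAT)
    return (end - start).total_seconds()
```

Using `ast.literal_eval` rather than `eval` parses the logged dict safely, since it accepts only literals.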

jingkl · Sep 26 '22 03:09

Milvus master-20220929-64662919, pymilvus 2.2.0.dev32

server-instance fouram-flvlb-1 server-configmap server-cluster-8c16m client-configmap client-acc-glove-ivf-flat

fouram-flvlb-1-etcd-0                                             1/1     Running     0               5m9s    10.104.1.75    4am-node10   <none>           <none>
fouram-flvlb-1-etcd-1                                             1/1     Running     0               5m9s    10.104.5.141   4am-node12   <none>           <none>
fouram-flvlb-1-etcd-2                                             1/1     Running     0               5m9s    10.104.9.202   4am-node14   <none>           <none>
fouram-flvlb-1-milvus-datacoord-988c9fd87-kmmt8                   1/1     Running     1 (98s ago)     5m9s    10.104.6.79    4am-node13   <none>           <none>
fouram-flvlb-1-milvus-datanode-584df6479f-tlpw4                   1/1     Running     0               5m9s    10.104.6.81    4am-node13   <none>           <none>
fouram-flvlb-1-milvus-indexcoord-775d748cf9-gmmq6                 1/1     Running     0               5m9s    10.104.4.131   4am-node11   <none>           <none>
fouram-flvlb-1-milvus-indexnode-77994d956-xvhjp                   1/1     Running     0               5m9s    10.104.4.129   4am-node11   <none>           <none>
fouram-flvlb-1-milvus-proxy-6dffc68587-sl6sw                      1/1     Running     1 (98s ago)     5m9s    10.104.5.138   4am-node12   <none>           <none>
fouram-flvlb-1-milvus-querycoord-87669b49b-nmlwr                  1/1     Running     0               5m9s    10.104.4.130   4am-node11   <none>           <none>
fouram-flvlb-1-milvus-querynode-5f599886-zwn9t                    1/1     Running     0               5m9s    10.104.4.132   4am-node11   <none>           <none>
fouram-flvlb-1-milvus-rootcoord-757499fbfb-867vb                  1/1     Running     0               5m9s    10.104.1.70    4am-node10   <none>           <none>
fouram-flvlb-1-minio-0                                            1/1     Running     0               5m9s    10.104.1.77    4am-node10   <none>           <none>
fouram-flvlb-1-minio-1                                            1/1     Running     0               5m9s    10.104.5.143   4am-node12   <none>           <none>
fouram-flvlb-1-minio-2                                            1/1     Running     0               5m9s    10.104.4.134   4am-node11   <none>           <none>
fouram-flvlb-1-minio-3                                            1/1     Running     0               5m8s    10.104.9.204   4am-node14   <none>           <none>
fouram-flvlb-1-pulsar-bookie-0                                    1/1     Running     0               5m9s    10.104.1.80    4am-node10   <none>           <none>
fouram-flvlb-1-pulsar-bookie-1                                    1/1     Running     0               5m8s    10.104.5.146   4am-node12   <none>           <none>
fouram-flvlb-1-pulsar-bookie-2                                    1/1     Running     0               5m8s    10.104.4.137   4am-node11   <none>           <none>
fouram-flvlb-1-pulsar-bookie-init-gzgvz                           0/1     Completed   0               5m9s    10.104.6.82    4am-node13   <none>           <none>
fouram-flvlb-1-pulsar-broker-0                                    1/1     Running     0               5m9s    10.104.5.139   4am-node12   <none>           <none>
fouram-flvlb-1-pulsar-proxy-0                                     1/1     Running     0               5m9s    10.104.5.137   4am-node12   <none>           <none>
fouram-flvlb-1-pulsar-pulsar-init-jvvlv                           0/1     Completed   0               5m9s    10.104.1.71    4am-node10   <none>           <none>
fouram-flvlb-1-pulsar-recovery-0                                  1/1     Running     0               5m9s    10.104.6.80    4am-node13   <none>           <none>
fouram-flvlb-1-pulsar-zookeeper-0                                 1/1     Running     0               5m9s    10.104.1.76    4am-node10   <none>           <none>
fouram-flvlb-1-pulsar-zookeeper-1                                 1/1     Running     0               4m28s   10.104.5.148   4am-node12   <none>           <none>
fouram-flvlb-1-pulsar-zookeeper-2                                 1/1     Running     0               3m53s   10.104.9.206   4am-node14   <none>           <none>
[2022-09-29 11:41:12,391] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=Invalid shard leader)>, <Time:{'RPC start': '2022-09-29 11:41:12.188213', 'RPC error': '2022-09-29 11:41:12.391508'}> (pymilvus.decorators:112)
[2022-09-29 11:41:12,391] [   ERROR] - Traceback (most recent call last):
  File "main.py", line 95, in run_suite
    result = runner.run_case(case_metric, **case)
  File "/src/milvus_benchmark/runners/accuracy.py", line 292, in run_case
    self.milvus.query(case_param["vector_query"], filter_query=case_param["filter_query"],
  File "/src/milvus_benchmark/client.py", line 53, in wrapper
    result = func(*args, **kwargs)
  File "/src/milvus_benchmark/client.py", line 346, in query
    result = self._milvus.search(tmp_collection_name, **params)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/stub.py", line 844, in search
    return handler.search(collection_name, data, anns_field, param, limit, expression,
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 113, in handler
    raise e
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 139, in handler
    ret = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 89, in handler
    raise e
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 51, in handler
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 472, in search
    return self._execute_search_requests(requests, timeout, **_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 436, in _execute_search_requests
    raise pre_err
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 427, in _execute_search_requests
    raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=Invalid shard leader)>

jingkl · Sep 29 '22 13:09

This problem has still not been fixed @sunby

jingkl · Sep 29 '22 13:09

@sunby It reproduced on master-20221006-e1124765: https://argo-workflows.zilliz.cc/workflows/qa/fouramf-cron-1665158400?tab=workflow&nodeId=fouramf-cron-1665158400-1787986870&nodePanelView=summary

[2022-10-07 17:27:21,155 -  INFO - fouram]: [Base] Start load collection fouram_MZDwezDW, replica_number:1 (base.py:95)
[2022-10-07 17:30:25,071 -  INFO - fouram]: [Time] Collection.load run in 183.916s (api_request.py:29)
[2022-10-07 17:30:30,576 -  INFO - fouram]: [PerfTemplate] Actual parameters used: {'collection_params': {'other_fields': ['int64_1', 'int64_2', 'float_1', 'double_1', 'varchar_1']}, 'load_params': {}, 'search_params': {'nq': 1, 'param': {'metric_type': 'L2', 'params': {'nprobe': 8}}, 'top_k': 1, 'expr': 'float_1 > -1.0 && float_1 < 5000000.0'}, 'dataset_params': {'dataset_name': 'sift', 'dim': 128, 'dataset_size': 50000000, 'ni_per': 50000, 'metric_type': 'L2', 'req_run_counts': 10}, 'index_params': {'index_type': 'IVF_FLAT', 'index_param': {'nlist': 2048}}} (performance_template.py:57)
[2022-10-07 17:30:30,576 -  INFO - fouram]: [Base] Params of search: nq:1, anns_field:float_vector, param:{'metric_type': 'L2', 'params': {'nprobe': 8}}, limit:1, expr:"float_1 > -1.0 && float_1 < 5000000.0" (base.py:261)
[2022-10-07 17:30:33,015 - ERROR - fouram]: Traceback (most recent call last):
  File "/src/fouram/client/util/api_request.py", line 21, in inner_wrapper
    res = func(*args, **kwargs)
  File "/src/fouram/client/util/api_request.py", line 57, in api_request
    return func(*arg, **kwargs)
  ...
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 436, in _execute_search_requests
    raise pre_err
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 427, in _execute_search_requests
    raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=Invalid shard leader)>
 (api_request.py:35)
[2022-10-07 17:30:33,031 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=Invalid shard leader)> (api_request.py:36)
[2022-10-07 17:30:33,031 - ERROR - fouram]: [CheckFunc] search request check failed, response:<MilvusException: (code=1, message=Invalid shard leader)> (func_check.py:40)
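This fouram run logs the exact search it issued: `nq` 1, `top_k` 1, L2 with `nprobe` 8, and a scalar filter on `float_1`, against `anns_field` `float_vector` with `dim` 128. A minimal sketch of turning those logged `search_params` into keyword arguments for pymilvus `Collection.search` (`data`, `anns_field`, `param`, `limit`, `expr`), which could help replay the failing request in isolation. `to_search_kwargs` is hypothetical, and the query vectors are random placeholders rather than SIFT queries.

```python
import random
from typing import Dict, List


def to_search_kwargs(search_params: Dict, anns_field: str, dim: int,
                     seed: int = 0) -> Dict:
    """Hypothetical helper: build keyword arguments for Collection.search from
    a fouram-style `search_params` dict. Query vectors are random placeholders
    of the right shape, not real dataset queries."""
    rng = random.Random(seed)
    data: List[List[float]] = [
        [rng.random() for _ in range(dim)] for _ in range(search_params["nq"])
    ]
    return {
        "data": data,
        "anns_field": anns_field,
        "param": search_params["param"],
        "limit": search_params["top_k"],
        "expr": search_params.get("expr"),
    }


logged = {
    "nq": 1,
    "param": {"metric_type": "L2", "params": {"nprobe": 8}},
    "top_k": 1,
    "expr": "float_1 > -1.0 && float_1 < 5000000.0",
}
kwargs = to_search_kwargs(logged, anns_field="float_vector", dim=128)
# With a loaded pymilvus Collection, the replay would be: collection.search(**kwargs)
```

Keeping the logged dict unchanged and only adding placeholder vectors means any difference in behavior on replay comes from cluster state, not from the request shape.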

yanliang567 · Oct 09 '22 07:10

This issue has not reproduced since, and a different error has appeared instead, so I'm closing this issue for now.

jingkl · Dec 19 '22 03:12