milvus icon indicating copy to clipboard operation
milvus copied to clipboard

[Bug]: [benchmark][cluster]Milvus load , search timed out and reported an error"rpc deadline exceeded:Retry timeout"

Open jingkl opened this issue 2 years ago • 2 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version:2.1.0-20220708-11d21a3b
- Deployment mode(standalone or cluster):cluster 
- SDK version(e.g. pymilvus v2.0.0rc2):pymilvus 2.1.0dev87
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

server-instance benchmark-tag-no-clean-xkxrt-1 server-configmap server-cluster-8c64m-querynode2 client-configmap client-random-locust-search-filter-100m-ddl-r8-w2-replica2

2022-07-11 01:36:09,935] [   DEBUG] - Milvus get run in 9.379s (milvus_benchmark.client:54)
[2022-07-11 01:36:09,936] [   DEBUG] - Milvus get run in 9.3797s (milvus_benchmark.client:54)
[2022-07-11 01:36:09,936] [   DEBUG] - Milvus get run in 9.3805s (milvus_benchmark.client:54)
[2022-07-11 01:36:09,937] [   DEBUG] - Milvus query run in 9.3818s (milvus_benchmark.client:54)
[2022-07-11 01:36:09,938] [   DEBUG] - Milvus query run in 9.3826s (milvus_benchmark.client:54)
[2022-07-11 01:36:10,143] [   DEBUG] - params of load collection: {'timeout': 30, 'replica_number': 2} (milvus_benchmark.client:494)
[2022-07-11 01:36:10,860] [   DEBUG] - Milvus flush run in 1.2489s (milvus_benchmark.client:54)
[2022-07-11 01:36:10,861] [   DEBUG] - Milvus get_info run in 1.2493s (milvus_benchmark.client:54)
[2022-07-11 01:36:10,861] [   DEBUG] - [scene_test] Start insert : scene_test_7482_352821 (milvus_benchmark.client:543)
[2022-07-11 01:36:11,071] [   DEBUG] - Milvus get_info run in 1.4597s (milvus_benchmark.client:54)
[2022-07-11 01:36:11,073] [   DEBUG] - [scene_test] Start insert : scene_test_6506_300073 (milvus_benchmark.client:543)
[2022-07-11 01:36:11,077] [   DEBUG] - Milvus get_info run in 1.465s (milvus_benchmark.client:54)
[2022-07-11 01:39:17,942] [   DEBUG] - [scene_test] Start insert : scene_test_6386_825821 (milvus_benchmark.client:543)
[2022-07-11 01:39:18,191] [   DEBUG] - Milvus get_info run in 188.5794s (milvus_benchmark.client:54)
[2022-07-11 01:39:18,232] [   DEBUG] - [scene_test] Start insert : scene_test_9685_298548 (milvus_benchmark.client:543)
[2022-07-11 01:39:18,233] [   DEBUG] - Milvus query run in 188.6217s (milvus_benchmark.client:54)
[2022-07-11 01:39:18,373] [   ERROR] - RPC error: [describe_collection], <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)>, <Time:{'RPC start': '2022-07-11 01:36:10.143693', 'RPC error': '2022-07-11 01:39:18.373718'}> (pymilvus.decorators:94)
[2022-07-11 01:39:18,374] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)>, <Time:{'RPC start': '2022-07-11 01:36:10.143632', 'RPC error': '2022-07-11 01:39:18.374575'}> (pymilvus.decorators:94)
[2022-07-11 01:39:18,375] [   ERROR] - RPC error: [load_collection], <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 30s)>, <Time:{'RPC start': '2022-07-11 01:36:10.757697', 'RPC error': '2022-07-11 01:39:18.375385'}> (pymilvus.decorators:94)
[2022-07-11 01:39:18,376] [   ERROR] - RPC error: [describe_collection], <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)>, <Time:{'RPC start': '2022-07-11 01:36:10.758118', 'RPC error': '2022-07-11 01:39:18.376225'}> (pymilvus.decorators:94)
[2022-07-11 01:39:18,789] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)>, <Time:{'RPC start': '2022-07-11 01:36:10.758065', 'RPC error': '2022-07-11 01:39:18.789384'}> (pymilvus.decorators:94)
[2022-07-11 01:39:18,863] [   DEBUG] - Row count: 3000 in collection: <scene_test_8114_17319> (milvus_benchmark.client:428)
[2022-07-11 01:39:18,864] [   DEBUG] - [scene_test] Start create index : scene_test_8114_17319 (milvus_benchmark.client:550)
[2022-07-11 01:39:18,865] [    INFO] - Building index start, collection_name: scene_test_8114_17319, index_type: IVF_SQ8, metric_type: L2 (milvus_benchmark.client:279)
[2022-07-11 01:39:18,865] [    INFO] - {'nlist': 2048} (milvus_benchmark.client:281)
[2022-07-11 01:39:18,866] [   DEBUG] - collection: scene_test_8114_17319 Index params: {'index_type': 'IVF_SQ8', 'metric_type': 'L2', 'params': {'nlist': 2048}} (milvus_benchmark.client:287)

benchmark-tag-no-clean-xkxrt-1-etcd-0                             1/1     Running     0          2d12h   10.97.17.42    qa-node014.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-etcd-1                             1/1     Running     0          2d12h   10.97.16.114   qa-node013.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-etcd-2                             1/1     Running     0          2d12h   10.97.16.115   qa-node013.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-milvus-datacoord-b99d5768f-jc827   1/1     Running     0          2d12h   10.97.18.178   qa-node017.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-milvus-datanode-77bbb45fbd-d4lhr   1/1     Running     0          2d12h   10.97.17.40    qa-node014.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-milvus-indexcoord-6bfbcdbd6lbpwl   1/1     Running     0          2d12h   10.97.14.124   qa-node011.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-milvus-indexnode-7bcc8975c92j8d8   1/1     Running     0          2d12h   10.97.16.111   qa-node013.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-milvus-proxy-dcd7fc968-txmsf       1/1     Running     0          2d12h   10.97.13.62    qa-node010.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-milvus-querycoord-74bc88db4t8zcc   1/1     Running     0          2d12h   10.97.14.125   qa-node011.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-milvus-querynode-565fcbcc85gnsxs   1/1     Running     0          2d12h   10.97.13.63    qa-node010.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-milvus-querynode-565fcbcc85j9r4p   1/1     Running     0          2d12h   10.97.20.144   qa-node018.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-milvus-rootcoord-7b965fc85c987qt   1/1     Running     1          2d12h   10.97.14.123   qa-node011.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-minio-0                            1/1     Running     0          2d12h   10.97.19.159   qa-node016.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-minio-1                            1/1     Running     0          2d12h   10.97.19.160   qa-node016.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-minio-2                            1/1     Running     0          2d12h   10.97.12.109   qa-node015.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-minio-3                            1/1     Running     0          2d12h   10.97.12.115   qa-node015.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-pulsar-bookie-0                    1/1     Running     0          2d12h   10.97.19.162   qa-node016.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-pulsar-bookie-1                    1/1     Running     0          2d12h   10.97.12.116   qa-node015.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-pulsar-bookie-2                    1/1     Running     0          2d12h   10.97.12.117   qa-node015.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-pulsar-bookie-init-ttgsd           0/1     Completed   0          2d12h   10.97.10.211   qa-node008.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-pulsar-broker-0                    1/1     Running     0          2d12h   10.97.10.212   qa-node008.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-pulsar-proxy-0                     1/1     Running     0          2d12h   10.97.19.155   qa-node016.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-pulsar-pulsar-init-b4zt2           0/1     Completed   0          2d12h   10.97.5.65     qa-node003.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-pulsar-recovery-0                  1/1     Running     0          2d12h   10.97.19.154   qa-node016.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-pulsar-zookeeper-0                 1/1     Running     0          2d12h   10.97.5.67     qa-node003.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-pulsar-zookeeper-1                 1/1     Running     0          2d12h   10.97.5.69     qa-node003.zilliz.local   <none>           <none>
benchmark-tag-no-clean-xkxrt-1-pulsar-zookeeper-2                 1/1     Running     0          2d12h   10.97.10.214   qa-node008.zilliz.local   <none>           <none>

Expected Behavior

No response

Steps To Reproduce

1、create collection
2、create index of ivf_sq8
3、insert 100m million vectors
4、flush collection
5、build index with the same params
6、load collection
7、locust concurrent: query<-search, load, get<-query, scene_test 《- raise error

Milvus Log

No response

Anything else?

client-random-locust-search-filter-100m-ddl-r8-w2-replica2:

{
	"config.yaml": "locust_random_performance:
		  collections:
		    -
		      collection_name: sift_100m_128_l2
		      # collection_name: sift_10w_128_l2
		      other_fields: float1
		      ni_per: 50000
		      build_index: true
		      index_type: ivf_sq8
		      index_param:
		        nlist: 2048
		      load_param:
		        replica_number: 2
		      task:
		        types:
		          -
		            type: query
		            weight: 20
		            params:
		              top_k: 10
		              nq: 10
		              search_param:
		                nprobe: 16
		              filters:
		                -
		                  range: \"{'range': {'float1': {'GT': -1.0, 'LT': collection_size * 0.5}}}\"
		          -
		            type: load
		            weight: 1
		            params:
		              replica_number: 2
		          -
		            type: get
		            weight: 10
		            params:
		              ids_length: 10
		          -
		            type: scene_test
		            weight: 2
		        connection_num: 1
		        clients_num: 20
		        spawn_rate: 2
		        # during_time: 100
		        during_time: 84h
		"
}

jingkl avatar Jul 11 '22 02:07 jingkl

/assign @jiaoew1991 /unassign

yanliang567 avatar Jul 11 '22 03:07 yanliang567

server-instance benchmark-tag-no-clean-9ndwc-1 server-configmap server-single-32c128m client-configmap client-random-locust-100m-ddl-r8-w2

benchmark-tag-no-clean-9ndwc-1-etcd-0                             1/1     Running     0          2d14h   10.97.16.117   qa-node013.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9ndwc-1-milvus-standalone-795f84b4bsx6q9   1/1     Running     0          2d14h   10.97.20.146   qa-node018.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9ndwc-1-minio-f75cd79fd-b72g8              1/1     Running     0          2d14h   10.97.19.164   qa-node016.zilliz.local   <none>           <none>

error:

[2022-07-11 01:14:03,334] [   DEBUG] - [scene_test] Start scene test : scene_test_9258_431323 (milvus_benchmark.client:536)
[2022-07-11 01:14:03,390] [   DEBUG] - Row count: 3000 in collection: <scene_test_8137_690539> (milvus_benchmark.client:428)
[2022-07-11 01:14:56,177] [   DEBUG] - [scene_test] Start create index : scene_test_8137_690539 (milvus_benchmark.client:550)
[2022-07-11 01:15:03,825] [    INFO] - Building index start, collection_name: scene_test_8137_690539, index_type: IVF_SQ8, metric_type: L2 (milvus_benchmark.client:279)
[2022-07-11 01:15:03,826] [    INFO] - {'nlist': 2048} (milvus_benchmark.client:281)
[2022-07-11 01:15:04,033] [   DEBUG] - collection: scene_test_8137_690539 Index params: {'index_type': 'IVF_SQ8', 'metric_type': 'L2', 'params': {'nlist': 2048}} (milvus_benchmark.client:287)
[2022-07-11 01:15:05,463] [   DEBUG] - Row count: 3000 in collection: <scene_test_6966_667923> (milvus_benchmark.client:428)
[2022-07-11 01:15:05,464] [   DEBUG] - [scene_test] Start create index : scene_test_6966_667923 (milvus_benchmark.client:550)
[2022-07-11 01:15:05,465] [    INFO] - Building index start, collection_name: scene_test_6966_667923, index_type: IVF_SQ8, metric_type: L2 (milvus_benchmark.client:279)
[2022-07-11 01:15:05,465] [    INFO] - {'nlist': 2048} (milvus_benchmark.client:281)
[2022-07-11 01:15:05,466] [   DEBUG] - collection: scene_test_6966_667923 Index params: {'index_type': 'IVF_SQ8', 'metric_type': 'L2', 'params': {'nlist': 2048}} (milvus_benchmark.client:287)
[2022-07-11 01:15:05,467] [   DEBUG] - Milvus get run in 62.9965s (milvus_benchmark.client:54)
[2022-07-11 01:15:05,523] [   ERROR] - RPC error: [describe_collection], <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)>, <Time:{'RPC start': '2022-07-11 01:14:03.334084', 'RPC error': '2022-07-11 01:15:05.523187'}> (pymilvus.decorators:94)
[2022-07-11 01:15:05,524] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)>, <Time:{'RPC start': '2022-07-11 01:14:03.334046', 'RPC error': '2022-07-11 01:15:05.524140'}> (pymilvus.decorators:94)
[2022-07-11 01:15:05,525] [   ERROR] - RPC error: [describe_collection], <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)>, <Time:{'RPC start': '2022-07-11 01:14:03.334173', 'RPC error': '2022-07-11 01:15:05.525835'}> (pymilvus.decorators:94)
[2022-07-11 01:15:05,526] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=rpc deadline exceeded: Retry timeout: 60s)>, <Time:{'RPC start': '2022-07-11 01:14:03.334150', 'RPC error': '2022-07-11 01:15:05.526516'}> (pymilvus.decorators:94)
[2022-07-11 01:15:05,527] [    INFO] - Create collection: <scene_test_9258_431323> successfully (milvus_benchmark.client:152)
[2022-07-11 01:15:05,528] [   DEBUG] - Milvus create_collection run in 62.1933s (milvus_benchmark.client:54)
[2022-07-11 01:15:05,529] [   DEBUG] - Milvus get run in 62.1956s (milvus_benchmark.client:54)
[2022-07-11 01:15:05,530] [   DEBUG] - [scene_test] Start scene test : scene_test_1798_516649 (milvus_benchmark.client:536)
[2022-07-11 01:15:05,605] [   DEBUG] - Milvus get run in 0.1371s (milvus_benchmark.client:54)
[2022-07-11 01:15:05,607] [   DEBUG] - Milvus get run in 62.2728s (milvus_benchmark.client:54)

jingkl avatar Jul 11 '22 03:07 jingkl

server-instance fouram-tag-no-clean-g4m6q-1 server-configmap server-single-32c128m-redur0 client-configmap client-random-locust-100m-ddl-r8-w2-60h

2.1.0-20220819-671d629f pymilvus:2.1.1.dev3

Verified, the problem has not yet appeared, first close issue

jingkl avatar Aug 22 '22 11:08 jingkl