[Bug]: [benchmark][cluster] 5 replicas and 10 clients report the error "StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded"

Open jingkl opened this issue 2 years ago • 4 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version: master-20220606-f38637c2
- Deployment mode(standalone or cluster):cluster
- SDK version(e.g. pymilvus v2.0.0rc2):2.1.0dev67
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo: benchmark-tag-no-clean-9jqjt-1
server-configmap: server-cluster-8c64m-querynode5
client-configmap: client-random-locust-search-filter-100m-ddl-r8-w2-replica5-concurrent

server:

NAME                                                              READY   STATUS      RESTARTS   AGE     IP             NODE                      NOMINATED NODE   READINESS GATES
benchmark-tag-no-clean-9jqjt-1-etcd-0                             1/1     Running     0          4m8s    10.97.16.132   qa-node013.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-etcd-1                             1/1     Running     0          4m8s    10.97.17.30    qa-node014.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-etcd-2                             1/1     Running     0          4m7s    10.97.16.134   qa-node013.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-milvus-datacoord-bc7cfc47b-z65s4   1/1     Running     0          4m8s    10.97.9.135    qa-node007.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-milvus-datanode-9cc5c5b99-9f7gx    1/1     Running     0          4m8s    10.97.20.157   qa-node018.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-milvus-indexcoord-67464d99dxg6pz   1/1     Running     0          4m8s    10.97.9.136    qa-node007.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-milvus-indexnode-6f6659f766pspx2   1/1     Running     0          4m8s    10.97.20.158   qa-node018.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-milvus-proxy-65b76f9fd7-whmnf      1/1     Running     0          4m8s    10.97.4.23     qa-node002.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-milvus-querycoord-6f56ccf8fzq4zl   1/1     Running     0          4m8s    10.97.9.134    qa-node007.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-milvus-querynode-796cff79969m7gx   1/1     Running     0          4m8s    10.97.20.159   qa-node018.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-milvus-querynode-796cff7996dns9v   1/1     Running     0          4m8s    10.97.10.8     qa-node008.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-milvus-querynode-796cff7996dzwj7   1/1     Running     0          4m8s    10.97.20.160   qa-node018.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-milvus-querynode-796cff7996fslrj   1/1     Running     0          4m8s    10.97.17.28    qa-node014.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-milvus-querynode-796cff7996qczf8   1/1     Running     0          4m8s    10.97.19.67    qa-node016.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-milvus-rootcoord-59d4ff8776p5mvb   1/1     Running     0          4m8s    10.97.9.132    qa-node007.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-minio-0                            1/1     Running     0          4m8s    10.97.19.69    qa-node016.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-minio-1                            1/1     Running     0          4m8s    10.97.12.148   qa-node015.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-minio-2                            1/1     Running     0          4m8s    10.97.12.155   qa-node015.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-minio-3                            1/1     Running     0          4m8s    10.97.12.154   qa-node015.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-pulsar-bookie-0                    1/1     Running     0          4m8s    10.97.12.147   qa-node015.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-pulsar-bookie-1                    1/1     Running     0          4m8s    10.97.5.3      qa-node003.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-pulsar-bookie-2                    1/1     Running     0          4m7s    10.97.11.219   qa-node009.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-pulsar-bookie-init-rdjj9           0/1     Completed   0          4m8s    10.97.11.216   qa-node009.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-pulsar-broker-0                    1/1     Running     0          4m8s    10.97.11.215   qa-node009.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-pulsar-proxy-0                     1/1     Running     0          4m8s    10.97.12.141   qa-node015.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-pulsar-pulsar-init-nk75c           0/1     Completed   0          4m8s    10.97.18.216   qa-node017.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-pulsar-recovery-0                  1/1     Running     0          4m8s    10.97.3.194    qa-node001.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-pulsar-zookeeper-0                 1/1     Running     0          4m8s    10.97.18.218   qa-node017.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-pulsar-zookeeper-1                 1/1     Running     0          3m34s   10.97.9.138    qa-node007.zilliz.local   <none>           <none>
benchmark-tag-no-clean-9jqjt-1-pulsar-zookeeper-2                 1/1     Running     0          3m8s    10.97.18.220   qa-node017.zilliz.local   <none>           <none>

client error:

[2022-06-08 07:58:07,457] [   ERROR] - grpc RpcError: [has_collection], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-08 07:56:30.241219', 'gRPC error': '2022-06-08 07:58:07.457363'}> (pymilvus.decorators:86)
[2022-06-08 08:00:08,051] [   ERROR] - grpc RpcError: [_execute_search_requests], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-08 07:58:07.438642', 'gRPC error': '2022-06-08 08:00:08.051164'}> (pymilvus.decorators:86)
[2022-06-08 08:00:08,073] [   ERROR] - grpc RpcError: [search], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-08 07:54:38.181764', 'gRPC error': '2022-06-08 08:00:08.073445'}> (pymilvus.decorators:86)
[2022-06-08 08:00:08,075] [   ERROR] - grpc RpcError: [_execute_search_requests], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-08 07:58:07.439192', 'gRPC error': '2022-06-08 08:00:08.075428'}> (pymilvus.decorators:86)
[2022-06-08 08:00:08,075] [   ERROR] - grpc RpcError: [search], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-08 07:54:38.182176', 'gRPC error': '2022-06-08 08:00:08.075941'}> (pymilvus.decorators:86)
[2022-06-08 08:00:08,076] [   DEBUG] - Milvus get run in 277.6588s (milvus_benchmark.client:54)
[2022-06-08 08:00:08,077] [   DEBUG] - Milvus get run in 277.6594s (milvus_benchmark.client:54)
[2022-06-08 08:00:08,078] [   ERROR] - grpc RpcError: [has_collection], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-08 07:58:07.448281', 'gRPC error': '2022-06-08 08:00:08.078872'}> (pymilvus.decorators:86)
[2022-06-08 08:00:08,083] [   ERROR] - grpc RpcError: [has_collection], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-08 07:58:07.448535', 'gRPC error': '2022-06-08 08:00:08.083583'}> (pymilvus.decorators:86)
[2022-06-08 08:00:08,084] [   ERROR] - grpc RpcError: [has_collection], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-08 07:58:07.448744', 'gRPC error': '2022-06-08 08:00:08.084808'}> (pymilvus.decorators:86)
[2022-06-08 08:00:08,085] [   ERROR] - grpc RpcError: [has_collection], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-08 07:58:07.448959', 'gRPC error': '2022-06-08 08:00:08.085389'}> (pymilvus.decorators:86)
[2022-06-08 08:02:21,183] [   DEBUG] - Milvus get run in 133.1052s (milvus_benchmark.client:54)
[2022-06-08 08:02:21,207] [   DEBUG] - Milvus get run in 253.7588s (milvus_benchmark.client:54)
[2022-06-08 08:04:48,881] [   DEBUG] - Milvus get run in 280.8035s (milvus_benchmark.client:54)
[2022-06-08 08:07:21,189] [   ERROR] - grpc RpcError: [describe_collection], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-08 08:02:21.182734', 'gRPC error': '2022-06-08 08:07:21.188793'}> (pymilvus.decorators:86)
[2022-06-08 08:07:21,191] [   ERROR] - grpc RpcError: [query], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-08 08:00:08.078757', 'gRPC error': '2022-06-08 08:07:21.191564'}> (pymilvus.decorators:86)
[2022-06-08 08:07:21,194] [   ERROR] - grpc RpcError: [has_collection], <_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2022-06-08 08:04:48.881363', 'gRPC error': '2022-06-08 08:07:21.193978'}> (pymilvus.decorators:86)
Screenshot 2022-06-09 14 12 31
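
Each error line above carries its own `RPC start` and `gRPC error` timestamps, so the time each call spent waiting before the deadline fired can be read straight from the log. The following is a minimal sketch for doing that; the helper name `rpc_durations` and the regular expression are illustrative assumptions, not part of pymilvus or milvus_benchmark.

```python
import re
from datetime import datetime

# Matches the pymilvus error lines quoted above: the RPC name in [brackets]
# and the two timestamps inside the <Time:{...}> payload.
LINE_RE = re.compile(
    r"grpc RpcError: \[(?P<rpc>\w+)\].*?"
    r"'RPC start': '(?P<start>[^']+)', 'gRPC error': '(?P<end>[^']+)'"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def rpc_durations(lines):
    """Return (rpc_name, seconds_elapsed) for every DEADLINE_EXCEEDED line."""
    results = []
    for line in lines:
        if "DEADLINE_EXCEEDED" not in line:
            continue  # skip DEBUG lines and unrelated errors
        match = LINE_RE.search(line)
        if match is None:
            continue  # line does not follow the expected layout
        start = datetime.strptime(match["start"], TS_FORMAT)
        end = datetime.strptime(match["end"], TS_FORMAT)
        results.append((match["rpc"], (end - start).total_seconds()))
    return results


# Two lines copied verbatim from the client log in this issue.
SAMPLE = [
    "[2022-06-08 07:58:07,457] [   ERROR] - grpc RpcError: [has_collection], "
    "<_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, "
    "<Time:{'RPC start': '2022-06-08 07:56:30.241219', "
    "'gRPC error': '2022-06-08 07:58:07.457363'}> (pymilvus.decorators:86)",
    "[2022-06-08 08:00:08,073] [   ERROR] - grpc RpcError: [search], "
    "<_MultiThreadedRendezvous: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, "
    "<Time:{'RPC start': '2022-06-08 07:54:38.181764', "
    "'gRPC error': '2022-06-08 08:00:08.073445'}> (pymilvus.decorators:86)",
]

for name, seconds in rpc_durations(SAMPLE):
    print(f"{name}: {seconds:.1f}s")
```

On these two sample lines, `has_collection` waited about 97 s and `search` about 330 s before failing, which suggests that even lightweight metadata calls were stalled, not only the searches.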

Expected Behavior

No response

Steps To Reproduce

1. create collection
2. build index of ivf_sq8
3. insert 1b vectors
4. flush collection
5. build index with the same params
6. load collection
7. locust concurrency: query<-search, load, get, scene_test
8. start 10 clients
9. search raises an error
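
For readers who want to try the index, load and search steps outside the benchmark harness, here is a rough sketch using the pymilvus ORM-style `Collection` methods `create_index`, `load` and `search`. The index and search parameters come from the config in this issue. Several things are assumptions: the vector field name `float_vector`, the `L2` metric (inferred from the collection name `sift_100m_128_l2`), the collection size of 100M, the 120-second timeout, and the translation of the benchmark's range filter into a boolean expression. The function accepts any object exposing those three methods, so it can be exercised without a running cluster.

```python
# Parameters taken from the config.yaml quoted in this issue.
INDEX_PARAMS = {
    "index_type": "IVF_SQ8",
    "metric_type": "L2",  # assumption: inferred from the collection name
    "params": {"nlist": 2048},
}
SEARCH_PARAMS = {"metric_type": "L2", "params": {"nprobe": 16}}
REPLICA_NUMBER = 5
TOP_K = 10

# Assumptions, not stated in the issue.
VECTOR_FIELD = "float_vector"    # hypothetical vector field name
COLLECTION_SIZE = 100_000_000    # inferred from "sift_100m"


def build_filter(collection_size):
    """Approximate the benchmark's range filter on float1 as a boolean expression."""
    upper = collection_size * 0.5
    return f"float1 > -1.0 && float1 < {upper}"


def index_load_and_search(collection, query_vectors, timeout=120):
    """Steps 5, 6 and 9: build the index, load 5 replicas, run one filtered search.

    `collection` is expected to behave like a pymilvus `Collection`.
    `timeout` is a placeholder; the issue does not state the client's deadline.
    """
    collection.create_index(field_name=VECTOR_FIELD, index_params=INDEX_PARAMS)
    collection.load(replica_number=REPLICA_NUMBER)
    return collection.search(
        data=query_vectors,
        anns_field=VECTOR_FIELD,
        param=SEARCH_PARAMS,
        limit=TOP_K,
        expr=build_filter(COLLECTION_SIZE),
        timeout=timeout,
    )
```

Against a real deployment, `collection` would be something like `Collection("sift_100m_128_l2")` after `connections.connect(...)`, with `query_vectors` holding 10 vectors of dimension 128 to match `nq: 10`.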

Milvus Log

No response

Anything else?

{
	"config.yaml": "locust_random_performance:
		  collections:
		    -
		      collection_name: sift_100m_128_l2
		      other_fields: float1
		      ni_per: 50000
		      build_index: true
		      index_type: ivf_sq8
		      index_param:
		        nlist: 2048
		      load_param:
		        replica_number: 5
		      task:
		        types:
		          -
		            type: query
		            weight: 20
		            params:
		              top_k: 10
		              nq: 10
		              search_param:
		                nprobe: 16
		              filters:
		                -
		                  range: \"{'range': {'float1': {'GT': -1.0, 'LT': collection_size * 0.5}}}\"
		          -
		            type: load
		            weight: 1
		            params:
		              replica_number: 5
		          -
		            type: get
		            weight: 10
		            params:
		              ids_length: 10
		          -
		            type: scene_test
		            weight: 2
		        connection_num: 1
		        clients_num: 20
		        spawn_rate: 2
		        # during_time: 100
		        during_time: 2h
		"
}
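
To get a feel for the request mix this config produces, the task weights can be turned into relative shares. This is a small sketch that assumes the weights are handed to Locust unchanged and that tasks are picked in proportion to their weight, which is how Locust task weights normally behave; the helper `task_shares` is illustrative only.

```python
# Task weights copied from the config.yaml above.
TASK_WEIGHTS = {"query": 20, "load": 1, "get": 10, "scene_test": 2}


def task_shares(weights):
    """Return each task's expected fraction of all executed tasks."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


for name, share in task_shares(TASK_WEIGHTS).items():
    print(f"{name}: {share:.1%}")
```

Under that assumption roughly 61% of tasks are filtered searches, 30% are `get` calls, 6% are `scene_test`, and about 3% are `load` calls with `replica_number: 5` issued while searches are in flight.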

jingkl avatar Jun 09 '22 06:06 jingkl

@czs007 could you please take a look at this issue? It seems that searches time out when they run concurrently.

yanliang567 avatar Jun 10 '22 01:06 yanliang567

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Jul 10 '22 17:07 stale[bot]

keep it open

yanliang567 avatar Jul 11 '22 01:07 yanliang567

reopen it for tracking

yanliang567 avatar Jul 19 '22 00:07 yanliang567

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Sep 10 '22 08:09 stale[bot]