
[Bug]: [benchmark][cluster] Milvus search with compaction raises an error: "force to deny /milvus.proto.milvus.MilvusService/Delete"

Open jingkl opened this issue 2 years ago • 8 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version:2.2.2-20221219-ae5259ca
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):2.2.1dev4
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

server-instance: fouram-tag-no-clean-jns96-1
server-configmap: server-cluster-8c64m-compaction
client-configmap: client-random-locust-compaction

fouram-tag-no-clean-jns96-1-etcd-0                                1/1     Running     0               35m     10.104.4.64    4am-node11   <none>           <none>
fouram-tag-no-clean-jns96-1-etcd-1                                1/1     Running     0               35m     10.104.5.132   4am-node12   <none>           <none>
fouram-tag-no-clean-jns96-1-etcd-2                                1/1     Running     0               35m     10.104.6.209   4am-node13   <none>           <none>
fouram-tag-no-clean-jns96-1-milvus-datacoord-8bf66f7fd-87zwk      1/1     Running     0               35m     10.104.5.126   4am-node12   <none>           <none>
fouram-tag-no-clean-jns96-1-milvus-datanode-78dc69c4bf-gs78l      1/1     Running     3 (9m54s ago)   35m     10.104.5.125   4am-node12   <none>           <none>
fouram-tag-no-clean-jns96-1-milvus-indexcoord-5fcb9d984f-nxcnr    1/1     Running     0               35m     10.104.1.247   4am-node10   <none>           <none>
fouram-tag-no-clean-jns96-1-milvus-indexnode-5cb68c9d76-jfwv4     1/1     Running     0               35m     10.104.1.246   4am-node10   <none>           <none>
fouram-tag-no-clean-jns96-1-milvus-proxy-78778c8dc7-4ccwx         1/1     Running     0               35m     10.104.5.124   4am-node12   <none>           <none>
fouram-tag-no-clean-jns96-1-milvus-querycoord-7f4d8dcff8-tpbv6    1/1     Running     0               35m     10.104.6.203   4am-node13   <none>           <none>
fouram-tag-no-clean-jns96-1-milvus-querynode-54f8f77c98-nbql4     1/1     Running     0               35m     10.104.6.197   4am-node13   <none>           <none>
fouram-tag-no-clean-jns96-1-milvus-rootcoord-69948dfcc6-fkm44     1/1     Running     0               35m     10.104.5.122   4am-node12   <none>           <none>
fouram-tag-no-clean-jns96-1-minio-0                               1/1     Running     0               35m     10.104.1.249   4am-node10   <none>           <none>
fouram-tag-no-clean-jns96-1-minio-1                               1/1     Running     0               35m     10.104.4.62    4am-node11   <none>           <none>
fouram-tag-no-clean-jns96-1-minio-2                               1/1     Running     0               35m     10.104.5.130   4am-node12   <none>           <none>
fouram-tag-no-clean-jns96-1-minio-3                               1/1     Running     0               35m     10.104.6.210   4am-node13   <none>           <none>
fouram-tag-no-clean-jns96-1-pulsar-bookie-0                       1/1     Running     0               35m     10.104.4.65    4am-node11   <none>           <none>
fouram-tag-no-clean-jns96-1-pulsar-bookie-1                       1/1     Running     0               35m     10.104.5.134   4am-node12   <none>           <none>
fouram-tag-no-clean-jns96-1-pulsar-bookie-2                       1/1     Running     0               35m     10.104.6.211   4am-node13   <none>           <none>
fouram-tag-no-clean-jns96-1-pulsar-bookie-init-874rr              0/1     Completed   0               35m     10.104.1.245   4am-node10   <none>           <none>
fouram-tag-no-clean-jns96-1-pulsar-broker-0                       1/1     Running     0               35m     10.104.4.58    4am-node11   <none>           <none>
fouram-tag-no-clean-jns96-1-pulsar-proxy-0                        1/1     Running     0               35m     10.104.6.199   4am-node13   <none>           <none>
fouram-tag-no-clean-jns96-1-pulsar-pulsar-init-frnnm              0/1     Completed   0               35m     10.104.6.196   4am-node13   <none>           <none>
fouram-tag-no-clean-jns96-1-pulsar-recovery-0                     1/1     Running     0               35m     10.104.5.123   4am-node12   <none>           <none>
fouram-tag-no-clean-jns96-1-pulsar-zookeeper-0                    1/1     Running     0               35m     10.104.6.204   4am-node13   <none>           <none>
fouram-tag-no-clean-jns96-1-pulsar-zookeeper-1                    1/1     Running     0               34m     10.104.4.67    4am-node11   <none>           <none>
fouram-tag-no-clean-jns96-1-pulsar-zookeeper-2                    1/1     Running     0               33m     10.104.5.136   4am-node12   <none>           <none>

client log:

(screenshot: client log, 2022-12-20 10:54:51)

Expected Behavior

No response

Steps To Reproduce

1. create a collection
2. build an ivf_sq8 index
3. insert 100k (10w) rows of data
4. build the index again
5. load the collection
6. run search, query, load, and scene_insert_delete_flush against the collection -> the error is raised (see the pymilvus sketch below)
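
For reference, a minimal pymilvus 2.2 sketch of these steps; the host/port, field names, and random data are placeholders, not values taken from the benchmark client:

```python
import random

from pymilvus import (
    Collection, CollectionSchema, DataType, FieldSchema, connections,
)

connections.connect(host="localhost", port="19530")  # placeholder endpoint

dim = 128
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True),
    FieldSchema(name="float1", dtype=DataType.FLOAT),
    FieldSchema(name="vec", dtype=DataType.FLOAT_VECTOR, dim=dim),
]
collection = Collection("sift_10w_128_l2", CollectionSchema(fields))  # step 1

index_params = {"index_type": "IVF_SQ8", "metric_type": "L2", "params": {"nlist": 2048}}
collection.create_index("vec", index_params)  # step 2: build the index before inserting

nb, batches = 50_000, 2  # ni_per: 50000, two batches = 100k rows
for b in range(batches):  # step 3: insert
    start = b * nb
    collection.insert([
        list(range(start, start + nb)),                             # id
        [random.random() for _ in range(nb)],                       # float1
        [[random.random() for _ in range(dim)] for _ in range(nb)]  # vec
    ])
collection.flush()

collection.create_index("vec", index_params)  # step 4: build the index again
collection.load()                             # step 5: load the collection

# step 6: the benchmark mixes search/query/load/insert+delete+flush concurrently;
# the delete below is the call the proxy eventually rejects with "force to deny".
collection.search(
    [[random.random() for _ in range(dim)]], "vec",
    {"metric_type": "L2", "params": {"nprobe": 16}}, limit=10,
)
collection.delete(expr="id in [0, 1, 2]")
collection.flush()
```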

Milvus Log

No response

Anything else?

{
	"config.yaml": "locust_random_performance:
		  collections:
		    -
		      collection_name: sift_10w_128_l2
		      ni_per: 50000
		      other_fields: float1
		      build_index: true
		      index_type: ivf_sq8
		      index_param:
		        nlist: 2048
		      task:
		        types:
		          -
		            type: query
		            weight: 20
		            params:
		              top_k: 10
		              nq: 10
		              search_param:
		                nprobe: 16
		              filters:
		                -
		                  range: \"{'range': {'float1': {'GT': -1.0, 'LT': collection_size * 0.5}}}\"
		          -
		            type: load
		            weight: 1
		          -
		            type: get
		            weight: 10
		            params:
		              ids_length: 10
		          -
		            type: scene_insert_delete_flush
		            weight: 1
		        connection_num: 1
		        clients_num: 20
		        spawn_rate: 2
		        during_time: 12h
		"
}
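
The configuration above describes a weighted mixed workload: 20 concurrent clients (spawn rate 2, 12 h) issuing query/get/load/scene_insert_delete_flush requests with weights 20/10/1/1. A rough Locust sketch of that mix, assuming the same placeholder field names as the reproduction sketch above (this is not the fouram client itself, and stats reporting is omitted):

```python
import random

from locust import User, constant, task
from pymilvus import Collection, connections

DIM, NB = 128, 100_000  # matches sift_10w_128_l2


class MilvusWorkloadUser(User):
    wait_time = constant(0)

    def on_start(self):
        connections.connect(host="localhost", port="19530")  # placeholder endpoint
        self.collection = Collection("sift_10w_128_l2")

    @task(20)  # type: query, weight: 20
    def search(self):
        self.collection.search(
            [[random.random() for _ in range(DIM)] for _ in range(10)],  # nq: 10
            "vec", {"metric_type": "L2", "params": {"nprobe": 16}},
            limit=10, expr=f"float1 > -1.0 and float1 < {NB * 0.5}",  # range filter
        )

    @task(10)  # type: get, weight: 10
    def get(self):
        self.collection.query(expr=f"id in {random.sample(range(NB), 10)}")

    @task(1)  # type: load, weight: 1
    def load(self):
        self.collection.load()

    @task(1)  # type: scene_insert_delete_flush, weight: 1
    def insert_delete_flush(self):
        ids = random.sample(range(NB), 10)
        self.collection.insert([
            ids,
            [random.random() for _ in ids],
            [[random.random() for _ in range(DIM)] for _ in ids],
        ])
        self.collection.delete(expr=f"id in {ids}")
        self.collection.flush()
```

Running it with `locust -f <file> --headless -u 20 -r 2 -t 12h` roughly matches clients_num, spawn_rate, and during_time from the config.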

jingkl · Dec 20 '22 02:12

/assign @bigsheeper please take a look into this

xiaofan-luan · Dec 20 '22 03:12

The collection only holds 100k (10w) rows of data, yet compaction operations cause a large increase in memory.

querynode memory:

(screenshot, 2022-12-20 12:00:34)

rootcoord log: (screenshot, 2022-12-20 12:01:35)

jingkl · Dec 20 '22 04:12

Is this caused by compaction or by delete? I ask because I saw scene_insert_delete_flush in the workload.

xiaofan-luan · Dec 20 '22 15:12

The "force to deny" writing error is caused by memory protection, which I think is working as intended. So the question becomes why the memory increased to 60 GB within 30 minutes. @jingkl how many times was scene_insert_delete_flush called in this period?

yanliang567 · Dec 21 '22 01:12
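
For anyone hitting the same rate-limit rejection from a benchmark or application client, a small, hedged sketch of client-side backoff on the "force to deny" error; the retry helper and delays below are illustrative, not part of the fouram client or pymilvus:

```python
import time

DENY_MARKER = "force to deny"  # substring of the error reported in this issue


def call_with_backoff(op, retries=5, base_delay=1.0):
    """Retry a Milvus call that the proxy rejects under memory-protection rate limiting."""
    for attempt in range(retries):
        try:
            return op()
        except Exception as exc:  # pymilvus surfaces the proxy's message in the exception text
            if DENY_MARKER not in str(exc):
                raise
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff while memory recovers
    raise RuntimeError(f"operation still denied after {retries} retries")


# usage: call_with_backoff(lambda: collection.delete(expr="id in [0, 1, 2]"))
```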

There have been 2000 requests so far, but 1765 of them appear to have been rate-limited. (screenshot, 2022-12-21 11:40:51)

jingkl · Dec 21 '22 03:12

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] · Feb 05 '23 07:02

keep it active

yanliang567 · Feb 06 '23 01:02

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] · Mar 17 '23 04:03

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] · Aug 02 '23 05:08

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] · Sep 03 '23 18:09