
[Bug]: Standalone pod restarted several times during concurrent insert and search on multiple collections

Open ThreadDao opened this issue 1 year ago • 8 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version: 2.2.0-20230504-842e5d21
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka):  rocksmq  
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus 2.2.8.dev1
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

  1. Deploy standalone with image 2.2.0-20230504-842e5d21 and the following config:

         config:
           log:
             level: debug
           quotaAndLimits:
             limitWriting:
               diskProtection:
                 diskQuotaPerCollection: 500

  2. Run a concurrent create-collection, insert, and search test (a minimal pymilvus sketch of these steps follows below):
     • create collection
     • insert 1m 128-dim vectors with batch size (ni) 10000
     • flush and get num entities
     • build HNSW index: {"M": 8, "efConstruction": 200}
     • load collection
     • search with nq=10, top_k=100, 5000 times
  3. The standalone pod restarted 4 times.
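
For reference, a minimal pymilvus sketch of the per-collection workload above (connection endpoint, collection name, and field names are placeholders, not taken from the fouram case):

    import numpy as np
    from pymilvus import (
        connections, Collection, CollectionSchema, FieldSchema, DataType,
    )

    # placeholder endpoint; the fouram test runs against the deployed standalone pod
    connections.connect(host="localhost", port="19530")

    dim = 128
    fields = [
        FieldSchema("id", DataType.INT64, is_primary=True),
        FieldSchema("vec", DataType.FLOAT_VECTOR, dim=dim),
    ]
    collection = Collection("quota_repro", CollectionSchema(fields, auto_id=True), shards_num=2)

    # insert 1m vectors in batches of ni=10000, then flush and read num_entities
    ni, total = 10_000, 1_000_000
    for _ in range(total // ni):
        collection.insert([np.random.random((ni, dim)).tolist()])
    collection.flush()
    print(collection.num_entities)

    # build the HNSW index, load, then search nq=10 / top_k=100 for 5000 iterations
    collection.create_index("vec", {"index_type": "HNSW", "metric_type": "L2",
                                    "params": {"M": 8, "efConstruction": 200}})
    collection.load()
    for _ in range(5000):
        collection.search(np.random.random((10, dim)).tolist(), "vec",
                          {"metric_type": "L2", "params": {"ef": 34}}, limit=100)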

panic logs: (screenshot attached in the original issue)

Expected Behavior

No response

Steps To Reproduce

fouram argo name: `quota-collections-3`

run fouramf case:

    @pytest.mark.locust
    @pytest.mark.parametrize("deploy_mode", [STANDALONE])
    def test_concurrent_locust_multi_collections(self, input_params: InputParamsBase, deploy_mode):
        """
        Used to check whether the memory usage of queryNodes is balanced.

        :test steps:
            1. concurrent test and calculation of RT and QPS
        """
        concurrent_tasks = [
            ConcurrentParams.params_scene_search_test(
                weight=5, shards_num=2, data_size='1m', nb=10000, replica_number=1,
                index_type=pn.IndexTypeName.HNSW, index_param={"M": 8, "efConstruction": 200}, nq=10, top_k=100, search_param={"ef": 34},
                search_counts=5000)
        ]
        default_case_params = ConcurrentParams().params_scene_concurrent(
            concurrent_tasks, concurrent_number=[50], during_time="5h", interval=20, dataset_size=0, ni_per=0,
            replica_number=1, **cdp.DefaultIndexParams.HNSW)

        self.concurrency_template(input_params=input_params, cpu=dp.min_cpu, mem=dp.min_mem,
                                  deploy_mode=deploy_mode, old_version_format=False,
                                  case_callable_obj=ConcurrentClientBase().scene_concurrent_locust,
                                  default_case_params=default_case_params)


### Milvus Log

server pods in `fouram` cluster and `qa-milvus` ns:

    k get pod -o wide -n qa-milvus | grep fouram-op-54-8249
    fouram-op-54-8249-etcd-0                               1/1   Running   0              28h   10.104.4.130   4am-node11
    fouram-op-54-8249-milvus-standalone-6b454c485b-bgfnx   1/1   Running   4 (101m ago)   28h   10.104.4.151   4am-node11
    fouram-op-54-8249-minio-744659cbdf-h5xlr               1/1   Running   0              28h   10.104.4.131   4am-node11


client pod in `fouram` cluster and `qa` ns:

quota-collections-3-1904905947


[standalone_pre.log](https://github.com/milvus-io/milvus/files/11406741/standalone_pre.log)


### Anything else?

_No response_

ThreadDao avatar May 05 '23 13:05 ThreadDao

/assign @jiaoew1991 /unassign

yanliang567 avatar May 06 '23 01:05 yanliang567

/assign @yah01 /unassign

jiaoew1991 avatar May 06 '23 09:05 jiaoew1991

The panic is a concurrent write/read on a map (screenshot attached).

yah01 avatar May 09 '23 02:05 yah01

#23957 has fixed this

yah01 avatar May 09 '23 02:05 yah01

/assign @ThreadDao please help verify with #23957

yah01 avatar May 09 '23 02:05 yah01

Rerun with image 2.2.0-20230509-341b62d5: the standalone pod still restarted; one restart was OOMKilled and the other completed with exit code 0 (screenshot attached).

I stopped the test and increased the standalone pod memory from 16G to 20G, but it still crashed:

fouram-op-54-8249-etcd-0                                          1/1     Running                  0                5d      10.104.4.130    4am-node11   <none>           <none>
fouram-op-54-8249-milvus-standalone-865768cb7c-vhvsm              0/1     Running                  10 (5m32s ago)   40m     10.104.4.170    4am-node11   <none>           <none>
fouram-op-54-8249-minio-744659cbdf-h5xlr                          1/1     Running                  0                5d      10.104.4.131    4am-node11   <none>           <none>

standalone pod previous log: standalone_pre_1.log

ThreadDao avatar May 09 '23 09:05 ThreadDao

/assign @yah01 please help check whether this is caused by insufficient memory. If so, why is the exit code 0? (A sketch for checking the container's last termination state follows below.)
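
One way to distinguish an OOM kill from a clean exit is to read the container's last termination state; a minimal sketch using the Kubernetes Python client (the pod name below is a placeholder for the actual standalone pod):

    from kubernetes import client, config

    config.load_kube_config()
    v1 = client.CoreV1Api()

    # placeholder pod name; substitute the current standalone pod in qa-milvus
    pod = v1.read_namespaced_pod("fouram-op-54-8249-milvus-standalone-xxx", "qa-milvus")
    for cs in pod.status.container_statuses:
        last = cs.last_state.terminated
        if last is not None:
            # an OOM kill reports reason "OOMKilled" with exit code 137;
            # a clean shutdown reports reason "Completed" with exit code 0
            print(cs.name, last.reason, last.exit_code, last.finished_at)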

ThreadDao avatar May 09 '23 09:05 ThreadDao

/assign @yah01 /unassign

Image 2.2.0-20230512-d882624b: the standalone pod also crashed:

fouram-op-54-8249-etcd-0                                          1/1     Running            0                 8d
fouram-op-54-8249-milvus-standalone-576d456c9d-7z9sb              0/1     Running            2 (45s ago)       6m56s
fouram-op-54-8249-minio-744659cbdf-h5xlr                          1/1     Running            0                 8d

standalone_pre_1.log

ThreadDao avatar May 12 '23 10:05 ThreadDao

Standalone also crashed with exit code 0 (screenshot attached).

pre log: Uploading standalone_pre_completed.log…

ThreadDao avatar May 17 '23 08:05 ThreadDao

@yah01 any updates?

binbinlv avatar May 24 '23 07:05 binbinlv

@yah01 any updates?

Related: #24489

yah01 avatar May 29 '23 10:05 yah01

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Aug 03 '23 03:08 stale[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Sep 04 '23 06:09 stale[bot]