
[Bug]: [benchmark] growing segment concurrent search error: "failed to search: leader not available: node offline" with Milvus reboot

Open elstic opened this issue 1 year ago • 2 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version: 2.3-20240110-1128b1dd
- Deployment mode (standalone or cluster): standalone
- MQ type (rocksmq, pulsar or kafka):
- SDK version (e.g. pymilvus v2.0.0rc2):
- OS (Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:

Current Behavior

The configuration I added to config.yaml guarantees that segments stay in the growing state:

  dataCoord:
    segment:
      sealProportion: 0.9
      maxIdleTime: 7200

I run continuous searches with scalar filtering after inserting 400k (40w) 128-dimensional vectors into Milvus.
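
For reference, a minimal pymilvus sketch of that workload; the collection name, field names, index parameters, and batch size are assumptions, not taken from the report:

  # Insert 400k ("40w") 128-d vectors plus a float scalar column, keeping
  # everything in growing segments (no flush), then search with a filter.
  import numpy as np
  from pymilvus import (Collection, CollectionSchema, DataType,
                        FieldSchema, connections)

  connections.connect(host="127.0.0.1", port="19530")
  fields = [
      FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
      FieldSchema("score", DataType.FLOAT),               # float scalar column
      FieldSchema("vec", DataType.FLOAT_VECTOR, dim=128),
  ]
  coll = Collection("growing_segment_test", CollectionSchema(fields))

  # Index and load first so that subsequent inserts land in growing segments.
  coll.create_index("vec", {"index_type": "HNSW", "metric_type": "L2",
                            "params": {"M": 8, "efConstruction": 200}})
  coll.load()

  for _ in range(40):                                     # 40 x 10k = 400k rows
      coll.insert([np.random.rand(10_000).tolist(),
                   np.random.rand(10_000, 128).tolist()])

  res = coll.search(np.random.rand(1, 128).tolist(), "vec",
                    {"metric_type": "L2", "params": {"ef": 64}},
                    limit=10, expr="score > 0.5")         # scalar filtering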

server:

new-stable-mast09600-4-68-3801-etcd-0                             1/1     Running     0               2m18s   10.104.27.67    4am-node31   <none>           <none>
new-stable-mast09600-4-68-3801-milvus-standalone-7fc975956xdh4t   1/1     Running     0               2m18s   10.104.25.112   4am-node30   <none>           <none>
new-stable-mast09600-4-68-3801-minio-7559486467-9nh72             1/1     Running     0               2m18s   10.104.16.191   4am-node21   <none>           <none> (base.py:257)
[2024-01-10 22:06:04,192 -  INFO - fouram]: [Cmd Exe]  kubectl get pods  -n qa-milvus  -o wide | grep -E 'STATUS|new-stable-mast09600-4-68-3801-milvus|new-stable-mast09600-4-68-3801-minio|new-stable-mast09600-4-68-3801-etcd|new-stable-mast09600-4-68-3801-pulsar|new-stable-mast09600-4-68-3801-kafka'  (util_cmd.py:14)
[2024-01-10 22:06:13,733 -  INFO - fouram]: [CliClient] pod details of release(new-stable-mast09600-4-68-3801): 
 I0110 22:06:05.460316     490 request.go:665] Waited for 1.1665361s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/actions.summerwind.dev/v1alpha1?timeout=32s
NAME                                                              READY   STATUS      RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
new-stable-mast09600-4-68-3801-etcd-0                             1/1     Running     0               4h3m    10.104.27.67    4am-node31   <none>           <none>
new-stable-mast09600-4-68-3801-milvus-standalone-7fc975956xdh4t   1/1     Running     1 (148m ago)    4h3m    10.104.25.112   4am-node30   <none>           <none>
new-stable-mast09600-4-68-3801-minio-7559486467-9nh72             1/1     Running     0               4h3m    10.104.16.191   4am-node21   <none>           <none> (cli_client.py:132)

client error log (LogQL query): {pod=~"new-stable-master-1704909600-2128327225"} |= "ERROR" [screenshot of matching error lines; not reproduced here]

A Milvus reboot occurs; the segment is a growing segment before the reboot and a sealed segment after it.

Expected Behavior

Search should not fail, and Milvus should not reboot.

Steps To Reproduce

  1. Modify config.yaml:

  dataCoord:
    segment:
      sealProportion: 0.9
      maxIdleTime: 7200

  2. Deploy Milvus
  3. Insert 400k (40w) 128-dimensional vectors and a column of float scalars
  4. Perform concurrent searches with scalar filtering (see the sketch after this list)
  5. Step 4 runs for 4 hours
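
A minimal sketch of step 4, reusing coll from the sketch under "Current Behavior"; the 4-hour duration is from the report, while the worker count and search parameters are assumptions (the actual run used the fouram benchmark client):

  # Step 4: concurrent searches with a scalar filter, sustained for 4 hours.
  import time
  import numpy as np
  from concurrent.futures import ThreadPoolExecutor

  def one_search():
      return coll.search(np.random.rand(1, 128).tolist(), "vec",
                         {"metric_type": "L2", "params": {"ef": 64}},
                         limit=10, expr="score > 0.5")

  deadline = time.time() + 4 * 3600
  with ThreadPoolExecutor(max_workers=16) as pool:
      while time.time() < deadline:
          # One wave of 16 concurrent requests; result() re-raises errors
          # such as "leader not available: node offline".
          for f in [pool.submit(one_search) for _ in range(16)]:
              f.result()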

Milvus Log

No response

Anything else?

No response

elstic avatar Jan 11 '24 04:01 elstic

/assign @congqixia /unassign

yanliang567 avatar Jan 12 '24 01:01 yanliang567

Continuous searching during the growing-segment phase still produces occasional search errors.

client pod: new-stable-master-1705600800-2994728836 client error log:

[2024-01-18 18:35:31,116 -  INFO - fouram]:  (stats.py:820)
[2024-01-18 18:35:47,596 - ERROR - fouram]: grpc RpcError: [search], <_InactiveRpcError: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2024-01-18 18:34:47.533596', 'gRPC error': '2024-01-18 18:35:47.595973'}> (decorators.py:139)
[2024-01-18 18:35:48,165 - ERROR - fouram]: (api_response) : <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.DEADLINE_EXCEEDED
	details = "Deadline Exceeded"
	debug_error_string = "UNKNOWN:Deadline Exceeded {grpc_status:4, created_time:"2024-01-18T18:35:47.587786919+00:00"}"
> (api_request.py:54)
[2024-01-18 18:35:48,165 - ERROR - fouram]: [CheckFunc] search request check failed, response:<_InactiveRpcError of RPC that terminated with:
	status = StatusCode.DEADLINE_EXCEEDED
	details = "Deadline Exceeded"
	debug_error_string = "UNKNOWN:Deadline Exceeded {grpc_status:4, created_time:"2024-01-18T18:35:47.587786919+00:00"}"
> (func_check.py:48)
[2024-01-18 18:35:48,168 - ERROR - fouram]: grpc RpcError: [search], <_InactiveRpcError: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2024-01-18 18:34:47.535324', 'gRPC error': '2024-01-18 18:35:48.168617'}> (decorators.py:139)
[2024-01-18 18:35:48,169 - ERROR - fouram]: (api_response) : <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.DEADLINE_EXCEEDED
	details = "Deadline Exceeded"
	debug_error_string = "UNKNOWN:Deadline Exceeded {created_time:"2024-01-18T18:35:47.59056936+00:00", grpc_status:4}"
> (api_request.py:54)
[2024-01-18 18:35:48,169 - ERROR - fouram]: [CheckFunc] search request check failed, response:<_InactiveRpcError of RPC that terminated with:
	status = StatusCode.DEADLINE_EXCEEDED
	details = "Deadline Exceeded"
	debug_error_string = "UNKNOWN:Deadline Exceeded {created_time:"2024-01-18T18:35:47.59056936+00:00", grpc_status:4}"

server:

new-stable-mast00800-4-51-5682-etcd-0                             1/1     Running           0                2m39s   10.104.15.49    4am-node20   <none>           <none>
new-stable-mast00800-4-51-5682-milvus-standalone-7567bc6596wv85   1/1     Running           0                2m39s   10.104.25.228   4am-node30   <none>           <none>
new-stable-mast00800-4-51-5682-minio-57b9dc878b-tzvmf             1/1     Running           0                2m39s   10.104.27.180   4am-node31   <none>           <none>

client result (a total of 18 failed searches):

'search': {'Requests': 4979423,
          'Fails': 18,
          'RPS': 345.79,
          'fail_s': 0.0,
          'RT_max': 60640.49,
          'RT_avg': 45.48,

test env: 4am cluster

elstic avatar Jan 19 '24 06:01 elstic

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Jun 15 '24 16:06 stale[bot]

This issue has not occurred recently.

elstic avatar Jun 17 '24 02:06 elstic