milvus
milvus copied to clipboard
[Bug]: [benckmark] growing segent concurrent search error: "failed to search: leader not available: node offline" with milvus reboot
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version:2.3-20240110-1128b1dd
- Deployment mode(standalone or cluster):standalone
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
The configuration I added to config.yaml makes segment guaranteed to be growing segment
dataCoord:
segment:
sealProportion: 0.9
maxIdleTime: 7200
I have a continuous searcg with scalar filtering after inserting 40w 128d vectors for milvus
server:
new-stable-mast09600-4-68-3801-etcd-0 1/1 Running 0 2m18s 10.104.27.67 4am-node31 <none> <none>
new-stable-mast09600-4-68-3801-milvus-standalone-7fc975956xdh4t 1/1 Running 0 2m18s 10.104.25.112 4am-node30 <none> <none>
new-stable-mast09600-4-68-3801-minio-7559486467-9nh72 1/1 Running 0 2m18s 10.104.16.191 4am-node21 <none> <none> (base.py:257)
[2024-01-10 22:06:04,192 - INFO - fouram]: [Cmd Exe] kubectl get pods -n qa-milvus -o wide | grep -E 'STATUS|new-stable-mast09600-4-68-3801-milvus|new-stable-mast09600-4-68-3801-minio|new-stable-mast09600-4-68-3801-etcd|new-stable-mast09600-4-68-3801-pulsar|new-stable-mast09600-4-68-3801-kafka' (util_cmd.py:14)
[2024-01-10 22:06:13,733 - INFO - fouram]: [CliClient] pod details of release(new-stable-mast09600-4-68-3801):
I0110 22:06:05.460316 490 request.go:665] Waited for 1.1665361s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.default.svc.cluster.local/apis/actions.summerwind.dev/v1alpha1?timeout=32s
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
new-stable-mast09600-4-68-3801-etcd-0 1/1 Running 0 4h3m 10.104.27.67 4am-node31 <none> <none>
new-stable-mast09600-4-68-3801-milvus-standalone-7fc975956xdh4t 1/1 Running 1 (148m ago) 4h3m 10.104.25.112 4am-node30 <none> <none>
new-stable-mast09600-4-68-3801-minio-7559486467-9nh72 1/1 Running 0 4h3m 10.104.16.191 4am-node21 <none> <none> (cli_client.py:132)
client error log:
{pod=~"new-stable-master-1704909600-2128327225"} |= "ERROR"
A milvus reboot occurs and is a growing segment before the reboot and a sealed segment after the reboot.
Expected Behavior
search should not fail no milvus reboot
Steps To Reproduce
- Modify config.yaml
dataCoord:
segment:
sealProportion: 0.9
maxIdleTime: 7200
- Deploy milvus
- Insert 40w 128d vector and a column of float scalars
- perform concurrent search with scalar filtering
- Step 4 lasts 4h
Milvus Log
No response
Anything else?
No response
/assign @congqixia /unassign
Continuous search in the growing segment phase, with some search errors.
client pod: new-stable-master-1705600800-2994728836 client error log:
[2024-01-18 18:35:31,116 - INFO - fouram]: (stats.py:820)
[2024-01-18 18:35:47,596 - ERROR - fouram]: grpc RpcError: [search], <_InactiveRpcError: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2024-01-18 18:34:47.533596', 'gRPC error': '2024-01-18 18:35:47.595973'}> (decorators.py:139)
[2024-01-18 18:35:48,165 - ERROR - fouram]: (api_response) : <_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Deadline Exceeded {grpc_status:4, created_time:"2024-01-18T18:35:47.587786919+00:00"}"
> (api_request.py:54)
[2024-01-18 18:35:48,165 - ERROR - fouram]: [CheckFunc] search request check failed, response:<_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Deadline Exceeded {grpc_status:4, created_time:"2024-01-18T18:35:47.587786919+00:00"}"
> (func_check.py:48)
[2024-01-18 18:35:48,168 - ERROR - fouram]: grpc RpcError: [search], <_InactiveRpcError: StatusCode.DEADLINE_EXCEEDED, Deadline Exceeded>, <Time:{'RPC start': '2024-01-18 18:34:47.535324', 'gRPC error': '2024-01-18 18:35:48.168617'}> (decorators.py:139)
[2024-01-18 18:35:48,169 - ERROR - fouram]: (api_response) : <_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Deadline Exceeded {created_time:"2024-01-18T18:35:47.59056936+00:00", grpc_status:4}"
> (api_request.py:54)
[2024-01-18 18:35:48,169 - ERROR - fouram]: [CheckFunc] search request check failed, response:<_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "UNKNOWN:Deadline Exceeded {created_time:"2024-01-18T18:35:47.59056936+00:00", grpc_status:4}"
server:
new-stable-mast00800-4-51-5682-etcd-0 1/1 Running 0 2m39s 10.104.15.49 4am-node20 <none> <none>
new-stable-mast00800-4-51-5682-milvus-standalone-7567bc6596wv85 1/1 Running 0 2m39s 10.104.25.228 4am-node30 <none> <none>
new-stable-mast00800-4-51-5682-minio-57b9dc878b-tzvmf 1/1 Running 0 2m39s 10.104.27.180 4am-node31 <none> <none>
client result (A total of 18 failed searches.)
'search': {'Requests': 4979423,
'Fails': 18,
'RPS': 345.79,
'fail_s': 0.0,
'RT_max': 60640.49,
'RT_avg': 45.48,
test env : 4am cluster
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen
.
This issue has not arisen recently