milvus
[Bug]: Search failed with error `Search 2 failed, reason query shard(channel) by-dev-rootcoord-dml_3_437739255564537236v1 does not exist` after pulsar pod failure chaos test
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: master-20221130-67390d20
- Deployment mode(standalone or cluster): cluster
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus==2.3.0.dev15
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
[2022-11-30T21:20:20.057Z] [2022-11-30 21:20:19 - INFO - ci_test]: [test][2022-11-30T21:20:19Z] [0.00206491s] Hello_Milvus flush -> None (wrapper.py:30)
[2022-11-30T21:20:20.057Z] [2022-11-30 21:20:19 - INFO - ci_test]: assert flush: 2.0231664180755615, entities: 9000 (test_data_persistence.py:45)
[2022-11-30T21:20:20.057Z] [2022-11-30 21:20:19 - INFO - ci_test]: index info: [{'collection': 'Hello_Milvus', 'field': 'float_vector', 'index_name': 'test_HLbXFvT4', 'index_param': {'index_type': 'HNSW', 'metric_type': 'L2', 'params': {'M': 48, 'efConstruction': 500}}}, {'collection': 'Hello_Milvus', 'field': 'varchar', 'index_name': 'test_rJL6bPkC', 'index_param': {'index_type': 'Trie'}}] (test_data_persistence.py:64)
[2022-11-30T21:20:20.057Z] [2022-11-30 21:20:19 - DEBUG - ci_test]: (api_request) : [Collection.load] args: [None, 1, 120], kwargs: {} (api_request.py:56)
[2022-11-30T21:20:20.057Z] [2022-11-30 21:20:19 - DEBUG - ci_test]: (api_response) : None (api_request.py:31)
[2022-11-30T21:20:20.057Z] [2022-11-30 21:20:19 - INFO - ci_test]: [test][2022-11-30T21:20:19Z] [0.00523320s] Hello_Milvus load -> None (wrapper.py:30)
[2022-11-30T21:20:20.057Z] [2022-11-30 21:20:19 - DEBUG - ci_test]: (api_request) : [Collection.search] args: [[[0.04650967923462504, 0.09359632712206453, 0.09375678212618015, 0.05750658831758217, 0.12979588567542033, 0.08243681298128856, 0.022396676907106085, 0.07546737456769882, 0.10166835352461175, 0.0890813797380326, 0.13020253195002898, 0.0245454026352224, 0.11237346686014102, 0.015401923391799665, 0.1......, kwargs: {} (api_request.py:56)
[2022-11-30T21:20:20.057Z] [2022-11-30 21:20:19 - ERROR - pymilvus.decorators]: RPC error: [search], <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=7, reason=Search 2 failed, reason query shard(channel) by-dev-rootcoord-dml_3_437739255564537236v1 does not exist
[2022-11-30T21:20:20.057Z] err %!w(<nil>))>, <Time:{'RPC start': '2022-11-30 21:20:19.559414', 'RPC error': '2022-11-30 21:20:19.783251'}> (decorators.py:108)
[2022-11-30T21:20:20.057Z] [2022-11-30 21:20:19 - ERROR - ci_test]: Traceback (most recent call last):
[2022-11-30T21:20:20.057Z] File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 26, in inner_wrapper
[2022-11-30T21:20:20.057Z] res = func(*args, **_kwargs)
[2022-11-30T21:20:20.057Z] File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 57, in api_request
[2022-11-30T21:20:20.057Z] return func(*arg, **kwargs)
[2022-11-30T21:20:20.057Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 610, in search
[2022-11-30T21:20:20.057Z] res = conn.search(self._name, data, anns_field, param, limit, expr,
[2022-11-30T21:20:20.057Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
[2022-11-30T21:20:20.057Z] raise e
[2022-11-30T21:20:20.057Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler
[2022-11-30T21:20:20.057Z] return func(*args, **kwargs)
[2022-11-30T21:20:20.057Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler
[2022-11-30T21:20:20.057Z] ret = func(self, *args, **kwargs)
[2022-11-30T21:20:20.057Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler
[2022-11-30T21:20:20.057Z] raise e
[2022-11-30T21:20:20.057Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler
[2022-11-30T21:20:20.057Z] return func(self, *args, **kwargs)
[2022-11-30T21:20:20.057Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 469, in search
[2022-11-30T21:20:20.057Z] return self._execute_search_requests(requests, timeout, round_decimal=round_decimal, auto_id=auto_id, **kwargs)
[2022-11-30T21:20:20.057Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 438, in _execute_search_requests
[2022-11-30T21:20:20.057Z] raise pre_err
[2022-11-30T21:20:20.057Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 429, in _execute_search_requests
[2022-11-30T21:20:20.057Z] raise MilvusException(response.status.error_code, response.status.reason)
[2022-11-30T21:20:20.057Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=7, reason=Search 2 failed, reason query shard(channel) by-dev-rootcoord-dml_3_437739255564537236v1 does not exist
[2022-11-30T21:20:20.058Z] err %!w(<nil>))>
[2022-11-30T21:20:20.058Z] (api_request.py:39)
[2022-11-30T21:20:20.058Z] [2022-11-30 21:20:19 - ERROR - ci_test]: (api_response) : <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=7, reason=Search 2 failed, reason query shard(channel) by-dev-rootcoord-dml_3_437739255564537236v1 does not exist
[2022-11-30T21:20:20.058Z] err %!w(<nil>))> (api_request.py:40)
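When triaging this failure across several chaos runs, it helps to pull the QueryNode IDs and the missing channel out of the error text. Below is a minimal Python sketch that assumes the message format shown in the log above. It also assumes the vchannel naming layout `<pchannel>_<collectionID>v<index>` and that the number after `Search` is a node ID. Both are assumptions, not confirmed by this thread. `parse_shard_error` and `ShardNotFound` are hypothetical helpers and are not part of pymilvus.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches the "reason" part of the search error exactly as it appears in the
# logs above; any other server wording will simply not match.
_ERR_RE = re.compile(
    r"QueryNode ID=(?P<node_id>\d+), reason=Search (?P<failed_id>\d+) failed, "
    r"reason query shard\(channel\) (?P<channel>\S+) does not exist"
)

# Assumed vchannel layout: <pchannel>_<collectionID>v<index>, where the
# pchannel itself ends in "_<n>" (for example by-dev-rootcoord-dml_3).
_CHANNEL_RE = re.compile(
    r"^(?P<pchannel>.+_\d+)_(?P<collection_id>\d+)v(?P<vchannel_idx>\d+)$"
)


@dataclass
class ShardNotFound:
    node_id: int        # QueryNode that reported the error
    failed_id: int      # number after "Search"; assumed to be a node ID
    channel: str        # full vchannel name
    pchannel: str
    collection_id: int
    vchannel_idx: int


def parse_shard_error(message: str) -> Optional[ShardNotFound]:
    """Extract the missing-shard details, or return None if the text differs."""
    m = _ERR_RE.search(message)
    if m is None:
        return None
    channel = m.group("channel")
    c = _CHANNEL_RE.match(channel)
    if c is None:
        return None
    return ShardNotFound(
        node_id=int(m.group("node_id")),
        failed_id=int(m.group("failed_id")),
        channel=channel,
        pchannel=c.group("pchannel"),
        collection_id=int(c.group("collection_id")),
        vchannel_idx=int(c.group("vchannel_idx")),
    )


if __name__ == "__main__":
    msg = ("fail to search on all shard leaders, err=fail to Search, "
           "QueryNode ID=7, reason=Search 2 failed, reason query "
           "shard(channel) by-dev-rootcoord-dml_3_437739255564537236v1 "
           "does not exist")
    print(parse_shard_error(msg))
```

When run against the two error messages in this thread, the parser reports vchannel index 1 in both cases.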
Expected Behavior
All test cases pass.
Steps To Reproduce
No response
Milvus Log
chaos type: pod-failure
image tag: master-20221130-67390d20
target pod: pulsar
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-cron/detail/chaos-test-cron/247/pipeline
log:
artifacts-pulsar-pod-failure-247-server-logs.tar.gz
artifacts-pulsar-pod-failure-247-pytest-logs.tar.gz
Anything else?
No response
/assign @jiaoew1991 /unassign
/assign @aoiasd /unassign
This problem has been bothering me and has not been resolved yet. It originates from issue 21324.
It was reproduced in 2.2.0-20230116-3a5f38b1
[2023-01-16T23:15:12.029Z] [2023-01-16 23:14:52 - DEBUG - ci_test]: (api_request) : [Collection.load] args: [None, 1, 120], kwargs: {} (api_request.py:56)
[2023-01-16T23:15:12.029Z] [2023-01-16 23:14:52 - DEBUG - ci_test]: (api_response) : None (api_request.py:31)
[2023-01-16T23:15:12.029Z] [2023-01-16 23:14:52 - INFO - ci_test]: [test][2023-01-16T23:14:52Z] [0.00717665s] SearchChecker__DGgJKDXD load -> None (wrapper.py:30)
[2023-01-16T23:15:12.029Z] [2023-01-16 23:14:52 - DEBUG - ci_test]: (api_request) : [Collection.search] args: [[[0.004736048288464679, 0.009620340391613972, 0.08556947487657732, 0.1232704381627744, 0.0474209602725107, 0.03195405985025566, 0.09706087773977706, 0.14389298676275802, 0.13296566682522157, 0.11703348228419408, 0.10078517190687802, 0.11420135602802678, 0.02528739878783225, 0.028994504250801856, 0......., kwargs: {} (api_request.py:56)
[2023-01-16T23:15:12.029Z] [2023-01-16 23:14:52 - ERROR - pymilvus.decorators]: RPC error: [search], <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=20, reason=Search 21 failed, reason query shard(channel) by-dev-rootcoord-dml_23_438805441546227711v1 does not exist
[2023-01-16T23:15:12.029Z] err %!w(<nil>))>, <Time:{'RPC start': '2023-01-16 23:14:52.714673', 'RPC error': '2023-01-16 23:14:52.980790'}> (decorators.py:108)
[2023-01-16T23:15:12.029Z] [2023-01-16 23:14:52 - ERROR - ci_test]: Traceback (most recent call last):
[2023-01-16T23:15:12.029Z] File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 26, in inner_wrapper
[2023-01-16T23:15:12.029Z] res = func(*args, **_kwargs)
[2023-01-16T23:15:12.029Z] File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 57, in api_request
[2023-01-16T23:15:12.029Z] return func(*arg, **kwargs)
[2023-01-16T23:15:12.029Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 609, in search
[2023-01-16T23:15:12.029Z] res = conn.search(self._name, data, anns_field, param, limit, expr,
[2023-01-16T23:15:12.029Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
[2023-01-16T23:15:12.029Z] raise e
[2023-01-16T23:15:12.029Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler
[2023-01-16T23:15:12.029Z] return func(*args, **kwargs)
[2023-01-16T23:15:12.029Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler
[2023-01-16T23:15:12.029Z] ret = func(self, *args, **kwargs)
[2023-01-16T23:15:12.029Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler
[2023-01-16T23:15:12.029Z] raise e
[2023-01-16T23:15:12.029Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler
[2023-01-16T23:15:12.029Z] return func(self, *args, **kwargs)
[2023-01-16T23:15:12.029Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 470, in search
[2023-01-16T23:15:12.029Z] return self._execute_search_requests(requests, timeout, round_decimal=round_decimal, auto_id=auto_id, **kwargs)
[2023-01-16T23:15:12.029Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 439, in _execute_search_requests
[2023-01-16T23:15:12.029Z] raise pre_err
[2023-01-16T23:15:12.029Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 430, in _execute_search_requests
[2023-01-16T23:15:12.029Z] raise MilvusException(response.status.error_code, response.status.reason)
[2023-01-16T23:15:12.029Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=20, reason=Search 21 failed, reason query shard(channel) by-dev-rootcoord-dml_23_438805441546227711v1 does not exist
[2023-01-16T23:15:12.029Z] err %!w(<nil>))>
[2023-01-16T23:15:12.029Z] (api_request.py:39)
[2023-01-16T23:15:12.029Z] [2023-01-16 23:14:52 - ERROR - ci_test]: (api_response) : <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Search, QueryNode ID=20, reason=Search 21 failed, reason query shard(channel) by-dev-rootcoord-dml_23_438805441546227711v1 does not exist
[2023-01-16T23:15:12.029Z] err %!w(<nil>))> (api_request.py:40)
[2023-01-16T23:15:12.029Z] ------------- generated html file: file:///tmp/ci_logs/report.html -------------
[2023-01-16T23:15:12.029Z] =========================== short test summary info ============================
[2023-01-16T23:15:12.029Z] FAILED testcases/test_all_collections_after_chaos.py::TestAllCollection::test_milvus_default[DeleteChecker__1ecIfg9u] - AssertionError
[2023-01-16T23:15:12.029Z] FAILED testcases/test_all_collections_after_chaos.py::TestAllCollection::test_milvus_default[SearchChecker__DGgJKDXD] - AssertionError
[2023-01-16T23:15:12.029Z] =================== 2 failed, 10 passed in 77.02s (0:01:17) ====================
script returned exit code 1
chaos type: pod-kill
image tag: 2.2.0-20230116-3a5f38b1
target pod: querynode
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-kafka-for-release-cron/detail/chaos-test-kafka-for-release-cron/1282/pipeline
log:
artifacts-querynode-pod-kill-1282-server-logs.tar.gz
artifacts-querynode-pod-kill-1282-pytest-logs.tar.gz
@aoiasd
Please take a look
OK
First error: one QueryNode restarted because of the Pulsar failure and received a search task immediately, before watchDeltaChannel had finished, so it could not find the shard.
Second error (related: https://github.com/milvus-io/milvus/issues/21357): the collection has two vchannels, v0 and v1. One node holds the shard leader of v0 and a shard of v1. QueryCoord wanted to unsubscribe only v0, but unsubChannelTask unsubscribed both.
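The second failure mode can be illustrated with a small model. The sketch below is hypothetical Python, not Milvus code (QueryCoord and QueryNode are written in Go). `QueryNodeModel`, `unsubscribe`, the channel names, and collection ID 100 are all invented for illustration. In the model, one node leads v0 and also serves a non-leader shard of v1. An unsubscribe intended for v0 that releases every channel of the collection also drops v1. A search that the v1 shard leader then forwards to this node fails with the same `does not exist` reason seen in the logs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QueryNodeModel:
    # vchannel name -> "leader" or "follower" on this node (toy state only)
    roles: Dict[str, str] = field(default_factory=dict)

    def channels_of(self, collection_id: int) -> List[str]:
        # Relies on the assumed "<pchannel>_<collectionID>v<index>" layout.
        marker = f"_{collection_id}v"
        return [ch for ch in self.roles if marker in ch]

    def unsubscribe(self, collection_id: int, channel: str, *, buggy: bool) -> None:
        if buggy:
            # Bug pattern described in the thread: the task targets one
            # vchannel but releases every vchannel of the collection.
            targets = self.channels_of(collection_id)
        else:
            # Intended behavior: release only the requested vchannel.
            targets = [channel]
        for ch in targets:
            self.roles.pop(ch, None)

    def search(self, channel: str) -> str:
        if channel not in self.roles:
            raise LookupError(f"query shard(channel) {channel} does not exist")
        return f"searched {channel} as {self.roles[channel]}"


V0 = "by-dev-rootcoord-dml_0_100v0"  # hypothetical names, collection ID 100
V1 = "by-dev-rootcoord-dml_1_100v1"


def make_node() -> QueryNodeModel:
    # This node leads v0 and also serves a non-leader shard of v1.
    return QueryNodeModel({V0: "leader", V1: "follower"})


if __name__ == "__main__":
    for buggy in (True, False):
        node = make_node()
        node.unsubscribe(100, V0, buggy=buggy)
        try:
            print(buggy, node.search(V1))
        except LookupError as err:
            print(buggy, err)
```

This model covers only the second error. The first error is a readiness race: the restarted node receives a search before watchDeltaChannel has finished. A wrong unsubscribe is not involved there.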
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen
.
Hoho, got it~~
Fixed in https://github.com/milvus-io/milvus/pull/21794
/unassign @aoiasd /assign @zhuwenxing please verify it