[Bug]: Query failed with error `fail to Query, QueryNode ID = 19, reason=target node id not match target id = 19, node id = 27` after reinstallation
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version:master-20230420-935d79c9
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):pulsar
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
[2023-04-20T10:37:51.596Z] self = <pymilvus.client.grpc_handler.GrpcHandler object at 0x7fe3bdc3ff40>
[2023-04-20T10:37:51.596Z] collection_name = 'deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_only_growing_is_string_indexed_not_string_indexed_replica_number_1_is_deleted_is_deleted_data_size_3000'
[2023-04-20T10:37:51.596Z] expr = 'int64 in [0, 1]', output_fields = ['int64'], partition_names = None
[2023-04-20T10:37:51.596Z] timeout = None
[2023-04-20T10:37:51.596Z] kwargs = {'check_task': 'check_query_not_empty', 'guarantee_timestamp': 0, 'schema': {'auto_id': False, 'consistency_level': 0,...R: 21>}, {'description': '', 'name': 'binary_vector', 'params': {'dim': 128}, 'type': <DataType.BINARY_VECTOR: 100>}]}}
[2023-04-20T10:37:51.596Z] collection_schema = {'auto_id': False, 'consistency_level': 0, 'description': '', 'fields': [{'auto_id': False, 'description': '', 'is_pri...AR: 21>}, {'description': '', 'name': 'binary_vector', 'params': {'dim': 128}, 'type': <DataType.BINARY_VECTOR: 100>}]}
[2023-04-20T10:37:51.596Z] consistency_level = 0
[2023-04-20T10:37:51.596Z] request = collection_name: "deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_only_growing_is_string_...ta_size_3000"
[2023-04-20T10:37:51.596Z] expr: "int64 in [0, 1]"
[2023-04-20T10:37:51.596Z] output_fields: "int64"
[2023-04-20T10:37:51.596Z] query_params {
[2023-04-20T10:37:51.596Z] key: "ignore_growing"
[2023-04-20T10:37:51.596Z] value: "False"
[2023-04-20T10:37:51.596Z] }
[2023-04-20T10:37:51.596Z]
[2023-04-20T10:37:51.596Z] future = <_MultiThreadedRendezvous of RPC that terminated with:
[2023-04-20T10:37:51.596Z] status = StatusCode.OK
[2023-04-20T10:37:51.596Z] details = ""
[2023-04-20T10:37:51.596Z] >
[2023-04-20T10:37:51.596Z] response = status {
[2023-04-20T10:37:51.596Z] error_code: UnexpectedError
[2023-04-20T10:37:51.596Z] reason: "fail to query on all shard leaders, err=fail to Query, QueryNode ID = 19, reason=target node id not match target id = 19, node id = 27"
[2023-04-20T10:37:51.596Z] }
[2023-04-20T10:37:51.596Z]
[2023-04-20T10:37:51.596Z]
[2023-04-20T10:37:51.596Z] @retry_on_rpc_failure()
[2023-04-20T10:37:51.596Z] def query(self, collection_name, expr, output_fields=None, partition_names=None, timeout=None, **kwargs):
[2023-04-20T10:37:51.596Z] if output_fields is not None and not isinstance(output_fields, (list,)):
[2023-04-20T10:37:51.596Z] raise ParamError(message="Invalid query format. 'output_fields' must be a list")
[2023-04-20T10:37:51.596Z] collection_schema = kwargs.get("schema", None)
[2023-04-20T10:37:51.596Z] if not collection_schema:
[2023-04-20T10:37:51.596Z] collection_schema = self.describe_collection(collection_name, timeout)
[2023-04-20T10:37:51.596Z] consistency_level = collection_schema["consistency_level"]
[2023-04-20T10:37:51.596Z] # overwrite the consistency level defined when user created the collection
[2023-04-20T10:37:51.596Z] consistency_level = get_consistency_level(kwargs.get("consistency_level", consistency_level))
[2023-04-20T10:37:51.596Z]
[2023-04-20T10:37:51.596Z] ts_utils.construct_guarantee_ts(consistency_level, collection_name, kwargs)
[2023-04-20T10:37:51.596Z] request = Prepare.query_request(collection_name, expr, output_fields, partition_names, **kwargs)
[2023-04-20T10:37:51.596Z]
[2023-04-20T10:37:51.596Z] future = self._stub.Query.future(request, timeout=timeout)
[2023-04-20T10:37:51.596Z] response = future.result()
[2023-04-20T10:37:51.596Z] if response.status.error_code == Status.EMPTY_COLLECTION:
[2023-04-20T10:37:51.596Z] return []
[2023-04-20T10:37:51.596Z] if response.status.error_code != Status.SUCCESS:
[2023-04-20T10:37:51.596Z] > raise MilvusException(response.status.error_code, response.status.reason)
[2023-04-20T10:37:51.596Z] E pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=fail to query on all shard leaders, err=fail to Query, QueryNode ID = 19, reason=target node id not match target id = 19, node id = 27)>
[2023-04-20T10:37:51.596Z]
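For reference, the failing call corresponds roughly to the pymilvus usage below. This is a minimal sketch, not the actual test code; the connection host and port are assumptions, while the collection name and query arguments are taken from the log above.

```python
# Minimal reproduction sketch (assumed host/port; collection name and query
# arguments taken from the failing test log above).
from pymilvus import connections, Collection

connections.connect(alias="default", host="127.0.0.1", port="19530")  # assumed endpoint

collection = Collection(
    "deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_"
    "only_growing_is_string_indexed_not_string_indexed_replica_number_1_is_deleted_"
    "is_deleted_data_size_3000"
)
collection.load()

# After reinstallation this query fails with:
# "fail to Query, QueryNode ID = 19, reason=target node id not match target id = 19, node id = 27"
res = collection.query(expr="int64 in [0, 1]", output_fields=["int64"])
print(res)
```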
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
failed job: deploy_test_cron/666
logs: artifacts-pulsar-cluster-reinstall-666-server-logs.tar.gz, artifacts-pulsar-cluster-reinstall-666-pytest-logs.tar.gz
Anything else?
No response
/assign @jiaoew1991 /unassign
/unassign @jiaoew1991 /assign
It seems that Pulsar is started with subscriptions in exclusive mode, which means only one consumer is allowed to be active, yet two consumers were created (see the sketch below for the behavior I mean). Could you tell me the specific Pulsar settings, or give me a way to retrieve them, so I can verify this conclusion? @zhuwenxing
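A minimal sketch of that behavior, using the `pulsar-client` Python package against a broker assumed to be reachable at localhost:6650; the topic and subscription names are hypothetical, not the ones Milvus actually uses.

```python
# Sketch only: illustrates why a second consumer on the same Exclusive
# subscription is rejected by the broker. Topic/subscription names are
# hypothetical; a local Pulsar broker on port 6650 is assumed.
import pulsar

client = pulsar.Client("pulsar://localhost:6650")

topic = "persistent://public/default/demo-topic"  # hypothetical topic
sub = "demo-subscription"                         # hypothetical subscription name

# First consumer on an Exclusive subscription connects fine.
c1 = client.subscribe(topic, sub, consumer_type=pulsar.ConsumerType.Exclusive)

# A second consumer on the same Exclusive subscription is rejected by the
# broker, which matches the suspected situation here: a stale consumer from
# the old deployment still holds the subscription after reinstallation.
try:
    c2 = client.subscribe(topic, sub, consumer_type=pulsar.ConsumerType.Exclusive)
except Exception as exc:
    print("second exclusive consumer rejected:", exc)

# With pulsar.ConsumerType.Shared both consumers would be accepted and
# messages would be dispatched across them.
client.close()
```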

Tried to clean up the subscriptions, but it failed. This is considered to be a network problem between Milvus and Pulsar. Does this problem reproduce consistently? @zhuwenxing
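To check whether a stale consumer still holds the subscription, something like the sketch below against the Pulsar admin REST API should work; the admin endpoint (default port 8080) and the tenant/namespace/topic names are assumptions, not the actual Milvus deployment values.

```python
# Sketch: inspect subscriptions and their connected consumers via the Pulsar
# admin REST API. The endpoint and topic path are assumptions for illustration.
import requests

admin = "http://pulsar-broker:8080/admin/v2"          # assumed admin endpoint
topic_path = "persistent/public/default/demo-topic"   # hypothetical topic

# List the subscriptions on the topic.
subs = requests.get(f"{admin}/{topic_path}/subscriptions").json()
print("subscriptions:", subs)

# Topic stats include the consumers attached to each subscription, which shows
# whether a consumer from the old deployment is still registered on the broker.
stats = requests.get(f"{admin}/{topic_path}/stats").json()
for name, sub_stats in stats.get("subscriptions", {}).items():
    print(name, "consumers:", len(sub_stats.get("consumers", [])))
```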
We should consider using shared mode for all subscriptions rather than exclusive mode. We also need to figure out a way to clean up subscriptions when a server dies. I thought there was already a mechanism to clean up expired subscriptions, right?
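On the cleanup question: if I remember correctly, Pulsar can expire inactive subscriptions automatically via the broker setting `subscriptionExpirationTimeMinutes`, but it is disabled (0) by default. A stale subscription can also be dropped explicitly through the admin REST API, roughly as sketched below; the endpoint, topic, and subscription names are again assumptions for illustration.

```python
# Sketch: force-remove a stale subscription through the Pulsar admin REST API.
# Endpoint, topic, and subscription names are assumptions for illustration.
import requests

admin = "http://pulsar-broker:8080/admin/v2"
topic_path = "persistent/public/default/demo-topic"   # hypothetical topic
sub_name = "stale-subscription"                        # hypothetical subscription

# DELETE removes the subscription; force=true drops it even if the broker
# still has a consumer registered on it.
resp = requests.delete(
    f"{admin}/{topic_path}/subscription/{sub_name}",
    params={"force": "true"},
)
print(resp.status_code)
```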
> Does this problem reproduce consistently?

@smellthemoon No, it is not a stable issue; it does not reproduce consistently.
> a network problem between Milvus and Pulsar

/assign @zhuwenxing /unassign