[Bug]: Replicas number is not as expected after upgrade from v2.2.3 to 2.2.0-20230310-b2ece6a5
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version:v2.2.3 --> 2.2.0-20230310-b2ece6a5
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka): kafka
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
[2023-03-13T11:01:08.474Z] =================================== FAILURES ===================================
[2023-03-13T11:01:08.474Z] _ TestActionSecondDeployment.test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_only_growing_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] _
[2023-03-13T11:01:08.474Z] [gw3] linux -- Python 3.8.10 /usr/bin/python3.8
[2023-03-13T11:01:08.474Z]
[2023-03-13T11:01:08.474Z] self = <test_action_second_deployment.TestActionSecondDeployment object at 0x7f7fda184dc0>
[2023-03-13T11:01:08.474Z] all_collection_name = 'deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_only_growing_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000'
[2023-03-13T11:01:08.474Z] data_size = 3000
[2023-03-13T11:01:08.474Z]
[2023-03-13T11:01:08.474Z] @pytest.mark.tags(CaseLabel.L3)
[2023-03-13T11:01:08.474Z] def test_check(self, all_collection_name, data_size):
[2023-03-13T11:01:08.474Z] """
[2023-03-13T11:01:08.474Z] before reinstall: create collection
[2023-03-13T11:01:08.474Z] """
[2023-03-13T11:01:08.474Z] self._connect()
[2023-03-13T11:01:08.474Z] ms = MilvusSys()
[2023-03-13T11:01:08.474Z] name = all_collection_name
[2023-03-13T11:01:08.474Z] is_binary = False
[2023-03-13T11:01:08.474Z] if "BIN" in name:
[2023-03-13T11:01:08.474Z] is_binary = True
[2023-03-13T11:01:08.474Z] collection_w, _ = self.collection_wrap.init_collection(name=name)
[2023-03-13T11:01:08.474Z] self.collection_w = collection_w
[2023-03-13T11:01:08.474Z] schema = collection_w.schema
[2023-03-13T11:01:08.474Z] data_type = [field.dtype for field in schema.fields]
[2023-03-13T11:01:08.474Z] field_name = [field.name for field in schema.fields]
[2023-03-13T11:01:08.474Z] type_field_map = dict(zip(data_type, field_name))
[2023-03-13T11:01:08.474Z] if is_binary:
[2023-03-13T11:01:08.474Z] default_index_field = ct.default_binary_vec_field_name
[2023-03-13T11:01:08.474Z] vector_index_type = "BIN_IVF_FLAT"
[2023-03-13T11:01:08.474Z] else:
[2023-03-13T11:01:08.474Z] default_index_field = ct.default_float_vec_field_name
[2023-03-13T11:01:08.474Z] vector_index_type = "IVF_FLAT"
[2023-03-13T11:01:08.474Z]
[2023-03-13T11:01:08.474Z] binary_vector_index_types = [index.params["index_type"] for index in collection_w.indexes if
[2023-03-13T11:01:08.474Z] index.field_name == type_field_map.get(100, "")]
[2023-03-13T11:01:08.474Z] float_vector_index_types = [index.params["index_type"] for index in collection_w.indexes if
[2023-03-13T11:01:08.474Z] index.field_name == type_field_map.get(101, "")]
[2023-03-13T11:01:08.474Z] index_field_map = dict([(index.field_name, index.index_name) for index in collection_w.indexes])
[2023-03-13T11:01:08.474Z] index_names = [index.index_name for index in collection_w.indexes] # used to drop index
[2023-03-13T11:01:08.475Z] vector_index_types = binary_vector_index_types + float_vector_index_types
[2023-03-13T11:01:08.475Z] if len(vector_index_types) > 0:
[2023-03-13T11:01:08.475Z] vector_index_type = vector_index_types[0]
[2023-03-13T11:01:08.475Z] try:
[2023-03-13T11:01:08.475Z] t0 = time.time()
[2023-03-13T11:01:08.475Z] self.utility_wrap.wait_for_loading_complete(name)
[2023-03-13T11:01:08.475Z] log.info(f"wait for {name} loading complete cost {time.time() - t0}")
[2023-03-13T11:01:08.475Z] except Exception as e:
[2023-03-13T11:01:08.475Z] log.error(e)
[2023-03-13T11:01:08.475Z] # get replicas loaded
[2023-03-13T11:01:08.475Z] try:
[2023-03-13T11:01:08.475Z] replicas = collection_w.get_replicas(enable_traceback=False)
[2023-03-13T11:01:08.475Z] replicas_loaded = len(replicas.groups)
[2023-03-13T11:01:08.475Z] except Exception as e:
[2023-03-13T11:01:08.475Z] log.error(e)
[2023-03-13T11:01:08.475Z] replicas_loaded = 0
[2023-03-13T11:01:08.475Z]
[2023-03-13T11:01:08.475Z] log.info(f"collection {name} has {replicas_loaded} replicas")
[2023-03-13T11:01:08.475Z] actual_replicas = re.search(r'replica_number_(.*?)_', name).group(1)
[2023-03-13T11:01:08.475Z] > assert replicas_loaded == int(actual_replicas)
[2023-03-13T11:01:08.475Z] E AssertionError: assert 0 == 2
[2023-03-13T11:01:08.475Z] E + where 2 = int('2')
[2023-03-13T11:01:08.475Z]
[2023-03-13T11:01:08.475Z] testcases/test_action_second_deployment.py:119: AssertionError
[2023-03-13T11:01:08.475Z] ------------------------------ Captured log setup ------------------------------
[2023-03-13T11:01:08.475Z] [2023-03-13 10:56:17 - INFO - ci_test]: *********************************** setup *********************************** (client_base.py:39)
[2023-03-13T11:01:08.475Z] [2023-03-13 10:56:17 - INFO - ci_test]: [setup_method] Start setup test case test_check. (client_base.py:40)
[2023-03-13T11:01:08.475Z] ------------------------------ Captured log call -------------------------------
[2023-03-13T11:01:08.475Z] [2023-03-13 10:56:17 - DEBUG - ci_test]: (api_request) : [Connections.connect] args: ['default'], kwargs: {'host': '10.101.14.41', 'port': 19530} (api_request.py:56)
[2023-03-13T11:01:08.475Z] [2023-03-13 10:56:17 - DEBUG - ci_test]: (api_response) : None (api_request.py:31)
[2023-03-13T11:01:08.475Z] [2023-03-13 10:56:17 - DEBUG - ci_test]: (api_request) : [Collection] args: ['deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_only_growing_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000', None, 'default', 2], kwargs: {'consistency_level': 'Strong'} (api_request.py:56)
[2023-03-13T11:01:08.475Z] [2023-03-13 10:56:17 - DEBUG - ci_test]: (api_response) : <Collection>:
[2023-03-13T11:01:08.475Z] -------------
[2023-03-13T11:01:08.475Z] <name>: deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_only_growing_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000
[2023-03-13T11:01:08.475Z] <partitions>: [{"name": "_default", "collection_name": "deploy_test_index_type_BIN_...... (api_request.py:31)
[2023-03-13T11:01:08.475Z] [2023-03-13 10:56:17 - DEBUG - ci_test]: (api_request) : [wait_for_loading_complete] args: ['deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_only_growing_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000', None, 20, 'default'], kwargs: {} (api_request.py:56)
[2023-03-13T11:01:08.475Z] [2023-03-13 10:56:17 - DEBUG - ci_test]: (api_response) : None (api_request.py:31)
[2023-03-13T11:01:08.475Z] [2023-03-13 10:56:17 - INFO - ci_test]: wait for deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_only_growing_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000 loading complete cost 0.0041506290435791016 (test_action_second_deployment.py:106)
[2023-03-13T11:01:08.475Z] [2023-03-13 10:56:17 - ERROR - pymilvus.decorators]: RPC error: [get_replicas], <MilvusException: (code=15, message=failed to get replica info, err=failed to get shard leader for shard collectionID:440061701486776054 channelName:"by-dev-rootcoord-dml_132_440061701486776054v0" seek_position:<channel_name:"by-dev-rootcoord-dml_132_440061701486776054v0" msgID:"\021\000\000\000\000\000\000\000" msgGroup:"by-dev-dataNode-24-by-dev-rootcoord-dml_132_440061701486776054v0" timestamp:440061817125601281 > unflushedSegmentIds:440061701486776177 , the collection not loaded or leader is offline[NodeNotFound(0)])>, <Time:{'RPC start': '2023-03-13 10:56:17.850885', 'RPC error': '2023-03-13 10:56:17.857130'}> (decorators.py:108)
[2023-03-13T11:01:08.475Z] [2023-03-13 10:56:17 - ERROR - ci_test]: <MilvusException: (code=15, message=failed to get replica info, err=failed to get shard leader for shard collectionID:440061701486776054 channelName:"by-dev-rootcoord-dml_132_440061701486776054v0" seek_position:<channel_name:"by-dev-rootcoord-dml_132_440061701486776054v0" msgID:"\021\000\000\000\000\000\000\000" msgGroup:"by-dev-dataNode-24-by-dev-rootcoord-dml_132_440061701486776054v0" timestamp:440061817125601281 > unflushedSegmentIds:440061701486776177 , the collection not loaded or leader is offline[NodeNotFound(0)])> (test_action_second_deployment.py:114)
[2023-03-13T11:01:08.475Z] [2023-03-13 10:56:17 - INFO - ci_test]: collection deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_only_growing_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000 has 0 replicas (test_action_second_deployment.py:117)
[2023-03-13T11:01:08.475Z] ------------- generated html file: file:///tmp/ci_logs/report.html -------------
[2023-03-13T11:01:08.475Z] =========================== short test summary info ============================
[2023-03-13T11:01:08.475Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_only_growing_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] - AssertionError: assert 0 == 2
[2023-03-13T11:01:08.475Z] + where 2 = int('2')
[2023-03-13T11:01:08.475Z] ================== 1 failed, 49 passed in 1413.72s (0:23:33) ===================
Expected Behavior
This collection was loaded with 2 replicas before the upgrade, so after the upgrade it should still report 2 replicas.
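For reference, the replica count the test expects is encoded in the collection name and recovered with a regex (the failing assertion compares it against `len(replicas.groups)`). A minimal, self-contained sketch of that parsing — the helper name is mine, and the regex is a `\d+` variant equivalent to the test's `r'replica_number_(.*?)_'` for these names:

```python
import re

def expected_replicas(collection_name: str) -> int:
    """Parse the expected replica count encoded in a deploy-test collection name."""
    m = re.search(r"replica_number_(\d+)", collection_name)
    # Fall back to 1 replica when the name carries no marker.
    return int(m.group(1)) if m else 1

name = ("deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_"
        "segment_status_only_growing_is_string_indexed_not_string_indexed_"
        "replica_number_2_is_deleted_is_deleted_data_size_3000")
print(expected_replicas(name))  # → 2
```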
Steps To Reproduce
No response
Milvus Log
milvus mode: cluster
deploy task: upgrade
old image tag: v2.2.3
new image tag: 2.2.0-20230310-b2ece6a5
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_kafka_for_release_cron/detail/deploy_test_kafka_for_release_cron/522/pipeline
log:
artifacts-kafka-cluster-upgrade-522-server-second-deployment-logs.tar.gz
artifacts-kafka-cluster-upgrade-522-server-first-deployment-logs.tar.gz
artifacts-kafka-cluster-upgrade-522-pytest-logs.tar.gz
Anything else?
No response
/assign @jiaoew1991
I guess the root cause was a load failure.
/unassign
[2023-03-14T11:03:12.524Z] <name>: deploy_test_index_type_HNSW_is_compacted_not_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000
[2023-03-14T11:03:12.524Z] <partitions>: [{"name": "_default", "collection_name": "deploy_test_index_type_HNSW_is_compacted_not...... (api_request.py:31)
[2023-03-14T11:03:12.524Z] [2023-03-14 11:00:39 - DEBUG - ci_test]: (api_request) : [wait_for_loading_complete] args: ['deploy_test_index_type_HNSW_is_compacted_not_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000', None, 20, 'default'], kwargs: {} (api_request.py:56)
[2023-03-14T11:03:12.524Z] [2023-03-14 11:00:39 - DEBUG - ci_test]: (api_response) : None (api_request.py:31)
[2023-03-14T11:03:12.524Z] [2023-03-14 11:00:39 - INFO - ci_test]: wait for deploy_test_index_type_HNSW_is_compacted_not_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000 loading complete cost 0.002306222915649414 (test_action_second_deployment.py:106)
[2023-03-14T11:03:12.524Z] [2023-03-14 11:00:39 - ERROR - pymilvus.decorators]: RPC error: [get_replicas], <MilvusException: (code=15, message=failed to get replica info, err=failed to get shard leader for shard collectionID:440084580325221241 channelName:"by-dev-rootcoord-dml_54_440084580325221241v0" seek_position:<channel_name:"by-dev-rootcoord-dml_54" msgID:"\337\013\000\000\000\000\000\000" msgGroup:"by-dev-dataNode-10-by-dev-rootcoord-dml_54_440084580325221241v0" timestamp:440084823232479233 > unflushedSegmentIds:440084580325422051 flushedSegmentIds:440084580325423835 dropped_segmentIds:440084580325221517 dropped_segmentIds:440084580325422061 dropped_segmentIds:440084580325421799 dropped_segmentIds:440084580325421959 dropped_segmentIds:440084580325421631 dropped_segmentIds:440084580325221285 , the collection not loaded or leader is offline[NodeNotFound(0)])>, <Time:{'RPC start': '2023-03-14 11:00:39.093974', 'RPC error': '2023-03-14 11:00:39.097337'}> (decorators.py:108)
[2023-03-14T11:03:12.524Z] [2023-03-14 11:00:39 - ERROR - ci_test]: <MilvusException: (code=15, message=failed to get replica info, err=failed to get shard leader for shard collectionID:440084580325221241 channelName:"by-dev-rootcoord-dml_54_440084580325221241v0" seek_position:<channel_name:"by-dev-rootcoord-dml_54" msgID:"\337\013\000\000\000\000\000\000" msgGroup:"by-dev-dataNode-10-by-dev-rootcoord-dml_54_440084580325221241v0" timestamp:440084823232479233 > unflushedSegmentIds:440084580325422051 flushedSegmentIds:440084580325423835 dropped_segmentIds:440084580325221517 dropped_segmentIds:440084580325422061 dropped_segmentIds:440084580325421799 dropped_segmentIds:440084580325421959 dropped_segmentIds:440084580325421631 dropped_segmentIds:440084580325221285 , the collection not loaded or leader is offline[NodeNotFound(0)])> (test_action_second_deployment.py:114)
[2023-03-14T11:03:12.524Z] [2023-03-14 11:00:39 - INFO - ci_test]: collection deploy_test_index_type_HNSW_is_compacted_not_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000 has 0 replicas (test_action_second_deployment.py:117)
[2023-03-14T11:03:12.524Z] _ TestActionSecondDeployment.test_check[deploy_test_index_type_HNSW_is_compacted_is_compacted_segment_status_only_growing_is_string_indexed_is_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] _
[2023-03-14T11:03:12.524Z] [gw1] linux -- Python 3.8.10 /usr/bin/python3.8
[2023-03-14T11:03:12.524Z]
[2023-03-14T11:03:12.524Z] self = <test_action_second_deployment.TestActionSecondDeployment object at 0x7ef98992afd0>
[2023-03-14T11:03:12.524Z] all_collection_name = 'deploy_test_index_type_HNSW_is_compacted_is_compacted_segment_status_only_growing_is_string_indexed_is_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000'
[2023-03-14T11:03:12.524Z] data_size = 3000
[2023-03-14T11:03:12.524Z]
[2023-03-14T11:03:12.524Z] @pytest.mark.tags(CaseLabel.L3)
[2023-03-14T11:03:12.525Z] def test_check(self, all_collection_name, data_size):
[2023-03-14T11:03:12.525Z] """
[2023-03-14T11:03:12.525Z] before reinstall: create collection
[2023-03-14T11:03:12.525Z] """
[2023-03-14T11:03:12.525Z] self._connect()
[2023-03-14T11:03:12.525Z] ms = MilvusSys()
[2023-03-14T11:03:12.525Z] name = all_collection_name
[2023-03-14T11:03:12.525Z] is_binary = False
[2023-03-14T11:03:12.525Z] if "BIN" in name:
[2023-03-14T11:03:12.525Z] is_binary = True
[2023-03-14T11:03:12.525Z] collection_w, _ = self.collection_wrap.init_collection(name=name)
[2023-03-14T11:03:12.525Z] self.collection_w = collection_w
[2023-03-14T11:03:12.525Z] schema = collection_w.schema
[2023-03-14T11:03:12.525Z] data_type = [field.dtype for field in schema.fields]
[2023-03-14T11:03:12.525Z] field_name = [field.name for field in schema.fields]
[2023-03-14T11:03:12.525Z] type_field_map = dict(zip(data_type, field_name))
[2023-03-14T11:03:12.525Z] if is_binary:
[2023-03-14T11:03:12.525Z] default_index_field = ct.default_binary_vec_field_name
[2023-03-14T11:03:12.525Z] vector_index_type = "BIN_IVF_FLAT"
[2023-03-14T11:03:12.525Z] else:
[2023-03-14T11:03:12.525Z] default_index_field = ct.default_float_vec_field_name
[2023-03-14T11:03:12.525Z] vector_index_type = "IVF_FLAT"
[2023-03-14T11:03:12.525Z]
[2023-03-14T11:03:12.525Z] binary_vector_index_types = [index.params["index_type"] for index in collection_w.indexes if
[2023-03-14T11:03:12.525Z] index.field_name == type_field_map.get(100, "")]
[2023-03-14T11:03:12.525Z] float_vector_index_types = [index.params["index_type"] for index in collection_w.indexes if
[2023-03-14T11:03:12.525Z] index.field_name == type_field_map.get(101, "")]
[2023-03-14T11:03:12.525Z] index_field_map = dict([(index.field_name, index.index_name) for index in collection_w.indexes])
[2023-03-14T11:03:12.525Z] index_names = [index.index_name for index in collection_w.indexes] # used to drop index
[2023-03-14T11:03:12.525Z] vector_index_types = binary_vector_index_types + float_vector_index_types
[2023-03-14T11:03:12.525Z] if len(vector_index_types) > 0:
[2023-03-14T11:03:12.525Z] vector_index_type = vector_index_types[0]
[2023-03-14T11:03:12.525Z] try:
[2023-03-14T11:03:12.525Z] t0 = time.time()
[2023-03-14T11:03:12.525Z] self.utility_wrap.wait_for_loading_complete(name)
[2023-03-14T11:03:12.525Z] log.info(f"wait for {name} loading complete cost {time.time() - t0}")
[2023-03-14T11:03:12.525Z] except Exception as e:
[2023-03-14T11:03:12.525Z] log.error(e)
[2023-03-14T11:03:12.525Z] # get replicas loaded
[2023-03-14T11:03:12.525Z] try:
[2023-03-14T11:03:12.525Z] replicas = collection_w.get_replicas(enable_traceback=False)
[2023-03-14T11:03:12.525Z] replicas_loaded = len(replicas.groups)
[2023-03-14T11:03:12.525Z] except Exception as e:
[2023-03-14T11:03:12.525Z] log.error(e)
[2023-03-14T11:03:12.525Z] replicas_loaded = 0
[2023-03-14T11:03:12.525Z]
[2023-03-14T11:03:12.525Z] log.info(f"collection {name} has {replicas_loaded} replicas")
[2023-03-14T11:03:12.525Z] actual_replicas = re.search(r'replica_number_(.*?)_', name).group(1)
[2023-03-14T11:03:12.525Z] > assert replicas_loaded == int(actual_replicas)
[2023-03-14T11:03:12.525Z] E AssertionError: assert 0 == 2
[2023-03-14T11:03:12.525Z] E + where 2 = int('2')
[2023-03-14T11:03:12.525Z]
[2023-03-14T11:03:12.525Z] testcases/test_action_second_deployment.py:119: AssertionError
milvus mode: cluster
deploy task: upgrade
old image tag: v2.2.3
new image tag: 2.2.0-20230314-3aa28506
log:
artifacts-kafka-cluster-upgrade-538-server-second-deployment-logs.tar.gz
artifacts-kafka-cluster-upgrade-538-server-first-deployment-logs.tar.gz
artifacts-kafka-cluster-upgrade-538-pytest-logs.tar.gz
During the upgrade, all nodes were reassigned to one replica, leaving the other replica with no node.
As a result, get_replicas fails, because that replica has no node and therefore no shard leader.
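A toy model of the reassignment bug described above (the function and names are illustrative, not the actual QueryCoord balancing code): if the balancer packs every query node into a single replica, the other replica ends up with no node, no shard leader can be elected for it, and get_replicas fails.

```python
def reassign_all_to_first(nodes, replica_ids):
    """Buggy balancing sketch: every query node lands in the first replica."""
    assignment = {rid: [] for rid in replica_ids}
    for node in nodes:
        assignment[replica_ids[0]].append(node)
    return assignment

assignment = reassign_all_to_first(["qn-1", "qn-2", "qn-3", "qn-4"], [1, 2])
# Replica 2 has no member node, hence no shard leader: get_replicas errors out
# with "the collection not loaded or leader is offline[NodeNotFound]".
print([rid for rid, members in assignment.items() if not members])  # → [2]
```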
related: https://github.com/milvus-io/milvus/issues/22782
/assign @weiliu1031
/unassign
It still reproduces on 2.2.0-20230317-bbc21fe8.
[2023-03-20T14:13:45.159Z] =================================== FAILURES ===================================
[2023-03-20T14:13:45.159Z] _ TestActionSecondDeployment.test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] _
[2023-03-20T14:13:45.159Z] [gw1] linux -- Python 3.8.10 /usr/bin/python3.8
[2023-03-20T14:13:45.159Z]
[2023-03-20T14:13:45.159Z] self = <test_action_second_deployment.TestActionSecondDeployment object at 0x7f5197766460>
[2023-03-20T14:13:45.159Z] all_collection_name = 'deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000'
[2023-03-20T14:13:45.159Z] data_size = 3000
[2023-03-20T14:13:45.159Z]
[2023-03-20T14:13:45.159Z] @pytest.mark.tags(CaseLabel.L3)
[2023-03-20T14:13:45.159Z] def test_check(self, all_collection_name, data_size):
[2023-03-20T14:13:45.159Z] """
[2023-03-20T14:13:45.159Z] before reinstall: create collection
[2023-03-20T14:13:45.159Z] """
[2023-03-20T14:13:45.159Z] self._connect()
[2023-03-20T14:13:45.159Z] ms = MilvusSys()
[2023-03-20T14:13:45.159Z] name = all_collection_name
[2023-03-20T14:13:45.159Z] is_binary = False
[2023-03-20T14:13:45.159Z] if "BIN" in name:
[2023-03-20T14:13:45.159Z] is_binary = True
[2023-03-20T14:13:45.159Z] collection_w, _ = self.collection_wrap.init_collection(name=name)
[2023-03-20T14:13:45.159Z] self.collection_w = collection_w
[2023-03-20T14:13:45.159Z] schema = collection_w.schema
[2023-03-20T14:13:45.159Z] data_type = [field.dtype for field in schema.fields]
[2023-03-20T14:13:45.159Z] field_name = [field.name for field in schema.fields]
[2023-03-20T14:13:45.159Z] type_field_map = dict(zip(data_type, field_name))
[2023-03-20T14:13:45.159Z] if is_binary:
[2023-03-20T14:13:45.159Z] default_index_field = ct.default_binary_vec_field_name
[2023-03-20T14:13:45.159Z] vector_index_type = "BIN_IVF_FLAT"
[2023-03-20T14:13:45.159Z] else:
[2023-03-20T14:13:45.159Z] default_index_field = ct.default_float_vec_field_name
[2023-03-20T14:13:45.159Z] vector_index_type = "IVF_FLAT"
[2023-03-20T14:13:45.159Z]
[2023-03-20T14:13:45.159Z] binary_vector_index_types = [index.params["index_type"] for index in collection_w.indexes if
[2023-03-20T14:13:45.159Z] index.field_name == type_field_map.get(100, "")]
[2023-03-20T14:13:45.159Z] float_vector_index_types = [index.params["index_type"] for index in collection_w.indexes if
[2023-03-20T14:13:45.159Z] index.field_name == type_field_map.get(101, "")]
[2023-03-20T14:13:45.159Z] index_field_map = dict([(index.field_name, index.index_name) for index in collection_w.indexes])
[2023-03-20T14:13:45.159Z] index_names = [index.index_name for index in collection_w.indexes] # used to drop index
[2023-03-20T14:13:45.159Z] vector_index_types = binary_vector_index_types + float_vector_index_types
[2023-03-20T14:13:45.159Z] if len(vector_index_types) > 0:
[2023-03-20T14:13:45.159Z] vector_index_type = vector_index_types[0]
[2023-03-20T14:13:45.159Z] try:
[2023-03-20T14:13:45.159Z] t0 = time.time()
[2023-03-20T14:13:45.159Z] self.utility_wrap.wait_for_loading_complete(name)
[2023-03-20T14:13:45.159Z] log.info(f"wait for {name} loading complete cost {time.time() - t0}")
[2023-03-20T14:13:45.159Z] except Exception as e:
[2023-03-20T14:13:45.159Z] log.error(e)
[2023-03-20T14:13:45.159Z] # get replicas loaded
[2023-03-20T14:13:45.159Z] try:
[2023-03-20T14:13:45.159Z] replicas = collection_w.get_replicas(enable_traceback=False)
[2023-03-20T14:13:45.159Z] replicas_loaded = len(replicas.groups)
[2023-03-20T14:13:45.159Z] except Exception as e:
[2023-03-20T14:13:45.159Z] log.error(e)
[2023-03-20T14:13:45.159Z] replicas_loaded = 0
[2023-03-20T14:13:45.159Z]
[2023-03-20T14:13:45.159Z] log.info(f"collection {name} has {replicas_loaded} replicas")
[2023-03-20T14:13:45.159Z] actual_replicas = re.search(r'replica_number_(.*?)_', name).group(1)
[2023-03-20T14:13:45.159Z] > assert replicas_loaded == int(actual_replicas)
[2023-03-20T14:13:45.159Z] E AssertionError: assert 0 == 2
[2023-03-20T14:13:45.159Z] E + where 2 = int('2')
[2023-03-20T14:13:45.159Z]
[2023-03-20T14:13:45.159Z] testcases/test_action_second_deployment.py:119: AssertionError
[2023-03-20T14:13:45.159Z] ------------------------------ Captured log setup ------------------------------
[2023-03-20T14:13:45.159Z] [2023-03-20 13:53:26 - INFO - ci_test]: *********************************** setup *********************************** (client_base.py:39)
[2023-03-20T14:13:45.159Z] [2023-03-20 13:53:26 - INFO - ci_test]: [setup_method] Start setup test case test_check. (client_base.py:40)
[2023-03-20T14:13:45.159Z] ------------------------------ Captured log call -------------------------------
[2023-03-20T14:13:45.159Z] [2023-03-20 13:53:26 - DEBUG - ci_test]: (api_request) : [Connections.connect] args: ['default'], kwargs: {'host': '10.101.185.165', 'port': 19530} (api_request.py:56)
[2023-03-20T14:13:45.159Z] [2023-03-20 13:53:26 - DEBUG - ci_test]: (api_response) : None (api_request.py:31)
[2023-03-20T14:13:45.159Z] [2023-03-20 13:53:26 - DEBUG - ci_test]: (api_request) : [Collection] args: ['deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000', None, 'default', 2], kwargs: {'consistency_level': 'Strong'} (api_request.py:56)
[2023-03-20T14:13:45.159Z] [2023-03-20 13:53:26 - DEBUG - ci_test]: (api_response) : <Collection>:
[2023-03-20T14:13:45.159Z] -------------
[2023-03-20T14:13:45.159Z] <name>: deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000
[2023-03-20T14:13:45.159Z] <partitions>: [{"name": "_default", "collection_name": "deploy_test_index_type_BIN_IVF_FLAT_...... (api_request.py:31)
[2023-03-20T14:13:45.159Z] [2023-03-20 13:53:26 - DEBUG - ci_test]: (api_request) : [wait_for_loading_complete] args: ['deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000', None, 20, 'default'], kwargs: {} (api_request.py:56)
[2023-03-20T14:13:45.159Z] [2023-03-20 13:53:26 - DEBUG - ci_test]: (api_response) : None (api_request.py:31)
[2023-03-20T14:13:45.159Z] [2023-03-20 13:53:26 - INFO - ci_test]: wait for deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000 loading complete cost 0.0013630390167236328 (test_action_second_deployment.py:106)
[2023-03-20T14:13:45.159Z] [2023-03-20 13:53:26 - ERROR - pymilvus.decorators]: RPC error: [get_replicas], <MilvusException: (code=15, message=failed to get replica info, err=failed to get shard leader for shard collectionID:440222522875749049 channelName:"by-dev-rootcoord-dml_131_440222522875749049v1" seek_position:<channel_name:"by-dev-rootcoord-dml_131_440222522875749049v1" msgID:"\326\000\000\000\000\000\000\000" msgGroup:"by-dev-dataNode-6-by-dev-rootcoord-dml_131_440222522875749049v1" timestamp:440222656096894978 > flushedSegmentIds:440222522875750116 dropped_segmentIds:440222522875750009 dropped_segmentIds:440222522875749617 dropped_segmentIds:440222522875749741 dropped_segmentIds:440222522875749083 dropped_segmentIds:440222522875749986 dropped_segmentIds:440222522875749540 , the collection not loaded or leader is offline[NodeNotFound(0)])>, <Time:{'RPC start': '2023-03-20 13:53:26.895052', 'RPC error': '2023-03-20 13:53:26.897301'}> (decorators.py:108)
[2023-03-20T14:13:45.159Z] [2023-03-20 13:53:26 - ERROR - ci_test]: <MilvusException: (code=15, message=failed to get replica info, err=failed to get shard leader for shard collectionID:440222522875749049 channelName:"by-dev-rootcoord-dml_131_440222522875749049v1" seek_position:<channel_name:"by-dev-rootcoord-dml_131_440222522875749049v1" msgID:"\326\000\000\000\000\000\000\000" msgGroup:"by-dev-dataNode-6-by-dev-rootcoord-dml_131_440222522875749049v1" timestamp:440222656096894978 > flushedSegmentIds:440222522875750116 dropped_segmentIds:440222522875750009 dropped_segmentIds:440222522875749617 dropped_segmentIds:440222522875749741 dropped_segmentIds:440222522875749083 dropped_segmentIds:440222522875749986 dropped_segmentIds:440222522875749540 , the collection not loaded or leader is offline[NodeNotFound(0)])> (test_action_second_deployment.py:114)
[2023-03-20T14:13:45.159Z] [2023-03-20 13:53:26 - INFO - ci_test]: collection deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000 has 0 replicas (test_action_second_deployment.py:117)
[2023-03-20T14:13:45.159Z] _ TestActionSecondDeployment.test_check[deploy_test_index_type_HNSW_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] _
[2023-03-20T14:13:45.159Z] [gw1] linux -- Python 3.8.10 /usr/bin/python3.8
[2023-03-20T14:13:45.159Z]
[2023-03-20T14:13:45.159Z] self = <test_action_second_deployment.TestActionSecondDeployment object at 0x7f5197766c40>
[2023-03-20T14:13:45.159Z] all_collection_name = 'deploy_test_index_type_HNSW_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000'
[2023-03-20T14:13:45.159Z] data_size = 3000
[2023-03-20T14:13:45.159Z]
[2023-03-20T14:13:45.159Z] @pytest.mark.tags(CaseLabel.L3)
[2023-03-20T14:13:45.159Z] def test_check(self, all_collection_name, data_size):
[2023-03-20T14:13:45.159Z] """
[2023-03-20T14:13:45.159Z] before reinstall: create collection
[2023-03-20T14:13:45.159Z] """
[2023-03-20T14:13:45.159Z] self._connect()
[2023-03-20T14:13:45.159Z] ms = MilvusSys()
[2023-03-20T14:13:45.159Z] name = all_collection_name
[2023-03-20T14:13:45.159Z] is_binary = False
[2023-03-20T14:13:45.159Z] if "BIN" in name:
[2023-03-20T14:13:45.159Z] is_binary = True
[2023-03-20T14:13:45.160Z] collection_w, _ = self.collection_wrap.init_collection(name=name)
[2023-03-20T14:13:45.160Z] self.collection_w = collection_w
[2023-03-20T14:13:45.160Z] schema = collection_w.schema
[2023-03-20T14:13:45.160Z] data_type = [field.dtype for field in schema.fields]
[2023-03-20T14:13:45.160Z] field_name = [field.name for field in schema.fields]
[2023-03-20T14:13:45.160Z] type_field_map = dict(zip(data_type, field_name))
[2023-03-20T14:13:45.160Z] if is_binary:
[2023-03-20T14:13:45.160Z] default_index_field = ct.default_binary_vec_field_name
[2023-03-20T14:13:45.160Z] vector_index_type = "BIN_IVF_FLAT"
[2023-03-20T14:13:45.160Z] else:
[2023-03-20T14:13:45.160Z] default_index_field = ct.default_float_vec_field_name
[2023-03-20T14:13:45.160Z] vector_index_type = "IVF_FLAT"
[2023-03-20T14:13:45.160Z]
[2023-03-20T14:13:45.160Z] binary_vector_index_types = [index.params["index_type"] for index in collection_w.indexes if
[2023-03-20T14:13:45.160Z] index.field_name == type_field_map.get(100, "")]
[2023-03-20T14:13:45.160Z] float_vector_index_types = [index.params["index_type"] for index in collection_w.indexes if
[2023-03-20T14:13:45.160Z] index.field_name == type_field_map.get(101, "")]
[2023-03-20T14:13:45.160Z] index_field_map = dict([(index.field_name, index.index_name) for index in collection_w.indexes])
[2023-03-20T14:13:45.160Z] index_names = [index.index_name for index in collection_w.indexes] # used to drop index
[2023-03-20T14:13:45.160Z] vector_index_types = binary_vector_index_types + float_vector_index_types
[2023-03-20T14:13:45.160Z] if len(vector_index_types) > 0:
[2023-03-20T14:13:45.160Z] vector_index_type = vector_index_types[0]
[2023-03-20T14:13:45.160Z] try:
[2023-03-20T14:13:45.160Z] t0 = time.time()
[2023-03-20T14:13:45.160Z] self.utility_wrap.wait_for_loading_complete(name)
[2023-03-20T14:13:45.160Z] log.info(f"wait for {name} loading complete cost {time.time() - t0}")
[2023-03-20T14:13:45.160Z] except Exception as e:
[2023-03-20T14:13:45.160Z] log.error(e)
[2023-03-20T14:13:45.160Z] # get replicas loaded
[2023-03-20T14:13:45.160Z] try:
[2023-03-20T14:13:45.160Z] replicas = collection_w.get_replicas(enable_traceback=False)
[2023-03-20T14:13:45.160Z] replicas_loaded = len(replicas.groups)
[2023-03-20T14:13:45.160Z] except Exception as e:
[2023-03-20T14:13:45.160Z] log.error(e)
[2023-03-20T14:13:45.160Z] replicas_loaded = 0
[2023-03-20T14:13:45.160Z]
[2023-03-20T14:13:45.160Z] log.info(f"collection {name} has {replicas_loaded} replicas")
[2023-03-20T14:13:45.160Z] actual_replicas = re.search(r'replica_number_(.*?)_', name).group(1)
[2023-03-20T14:13:45.160Z] > assert replicas_loaded == int(actual_replicas)
[2023-03-20T14:13:45.160Z] E AssertionError: assert 0 == 2
[2023-03-20T14:13:45.160Z] E + where 2 = int('2')
[2023-03-20T14:13:45.160Z]
[2023-03-20T14:13:45.160Z] testcases/test_action_second_deployment.py:119: AssertionError
[2023-03-20T14:13:45.160Z] ------------------------------ Captured log setup ------------------------------
[2023-03-20T14:13:45.160Z] [2023-03-20 14:08:40 - INFO - ci_test]: *********************************** setup *********************************** (client_base.py:39)
[2023-03-20T14:13:45.160Z] [2023-03-20 14:08:40 - INFO - ci_test]: [setup_method] Start setup test case test_check. (client_base.py:40)
[2023-03-20T14:13:45.160Z] ------------------------------ Captured log call -------------------------------
[2023-03-20T14:13:45.160Z] [2023-03-20 14:08:40 - DEBUG - ci_test]: (api_request) : [Connections.connect] args: ['default'], kwargs: {'host': '10.101.185.165', 'port': 19530} (api_request.py:56)
[2023-03-20T14:13:45.160Z] [2023-03-20 14:08:40 - DEBUG - ci_test]: (api_response) : None (api_request.py:31)
[2023-03-20T14:13:45.160Z] [2023-03-20 14:08:40 - DEBUG - ci_test]: (api_request) : [Collection] args: ['deploy_test_index_type_HNSW_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000', None, 'default', 2], kwargs: {'consistency_level': 'Strong'} (api_request.py:56)
[2023-03-20T14:13:45.160Z] [2023-03-20 14:08:40 - DEBUG - ci_test]: (api_response) : <Collection>:
[2023-03-20T14:13:45.160Z] -------------
[2023-03-20T14:13:45.160Z] <name>: deploy_test_index_type_HNSW_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000
[2023-03-20T14:13:45.160Z] <partitions>: [{"name": "_default", "collection_name": "deploy_test_index_type_HNSW_is_compacted_is_...... (api_request.py:31)
[2023-03-20T14:13:45.160Z] [2023-03-20 14:08:40 - DEBUG - ci_test]: (api_request) : [wait_for_loading_complete] args: ['deploy_test_index_type_HNSW_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000', None, 20, 'default'], kwargs: {} (api_request.py:56)
[2023-03-20T14:13:45.160Z] [2023-03-20 14:08:40 - DEBUG - ci_test]: (api_response) : None (api_request.py:31)
[2023-03-20T14:13:45.160Z] [2023-03-20 14:08:40 - INFO - ci_test]: wait for deploy_test_index_type_HNSW_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000 loading complete cost 0.0014719963073730469 (test_action_second_deployment.py:106)
[2023-03-20T14:13:45.160Z] [2023-03-20 14:08:40 - ERROR - pymilvus.decorators]: RPC error: [get_replicas], <MilvusException: (code=15, message=failed to get replica info, err=failed to get shard leader for shard collectionID:440222522875142971 channelName:"by-dev-rootcoord-dml_54_440222522875142971v0" seek_position:<channel_name:"by-dev-rootcoord-dml_54_440222522875142971v0" msgID:"9\n\000\000\000\000\000\000" msgGroup:"by-dev-dataNode-24-by-dev-rootcoord-dml_54_440222522875142971v0" timestamp:440222776608686082 > unflushedSegmentIds:440222522875343687 flushedSegmentIds:440222522875344951 flushedSegmentIds:440222522875343680 dropped_segmentIds:440222522875343330 dropped_segmentIds:440222522875343225 dropped_segmentIds:440222522875343569 dropped_segmentIds:440222522875142985 dropped_segmentIds:440222522875343444 , the collection not loaded or leader is offline[NodeNotFound(0)])>, <Time:{'RPC start': '2023-03-20 14:08:40.719219', 'RPC error': '2023-03-20 14:08:40.721647'}> (decorators.py:108)
[2023-03-20T14:13:45.160Z] [2023-03-20 14:08:40 - ERROR - ci_test]: <MilvusException: (code=15, message=failed to get replica info, err=failed to get shard leader for shard collectionID:440222522875142971 channelName:"by-dev-rootcoord-dml_54_440222522875142971v0" seek_position:<channel_name:"by-dev-rootcoord-dml_54_440222522875142971v0" msgID:"9\n\000\000\000\000\000\000" msgGroup:"by-dev-dataNode-24-by-dev-rootcoord-dml_54_440222522875142971v0" timestamp:440222776608686082 > unflushedSegmentIds:440222522875343687 flushedSegmentIds:440222522875344951 flushedSegmentIds:440222522875343680 dropped_segmentIds:440222522875343330 dropped_segmentIds:440222522875343225 dropped_segmentIds:440222522875343569 dropped_segmentIds:440222522875142985 dropped_segmentIds:440222522875343444 , the collection not loaded or leader is offline[NodeNotFound(0)])> (test_action_second_deployment.py:114)
[2023-03-20T14:13:45.160Z] [2023-03-20 14:08:40 - INFO - ci_test]: collection deploy_test_index_type_HNSW_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000 has 0 replicas (test_action_second_deployment.py:117)
[2023-03-20T14:13:45.160Z] ------------- generated html file: file:///tmp/ci_logs/report.html -------------
[2023-03-20T14:13:45.160Z] =========================== short test summary info ============================
[2023-03-20T14:13:45.160Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] - AssertionError: assert 0 == 2
[2023-03-20T14:13:45.160Z] + where 2 = int('2')
[2023-03-20T14:13:45.160Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_HNSW_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] - AssertionError: assert 0 == 2
[2023-03-20T14:13:45.160Z] + where 2 = int('2')
[2023-03-20T14:13:45.160Z] ================== 2 failed, 48 passed in 4442.39s (1:14:02) ===================
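The failing assertion compares the number of loaded replicas against the count encoded in the collection name (`replica_number_2_` above), which the test recovers with a regex. A minimal standalone sketch of that parsing step, using a hypothetical `expected_replicas` helper (not part of the actual test suite):

```python
import re

def expected_replicas(collection_name: str) -> int:
    """Extract the replica count encoded as `replica_number_<n>_` in the name."""
    match = re.search(r"replica_number_(.*?)_", collection_name)
    if match is None:
        raise ValueError(f"no replica_number_<n>_ segment in {collection_name!r}")
    return int(match.group(1))

name = ("deploy_test_index_type_HNSW_is_compacted_is_compacted_"
        "segment_status_all_is_string_indexed_not_string_indexed_"
        "replica_number_2_is_deleted_is_deleted_data_size_3000")
print(expected_replicas(name))  # 2
```

The assertion then fails because `get_replicas` raises (shard leader offline), the test swallows the exception and reports 0 loaded replicas, and `0 != 2`.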
v2.2.3 --> 2.2.0-20230320-61692278
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_kafka_for_release_cron/detail/deploy_test_kafka_for_release_cron/606/pipeline
log: artifacts-kafka-cluster-upgrade-606-server-second-deployment-logs.tar.gz artifacts-kafka-cluster-upgrade-606-server-first-deployment-logs.tar.gz artifacts-kafka-cluster-upgrade-606-pytest-logs.tar.gz
The root cause is that loading failed, which stems from two problems:
- all nodes were assigned to one of the replicas during the rolling upgrade; related to #22782, fix WIP
- the pChannel name was passed as the vChannel name, which caused consuming from the MQ to fail; already fixed by #22721
@zhuwenxing please verify the second part first.
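For context on the second problem: judging from the `channelName` values in the logs above (e.g. `by-dev-rootcoord-dml_54_440222522875142971v0`), a vChannel name appears to be the pChannel name plus a `_<collectionID>v<shard>` suffix, so passing one where the other is expected means no matching topic exists in the MQ. The sketch below infers that naming scheme from the logs; it is an assumption, not Milvus's actual helper:

```python
import re

def to_physical_channel(vchannel: str) -> str:
    """Strip the `_<collectionID>v<shard>` suffix from a virtual channel name.
    Naming scheme inferred from the log output above (assumption)."""
    return re.sub(r"_\d+v\d+$", "", vchannel)

print(to_physical_channel("by-dev-rootcoord-dml_54_440222522875142971v0"))
# by-dev-rootcoord-dml_54
```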
It is still reproduced in 2.2.0-20230324-a59dc9cb. @weiliu1031 PTAL.
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_for_release_cron/detail/deploy_test_for_release_cron/68/pipeline
[2023-03-26T11:57:52.001Z] <name>: deploy_test_index_type_HNSW_is_compacted_not_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000
[2023-03-26T11:57:52.001Z] <partitions>: [{"name": "_default", "collection_name": "deploy_test_index_type_HNSW_is_compacted_no...... (api_request.py:31)
[2023-03-26T11:57:52.001Z] [2023-03-26 11:55:27 - DEBUG - ci_test]: (api_request) : [wait_for_loading_complete] args: ['deploy_test_index_type_HNSW_is_compacted_not_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000', None, 20, 'default'], kwargs: {} (api_request.py:56)
[2023-03-26T11:57:52.001Z] [2023-03-26 11:55:27 - DEBUG - ci_test]: (api_response) : None (api_request.py:31)
[2023-03-26T11:57:52.001Z] [2023-03-26 11:55:27 - INFO - ci_test]: wait for deploy_test_index_type_HNSW_is_compacted_not_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000 loading complete cost 0.002002239227294922 (test_action_second_deployment.py:106)
[2023-03-26T11:57:52.001Z] [2023-03-26 11:55:27 - ERROR - pymilvus.decorators]: RPC error: [get_replicas], <MilvusException: (code=15, message=failed to get replica info, err=failed to get shard leader for shard collectionID:440357182843681219 channelName:"by-dev-rootcoord-dml_86_440357182843681219v0" seek_position:<channel_name:"by-dev-rootcoord-dml_86_440357182843681219v0" msgID:"\010V\020\213\001\030\000 \000" msgGroup:"by-dev-dataNode-13-by-dev-rootcoord-dml_86_440357182843681219v0" timestamp:440357282411446273 > unflushedSegmentIds:440357182843681806 flushedSegmentIds:440357182843882764 dropped_segmentIds:440357182843681332 dropped_segmentIds:440357182843681406 dropped_segmentIds:440357182843681679 dropped_segmentIds:440357182843681669 dropped_segmentIds:440357182843681230 dropped_segmentIds:440357182843681623 , the collection not loaded or leader is offline[NodeNotFound(0)])>, <Time:{'RPC start': '2023-03-26 11:55:27.284975', 'RPC error': '2023-03-26 11:55:27.287056'}> (decorators.py:108)
[2023-03-26T11:57:52.001Z] [2023-03-26 11:55:27 - ERROR - ci_test]: <MilvusException: (code=15, message=failed to get replica info, err=failed to get shard leader for shard collectionID:440357182843681219 channelName:"by-dev-rootcoord-dml_86_440357182843681219v0" seek_position:<channel_name:"by-dev-rootcoord-dml_86_440357182843681219v0" msgID:"\010V\020\213\001\030\000 \000" msgGroup:"by-dev-dataNode-13-by-dev-rootcoord-dml_86_440357182843681219v0" timestamp:440357282411446273 > unflushedSegmentIds:440357182843681806 flushedSegmentIds:440357182843882764 dropped_segmentIds:440357182843681332 dropped_segmentIds:440357182843681406 dropped_segmentIds:440357182843681679 dropped_segmentIds:440357182843681669 dropped_segmentIds:440357182843681230 dropped_segmentIds:440357182843681623 , the collection not loaded or leader is offline[NodeNotFound(0)])> (test_action_second_deployment.py:114)
[2023-03-26T11:57:52.001Z] [2023-03-26 11:55:27 - INFO - ci_test]: collection deploy_test_index_type_HNSW_is_compacted_not_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000 has 0 replicas (test_action_second_deployment.py:117)
[2023-03-26T11:57:52.001Z] ------------- generated html file: file:///tmp/ci_logs/report.html -------------
[2023-03-26T11:57:52.001Z] =========================== short test summary info ============================
[2023-03-26T11:57:52.001Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_not_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] - AssertionError: assert 0 == 2
[2023-03-26T11:57:52.001Z] + where 2 = int('2')
[2023-03-26T11:57:52.001Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_HNSW_is_compacted_not_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] - AssertionError: assert 0 == 2
[2023-03-26T11:57:52.001Z] + where 2 = int('2')
[2023-03-26T11:57:52.001Z] =================== 2 failed, 48 passed in 955.31s (0:15:55) ===================
https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_for_release_cron/detail/deploy_test_for_release_cron/68/pipeline
Based on the logs, the second problem mentioned above is already fixed; the first problem needs more design work and is still WIP.
It is also reproduced when upgrading from v2.2.5 to master-latest.
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_kafka_cron/detail/deploy_test_kafka_cron/688/pipeline/
[2023-04-25T11:12:49.499Z] ------------------------------ Captured log call -------------------------------
[2023-04-25T11:12:49.499Z] [get_env_variable] failed to get environment variables : 'CI_LOG_PATH', use default path : /tmp/ci_logs
[2023-04-25T11:12:49.499Z] [2023-04-25 10:56:10 - DEBUG - ci_test]: (api_request) : [Connections.connect] args: ['default'], kwargs: {'host': '10.101.49.137', 'port': 19530} (api_request.py:56)
[2023-04-25T11:12:49.499Z] [2023-04-25 10:56:10 - DEBUG - ci_test]: (api_response) : None (api_request.py:31)
[2023-04-25T11:12:49.499Z] [2023-04-25 10:56:10 - DEBUG - ci_test]: (api_request) : [Collection] args: ['deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_only_growing_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000', None, 'default', 2], kwargs: {'consistency_level': 'Strong'} (api_request.py:56)
[2023-04-25T11:12:49.499Z] [2023-04-25 10:56:10 - DEBUG - ci_test]: (api_response) : <Collection>:
[2023-04-25T11:12:49.499Z] -------------
[2023-04-25T11:12:49.499Z] <name>: deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_only_growing_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000
[2023-04-25T11:12:49.499Z] <partitions>: [{"name": "_default", "collection_name": "deploy_test_index_type_BIN_...... (api_request.py:31)
[2023-04-25T11:12:49.499Z] [2023-04-25 10:56:10 - DEBUG - ci_test]: (api_request) : [wait_for_loading_complete] args: ['deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_only_growing_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000', None, 20, 'default'], kwargs: {} (api_request.py:56)
[2023-04-25T11:12:49.499Z] [2023-04-25 10:56:10 - DEBUG - ci_test]: (api_response) : None (api_request.py:31)
[2023-04-25T11:12:49.499Z] [2023-04-25 10:56:10 - INFO - ci_test]: wait for deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_only_growing_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000 loading complete cost 0.0013170242309570312 (test_action_second_deployment.py:106)
[2023-04-25T11:12:49.499Z] [2023-04-25 10:56:10 - ERROR - pymilvus.decorators]: RPC error: [get_replicas], <MilvusException: (code=15, message=failed to get replica info, err=failed to get shard leader for shard collectionID:441035718610425131 channelName:"by-dev-rootcoord-dml_98_441035718610425131v0" seek_position:<channel_name:"by-dev-rootcoord-dml_98_441035718610425131v0" msgID:"1\000\000\000\000\000\000\000" msgGroup:"by-dev-dataNode-10-by-dev-rootcoord-dml_98_441035718610425131v0" timestamp:441035880914747392 > unflushedSegmentIds:441035718610425358 , the collection not loaded or leader is offline[NodeNotFound(0)])>, <Time:{'RPC start': '2023-04-25 10:56:10.696751', 'RPC error': '2023-04-25 10:56:10.699063'}> (decorators.py:108)
[2023-04-25T11:12:49.499Z] [2023-04-25 10:56:10 - ERROR - ci_test]: <MilvusException: (code=15, message=failed to get replica info, err=failed to get shard leader for shard collectionID:441035718610425131 channelName:"by-dev-rootcoord-dml_98_441035718610425131v0" seek_position:<channel_name:"by-dev-rootcoord-dml_98_441035718610425131v0" msgID:"1\000\000\000\000\000\000\000" msgGroup:"by-dev-dataNode-10-by-dev-rootcoord-dml_98_441035718610425131v0" timestamp:441035880914747392 > unflushedSegmentIds:441035718610425358 , the collection not loaded or leader is offline[NodeNotFound(0)])> (test_action_second_deployment.py:114)
[2023-04-25T11:12:49.499Z] [2023-04-25 10:56:10 - INFO - ci_test]: collection deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_only_growing_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000 has 0 replicas (test_action_second_deployment.py:117)
[2023-04-25T11:12:49.499Z] ------------- generated html file: file:///tmp/ci_logs/report.html -------------
[2023-04-25T11:12:49.499Z] =========================== short test summary info ============================
[2023-04-25T11:12:49.499Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_HNSW_is_compacted_not_compacted_segment_status_only_growing_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] - AssertionError: assert 0 == 2
[2023-04-25T11:12:49.499Z] + where 2 = int('2')
[2023-04-25T11:12:49.499Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_HNSW_is_compacted_not_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] - AssertionError: assert 0 == 2
[2023-04-25T11:12:49.499Z] + where 2 = int('2')
[2023-04-25T11:12:49.499Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_HNSW_is_compacted_not_compacted_segment_status_only_growing_is_string_indexed_is_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] - AssertionError: assert 0 == 2
[2023-04-25T11:12:49.499Z] + where 2 = int('2')
[2023-04-25T11:12:49.499Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_HNSW_is_compacted_not_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] - AssertionError: assert 0 == 2
[2023-04-25T11:12:49.499Z] + where 2 = int('2')
[2023-04-25T11:12:49.499Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_HNSW_is_compacted_is_compacted_segment_status_only_growing_is_string_indexed_is_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] - AssertionError: assert 0 == 2
[2023-04-25T11:12:49.499Z] + where 2 = int('2')
[2023-04-25T11:12:49.499Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_not_compacted_segment_status_only_growing_is_string_indexed_is_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] - AssertionError: assert 0 == 2
[2023-04-25T11:12:49.499Z] + where 2 = int('2')
[2023-04-25T11:12:49.499Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_only_growing_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] - AssertionError: assert 0 == 2
[2023-04-25T11:12:49.499Z] + where 2 = int('2')
[2023-04-25T11:12:49.499Z] ================== 7 failed, 43 passed in 1680.76s (0:28:00) ===================
log: artifacts-kafka-cluster-upgrade-688-server-second-deployment-logs.tar.gz
artifacts-kafka-cluster-upgrade-688-server-first-deployment-logs.tar.gz
This should be fixed by #23415 and #23626; please verify, @zhuwenxing.
/assign @zhuwenxing
It is still reproduced in 2.2.0.
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_kafka_for_release_cron/detail/deploy_test_kafka_for_release_cron/1115/pipeline
[2023-06-26T08:15:04.286Z] =========================== short test summary info ============================
[2023-06-26T08:15:04.286Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_HNSW_is_compacted_not_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] - AssertionError: assert 0 == 2
[2023-06-26T08:15:04.286Z] + where 2 = int('2')
[2023-06-26T08:15:04.286Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_only_growing_is_string_indexed_not_string_indexed_replica_number_2_is_deleted_is_deleted_data_size_3000] - AssertionError: assert 0 == 2
[2023-06-26T08:15:04.286Z] + where 2 = int('2')
[2023-06-26T08:15:04.286Z] =================== 2 failed, 48 passed in 404.73s (0:06:44) ===================
log: artifacts-kafka-cluster-upgrade-1115-server-second-deployment-logs.tar.gz artifacts-kafka-cluster-upgrade-1115-server-first-deployment-logs.tar.gz artifacts-kafka-cluster-upgrade-1115-pytest-logs.tar.gz
/assign @weiliu1031
It is reproduced when using the helm upgrade and will be fixed in 2.3.x.
So this will be considered a known issue.
The version has been bumped to 2.3.1, so I think this issue should be solved in the next version. @weiliu1031
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen
@zhuwenxing any updates?
Not reproduced in 2.4.