milvus
[Bug]: Load failed with error `deny to load, insufficient memory, please allocate more resources` after upgrading from v2.2.5 to master-20230506-ad75afdc when no limit is set to memory
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: v2.2.5 --> master-20230506-ad75afdc
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): rocksmq
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
[2023-05-07T10:20:14.971Z] =========================== short test summary info ============================
[2023-05-07T10:20:14.971Z] [get_env_variable] failed to get environment variables : 'CI_LOG_PATH', use default path : /tmp/ci_logs
[2023-05-07T10:20:14.971Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_not_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_1_is_deleted_is_deleted_data_size_3000] - AssertionError: assert 0 == 1
[2023-05-07T10:20:14.971Z] + where 1 = int('1')
[2023-05-07T10:20:14.971Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_not_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_1_is_deleted_is_deleted_data_size_3000] - AssertionError: assert 0 == 1
[2023-05-07T10:20:14.971Z] + where 1 = int('1')
[2023-05-07T10:20:14.971Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_1_is_deleted_is_deleted_data_size_3000] - AssertionError: assert 0 == 1
[2023-05-07T10:20:14.971Z] + where 1 = int('1')
[2023-05-07T10:20:14.971Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_not_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000] - pymilvus.exceptions.MilvusException: <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: deploy_test_index_type_BIN_IVF_FLAT_is_compacted_not_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000)>
[2023-05-07T10:20:14.971Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000] - pymilvus.exceptions.MilvusException: <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000)>
[2023-05-07T10:20:14.971Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_not_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000] - pymilvus.exceptions.MilvusException: <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: deploy_test_index_type_BIN_IVF_FLAT_is_compacted_not_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000)>
[2023-05-07T10:20:14.971Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_1_is_deleted_is_deleted_data_size_3000] - AssertionError: assert 0 == 1
[2023-05-07T10:20:14.971Z] + where 1 = int('1')
[2023-05-07T10:20:14.971Z] FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000] - pymilvus.exceptions.MilvusException: <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000)>
[2023-05-07T10:20:14.971Z] ================== 8 failed, 26 passed in 1074.22s (0:17:54) ===================
[2023-05-07T10:20:14.970Z] [2023-05-07 10:09:23 - DEBUG - ci_test]: (api_request) : [wait_for_loading_complete] args: ['deploy_test_index_type_BIN_IVF_FLAT_is_compacted_is_compacted_segment_status_all_is_string_indexed_is_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000', None, 20, 'default'], kwargs: {} (api_request.py:56)
[2023-05-07T10:20:14.970Z] [2023-05-07 10:09:23 - ERROR - pymilvus.decorators]: RPC error: [get_loading_progress], <MilvusException: (code=1, message=collection 441306758232705148 has not been loaded to memory or load failed)>, <Time:{'RPC start': '2023-05-07 10:09:23.629464', 'RPC error': '2023-05-07 10:09:23.630401'}> (decorators.py:108)
[2023-05-07T10:20:14.970Z] [2023-05-07 10:09:23 - ERROR - pymilvus.decorators]: RPC error: [wait_for_loading_collection], <MilvusException: (code=1, message=collection 441306758232705148 has not been loaded to memory or load failed)>, <Time:{'RPC start': '2023-05-07 10:09:23.629449', 'RPC error': '2023-05-07 10:09:23.630503'}> (decorators.py:108)
[2023-05-07T10:20:14.970Z] [2023-05-07 10:09:23 - ERROR - ci_test]: Traceback (most recent call last):
[2023-05-07T10:20:14.970Z] File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 26, in inner_wrapper
[2023-05-07T10:20:14.970Z] res = func(*args, **_kwargs)
[2023-05-07T10:20:14.970Z] File "/home/jenkins/agent/workspace/tests/python_client/utils/api_request.py", line 57, in api_request
[2023-05-07T10:20:14.970Z] return func(*arg, **kwargs)
[2023-05-07T10:20:14.970Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/utility.py", line 277, in wait_for_loading_complete
[2023-05-07T10:20:14.970Z] return _get_connection(using).wait_for_loading_collection(collection_name, timeout=timeout)
[2023-05-07T10:20:14.970Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
[2023-05-07T10:20:14.970Z] raise e
[2023-05-07T10:20:14.970Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler
[2023-05-07T10:20:14.970Z] return func(*args, **kwargs)
[2023-05-07T10:20:14.970Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler
[2023-05-07T10:20:14.970Z] ret = func(self, *args, **kwargs)
[2023-05-07T10:20:14.970Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler
[2023-05-07T10:20:14.970Z] raise e
[2023-05-07T10:20:14.970Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler
[2023-05-07T10:20:14.970Z] return func(self, *args, **kwargs)
[2023-05-07T10:20:14.970Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 732, in wait_for_loading_collection
[2023-05-07T10:20:14.970Z] progress = self.get_loading_progress(collection_name, timeout=timeout)
[2023-05-07T10:20:14.970Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
[2023-05-07T10:20:14.970Z] raise e
[2023-05-07T10:20:14.970Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 105, in handler
[2023-05-07T10:20:14.970Z] return func(*args, **kwargs)
[2023-05-07T10:20:14.970Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 136, in handler
[2023-05-07T10:20:14.970Z] ret = func(self, *args, **kwargs)
[2023-05-07T10:20:14.970Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 85, in handler
[2023-05-07T10:20:14.970Z] raise e
[2023-05-07T10:20:14.970Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 50, in handler
[2023-05-07T10:20:14.970Z] return func(self, *args, **kwargs)
[2023-05-07T10:20:14.970Z] File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 799, in get_loading_progress
[2023-05-07T10:20:14.970Z] raise MilvusException(response.status.error_code, response.status.reason)
[2023-05-07T10:20:14.970Z] pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=collection 441306758232705148 has not been loaded to memory or load failed)>
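One pattern in the summary above is easy to miss: the four `AssertionError: assert 0 == 1` failures are all `replica_number_1` cases, and the four `code=52` failures are all `replica_number_0` cases. The sketch below is one way to tally that from the log. It is only an illustration: the helper names (`classify`, `tally`) and the two sample lines, whose test ids are abbreviated, are assumptions and not part of the test suite.

```python
import re
from collections import Counter

# Two shortened lines in the shape of the pytest summary above
# (test ids and messages are abbreviated for readability).
SUMMARY = """\
FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_replica_number_1_is_deleted_data_size_3000] - AssertionError: assert 0 == 1
FAILED testcases/test_action_second_deployment.py::TestActionSecondDeployment::test_check[deploy_test_index_type_BIN_IVF_FLAT_replica_number_0_is_deleted_data_size_3000] - pymilvus.exceptions.MilvusException: <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources)>
"""

FAILED_RE = re.compile(r"FAILED \S+\[(?P<case>[^\]]+)\] - (?P<error>.*)")
REPLICA_RE = re.compile(r"replica_number_(\d+)_")
CODE_RE = re.compile(r"code=(\d+)")


def classify(line):
    """Return (replica_number, error_kind) for a FAILED line, else None."""
    match = FAILED_RE.search(line)
    if match is None:
        return None
    replica = REPLICA_RE.search(match.group("case"))
    code = CODE_RE.search(match.group("error"))
    if code:
        kind = f"MilvusException code={code.group(1)}"
    else:
        # Keep only the exception name, e.g. "AssertionError".
        kind = match.group("error").split(":", 1)[0]
    return (int(replica.group(1)) if replica else None, kind)


def tally(text):
    """Count FAILED lines per (replica_number, error_kind) pair."""
    classified = (classify(line) for line in text.splitlines())
    return Counter(item for item in classified if item is not None)


if __name__ == "__main__":
    for (replica, kind), count in sorted(tally(SUMMARY).items()):
        print(f"replica_number={replica}: {kind} x{count}")
```

Because the pattern is matched with `search`, lines that carry the Jenkins timestamp prefix can be passed in unchanged. The split is consistent with the test body quoted in this issue: a `replica_number_0` case passes the replica assertion and then fails on the explicit `collection_w.load()`, while a `replica_number_1` case fails earlier because no replica is loaded after the upgrade.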
Expected Behavior
No response
Steps To Reproduce
Milvus Log
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/deploy_test_cron/detail/deploy_test_cron/738/pipeline
log: artifacts-rocksmq-standalone-upgrade-738-server-logs.tar.gz
artifacts-rocksmq-standalone-upgrade-738-pytest-logs.tar.gz
Anything else?
No response
/assign @jiaoew1991
/unassign
[2023-05-07T10:20:14.965Z] self = <test_action_second_deployment.TestActionSecondDeployment object at 0x7f35147c98e0>
[2023-05-07T10:20:14.965Z] all_collection_name = 'deploy_test_index_type_BIN_IVF_FLAT_is_compacted_not_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000'
[2023-05-07T10:20:14.965Z] data_size = 3000
[2023-05-07T10:20:14.965Z]
[2023-05-07T10:20:14.965Z] @pytest.mark.tags(CaseLabel.L3)
[2023-05-07T10:20:14.965Z] def test_check(self, all_collection_name, data_size):
[2023-05-07T10:20:14.965Z] """
[2023-05-07T10:20:14.965Z] before reinstall: create collection
[2023-05-07T10:20:14.965Z] """
[2023-05-07T10:20:14.965Z] self._connect()
[2023-05-07T10:20:14.965Z] ms = MilvusSys()
[2023-05-07T10:20:14.965Z] name = all_collection_name
[2023-05-07T10:20:14.965Z] is_binary = False
[2023-05-07T10:20:14.965Z] if "BIN" in name:
[2023-05-07T10:20:14.965Z] is_binary = True
[2023-05-07T10:20:14.966Z] collection_w, _ = self.collection_wrap.init_collection(name=name)
[2023-05-07T10:20:14.966Z] self.collection_w = collection_w
[2023-05-07T10:20:14.966Z] schema = collection_w.schema
[2023-05-07T10:20:14.966Z] data_type = [field.dtype for field in schema.fields]
[2023-05-07T10:20:14.966Z] field_name = [field.name for field in schema.fields]
[2023-05-07T10:20:14.966Z] type_field_map = dict(zip(data_type, field_name))
[2023-05-07T10:20:14.966Z] if is_binary:
[2023-05-07T10:20:14.966Z] default_index_field = ct.default_binary_vec_field_name
[2023-05-07T10:20:14.966Z] vector_index_type = "BIN_IVF_FLAT"
[2023-05-07T10:20:14.966Z] else:
[2023-05-07T10:20:14.966Z] default_index_field = ct.default_float_vec_field_name
[2023-05-07T10:20:14.966Z] vector_index_type = "IVF_FLAT"
[2023-05-07T10:20:14.966Z]
[2023-05-07T10:20:14.966Z] binary_vector_index_types = [index.params["index_type"] for index in collection_w.indexes if
[2023-05-07T10:20:14.966Z] index.field_name == type_field_map.get(100, "")]
[2023-05-07T10:20:14.966Z] float_vector_index_types = [index.params["index_type"] for index in collection_w.indexes if
[2023-05-07T10:20:14.966Z] index.field_name == type_field_map.get(101, "")]
[2023-05-07T10:20:14.966Z] index_field_map = dict([(index.field_name, index.index_name) for index in collection_w.indexes])
[2023-05-07T10:20:14.966Z] index_names = [index.index_name for index in collection_w.indexes] # used to drop index
[2023-05-07T10:20:14.966Z] vector_index_types = binary_vector_index_types + float_vector_index_types
[2023-05-07T10:20:14.966Z] if len(vector_index_types) > 0:
[2023-05-07T10:20:14.966Z] vector_index_type = vector_index_types[0]
[2023-05-07T10:20:14.966Z] try:
[2023-05-07T10:20:14.966Z] t0 = time.time()
[2023-05-07T10:20:14.966Z] self.utility_wrap.wait_for_loading_complete(name)
[2023-05-07T10:20:14.966Z] log.info(f"wait for {name} loading complete cost {time.time() - t0}")
[2023-05-07T10:20:14.966Z] except Exception as e:
[2023-05-07T10:20:14.966Z] log.error(e)
[2023-05-07T10:20:14.966Z] # get replicas loaded
[2023-05-07T10:20:14.966Z] try:
[2023-05-07T10:20:14.966Z] replicas = collection_w.get_replicas(enable_traceback=False)
[2023-05-07T10:20:14.966Z] replicas_loaded = len(replicas.groups)
[2023-05-07T10:20:14.966Z] except Exception as e:
[2023-05-07T10:20:14.966Z] log.error(e)
[2023-05-07T10:20:14.966Z] replicas_loaded = 0
[2023-05-07T10:20:14.966Z]
[2023-05-07T10:20:14.966Z] log.info(f"collection {name} has {replicas_loaded} replicas")
[2023-05-07T10:20:14.966Z] actual_replicas = re.search(r'replica_number_(.*?)_', name).group(1)
[2023-05-07T10:20:14.966Z] assert replicas_loaded == int(actual_replicas)
[2023-05-07T10:20:14.966Z] # params for search and query
[2023-05-07T10:20:14.966Z] if is_binary:
[2023-05-07T10:20:14.966Z] _, vectors_to_search = cf.gen_binary_vectors(
[2023-05-07T10:20:14.966Z] default_nb, default_dim)
[2023-05-07T10:20:14.966Z] default_search_field = ct.default_binary_vec_field_name
[2023-05-07T10:20:14.966Z] else:
[2023-05-07T10:20:14.966Z] vectors_to_search = cf.gen_vectors(default_nb, default_dim)
[2023-05-07T10:20:14.966Z] default_search_field = ct.default_float_vec_field_name
[2023-05-07T10:20:14.966Z] search_params = gen_search_param(vector_index_type)[0]
[2023-05-07T10:20:14.966Z]
[2023-05-07T10:20:14.966Z] # load if not loaded
[2023-05-07T10:20:14.966Z] if replicas_loaded == 0:
[2023-05-07T10:20:14.966Z] # create index for vector if not exist before load
[2023-05-07T10:20:14.966Z] is_vector_indexed = False
[2023-05-07T10:20:14.966Z] index_infos = [index.to_dict() for index in collection_w.indexes]
[2023-05-07T10:20:14.966Z] for index_info in index_infos:
[2023-05-07T10:20:14.966Z] if "metric_type" in index_info.keys():
[2023-05-07T10:20:14.966Z] is_vector_indexed = True
[2023-05-07T10:20:14.966Z] break
[2023-05-07T10:20:14.966Z] if is_vector_indexed is False:
[2023-05-07T10:20:14.966Z] default_index_param = gen_index_param(vector_index_type)
[2023-05-07T10:20:14.966Z] self.create_index(collection_w, default_index_field, default_index_param)
[2023-05-07T10:20:14.966Z] > collection_w.load()
[2023-05-07T10:20:14.966Z]
[2023-05-07T10:20:14.966Z] testcases/test_action_second_deployment.py:142:
[2023-05-07T10:20:14.966Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2023-05-07T10:20:14.966Z] /usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py:372: in load
[2023-05-07T10:20:14.966Z] conn.load_collection(self._name, replica_number=replica_number, timeout=timeout, **kwargs)
[2023-05-07T10:20:14.966Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:109: in handler
[2023-05-07T10:20:14.966Z] raise e
[2023-05-07T10:20:14.966Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:105: in handler
[2023-05-07T10:20:14.966Z] return func(*args, **kwargs)
[2023-05-07T10:20:14.966Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:136: in handler
[2023-05-07T10:20:14.966Z] ret = func(self, *args, **kwargs)
[2023-05-07T10:20:14.966Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:85: in handler
[2023-05-07T10:20:14.966Z] raise e
[2023-05-07T10:20:14.966Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:50: in handler
[2023-05-07T10:20:14.966Z] return func(self, *args, **kwargs)
[2023-05-07T10:20:14.966Z] /usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py:714: in load_collection
[2023-05-07T10:20:14.966Z] self.wait_for_loading_collection(collection_name, timeout)
[2023-05-07T10:20:14.966Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:109: in handler
[2023-05-07T10:20:14.966Z] raise e
[2023-05-07T10:20:14.966Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:105: in handler
[2023-05-07T10:20:14.966Z] return func(*args, **kwargs)
[2023-05-07T10:20:14.966Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:136: in handler
[2023-05-07T10:20:14.966Z] ret = func(self, *args, **kwargs)
[2023-05-07T10:20:14.966Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:85: in handler
[2023-05-07T10:20:14.966Z] raise e
[2023-05-07T10:20:14.966Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:50: in handler
[2023-05-07T10:20:14.966Z] return func(self, *args, **kwargs)
[2023-05-07T10:20:14.966Z] /usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py:732: in wait_for_loading_collection
[2023-05-07T10:20:14.966Z] progress = self.get_loading_progress(collection_name, timeout=timeout)
[2023-05-07T10:20:14.966Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:109: in handler
[2023-05-07T10:20:14.966Z] raise e
[2023-05-07T10:20:14.966Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:105: in handler
[2023-05-07T10:20:14.966Z] return func(*args, **kwargs)
[2023-05-07T10:20:14.966Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:136: in handler
[2023-05-07T10:20:14.966Z] ret = func(self, *args, **kwargs)
[2023-05-07T10:20:14.966Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:85: in handler
[2023-05-07T10:20:14.966Z] raise e
[2023-05-07T10:20:14.966Z] /usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py:50: in handler
[2023-05-07T10:20:14.966Z] return func(self, *args, **kwargs)
[2023-05-07T10:20:14.966Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2023-05-07T10:20:14.966Z]
[2023-05-07T10:20:14.966Z] self = <pymilvus.client.grpc_handler.GrpcHandler object at 0x7f34f07ad970>
[2023-05-07T10:20:14.966Z] collection_name = 'deploy_test_index_type_BIN_IVF_FLAT_is_compacted_not_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000'
[2023-05-07T10:20:14.966Z] partition_names = None, timeout = None
[2023-05-07T10:20:14.966Z]
[2023-05-07T10:20:14.966Z] @retry_on_rpc_failure()
[2023-05-07T10:20:14.966Z] def get_loading_progress(self, collection_name, partition_names=None, timeout=None):
[2023-05-07T10:20:14.966Z] request = Prepare.get_loading_progress(collection_name, partition_names)
[2023-05-07T10:20:14.966Z] response = self._stub.GetLoadingProgress.future(request, timeout=timeout).result()
[2023-05-07T10:20:14.966Z] if response.status.error_code != 0:
[2023-05-07T10:20:14.966Z] > raise MilvusException(response.status.error_code, response.status.reason)
[2023-05-07T10:20:14.966Z] E pymilvus.exceptions.MilvusException: <MilvusException: (code=52, message=deny to load, insufficient memory, please allocate more resources, collectionName: deploy_test_index_type_BIN_IVF_FLAT_is_compacted_not_compacted_segment_status_all_is_string_indexed_not_string_indexed_replica_number_0_is_deleted_is_deleted_data_size_3000)>
[2023-05-07T10:20:14.967Z]
[2023-05-07T10:20:14.967Z] /usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py:799: MilvusException
memory usage
/assign @aoiasd
/unassign
KNOWHERE `index_.Deserialize` throws an error when the index type is BIN_IVF_FLAT; it seems the BIN_IVF_FLAT index binary format changed between v2.2.5 and master.
/assign @liliu-z please help on it
/assign
Same issue with BIN_IVF.
However, querycoord should not report an insufficient-memory error for every error returned by `meta.GlobalFailedLoadCache.Get`; I will fix that.
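To make the point about error reporting concrete, here is a minimal Python sketch of a failed-load cache whose lookup keeps the original error type, so that only a real memory shortage is reported as insufficient memory. It is not Milvus code (querycoord is written in Go), and every name in it (`FailedLoadCache`, `status_for`, the two exception classes, the status strings) is hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical status names, used for illustration only.
STATUS_OK = "Success"
STATUS_INSUFFICIENT_MEMORY = "InsufficientMemoryToLoad"
STATUS_UNEXPECTED = "UnexpectedError"


class InsufficientMemoryError(Exception):
    """Raised when a node really lacks memory for the load."""


class IndexDeserializeError(Exception):
    """Raised when a stored index cannot be deserialized."""


class FailedLoadCache:
    """Remembers the last load failure for each collection id."""

    def __init__(self) -> None:
        self._errors: Dict[int, Exception] = {}

    def put(self, collection_id: int, err: Exception) -> None:
        self._errors[collection_id] = err

    def get(self, collection_id: int) -> Optional[Exception]:
        return self._errors.get(collection_id)


def status_for(cache: FailedLoadCache, collection_id: int) -> Tuple[str, str]:
    """Map a cached failure to a (status, message) pair without losing its cause."""
    err = cache.get(collection_id)
    if err is None:
        return (STATUS_OK, "")
    if isinstance(err, InsufficientMemoryError):
        return (STATUS_INSUFFICIENT_MEMORY, str(err))
    # Any other failure keeps its own message instead of being
    # reported as a memory problem.
    return (STATUS_UNEXPECTED, str(err))


if __name__ == "__main__":
    cache = FailedLoadCache()
    cache.put(1, IndexDeserializeError("failed to deserialize BIN_IVF_FLAT index"))
    cache.put(2, InsufficientMemoryError("deny to load, insufficient memory"))
    print(status_for(cache, 1))
    print(status_for(cache, 2))
```

With a mapping like this, a deserialization failure such as the one described in this thread would surface with its own message rather than as `deny to load, insufficient memory`.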
The BIN_IVF_FLAT upgrade issue has been fixed in knowhere; please rerun this case. @zhuwenxing
Related PRs: https://github.com/milvus-io/knowhere/pull/874 https://github.com/milvus-io/knowhere/pull/887 https://github.com/milvus-io/knowhere/pull/889 https://github.com/milvus-io/milvus/pull/24187
Hi @zhuwenxing, please re-test this issue.
/assign @zhuwenxing