milvus icon indicating copy to clipboard operation
milvus copied to clipboard

[Bug]: [benchmark][standalone] Milvus load failure causes search and query errors,error message"collection not loaded into memory when search"

Open jingkl opened this issue 2 years ago • 5 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version:2.1.0-20220906-7affd93
- Deployment mode(standalone or cluster):standalone
- SDK version(e.g. pymilvus v2.0.0rc2):pymilvus 2.1.2dev8
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

server-instance fouram-test-no-clean-hv4sr-1 server-configmap server-single-32c128m-redur0 client-configmap client-random-locust-100m-ddl-r8-w2-12h

sever:

fouram-test-no-clean-hv4sr-1-etcd-0                               1/1     Running     0               15h     10.104.9.71    4am-node14   <none>           <none>
fouram-test-no-clean-hv4sr-1-milvus-standalone-c55bcf85f-zddt9    1/1     Running     0               15h     10.104.1.132   4am-node10   <none>           <none>
fouram-test-no-clean-hv4sr-1-minio-7488777966-s2b8c               1/1     Running     0               15h     10.104.1.133   4am-node10   <none>           <none>

log:

[2022-09-07 16:33:03,804] [    INFO] - 100000000 (milvus_benchmark.runners.locust:458)
[2022-09-07 16:33:03,804] [    INFO] - Start load collection (milvus_benchmark.runners.locust:459)
[2022-09-07 17:23:04,430] [   DEBUG] - Milvus load_collection run in 3000.6256s (milvus_benchmark.client:57)
[2022-09-07 17:23:04,431] [   DEBUG] - {'load_time': 3000.6264} (milvus_benchmark.runners.locust:462)
[2022-09-07 17:23:04,442] [   DEBUG] - Milvus load_collection_progress run in 0.0117s (milvus_benchmark.client:57)
[2022-09-07 17:23:04,443] [   DEBUG] - load_collection_progress: {'loading_progress': '0%', 'num_loaded_partitions': 0, 'not_loaded_partitions': ['_default']} (milvus_benchmark.runners.locust:464)
[2022-09-07 17:23:04,443] [    INFO] - Start run case (milvus_benchmark.main:88)
[2022-09-07 17:23:04,445] [   DEBUG] - Milvus get_info run in 0.0025s (milvus_benchmark.client:57)
[2022-09-07 17:23:04,447] [   DEBUG] - Milvus get_info run in 0.0019s (milvus_benchmark.client:57)
[2022-09-07 17:23:04,447] [    INFO] - {'index_field_name': 'float_vector', 'vector_field_name': 'float_vector', 'dimension': 128, 'collection_info': {'collection_name': 'sift_100m_128_l2', 'auto_id': False, 'description': '', 'fields': [{'field_id': 100, 'name': 'float_vector', 'description': '', 'type': 101, 'params': {'dim': 128}, 'is_primary': False, 'auto_id': False}, {'field_id': 101, 'name': 'id', 'description': '', 'type': 5, 'params': {}, 'is_primary': True, 'auto_id': False}], 'aliases': [], 'collection_id': 435830998750137665, 'consistency_level': 2}} (milvus_benchmark.runners.locust:64)
[2022-09-07 17:23:04,447] [    INFO] - {'tasks': {'query': {'weight': 8, 'params': {'top_k': 10, 'nq': 10, 'search_param': {'nprobe': 16}}}, 'load': {'weight': 1, 'params': None}, 'get': {'weight': 8, 'params': {'ids_length': 10}}, 'scene_test': {'weight': 2, 'params': None}}, 'types': [{'type': 'query', 'weight': 8, 'params': {'top_k': 10, 'nq': 10, 'search_param': {'nprobe': 16}}}, {'type': 'load', 'weight': 1}, {'type': 'get', 'weight': 8, 'params': {'ids_length': 10}}, {'type': 'scene_test', 'weight': 2}], 'connection_num': 1, 'clients_num': 20, 'spawn_rate': 2, 'during_time': 43200, 'op_info': {'index_field_name': 'float_vector', 'vector_field_name': 'float_vector', 'dimension': 128, 'collection_info': {'collection_name': 'sift_100m_128_l2', 'auto_id': False, 'description': '', 'fields': [{'field_id': 100, 'name': 'float_vector', 'description': '', 'type': 101, 'params': {'dim': 128}, 'is_primary': False, 'auto_id': False}, {'field_id': 101, 'name': 'id', 'description': '', 'type': 5, 'params': {}, 'is_primary': True, 'auto_id': False}], 'aliases': [], 'collection_id': 435830998750137665, 'consistency_level': 2}}} (milvus_benchmark.runners.locust:75)
[2022-09-07 17:23:04,448] [ WARNING] - [93m[WARNING] PyMilvus: class Milvus will be deprecated soon, please use Collection/utility instead[0m (pymilvus.decorators:19)
[2022-09-07 17:23:04,448] [ WARNING] - [get_env_variable] GO_PLAY failed to get environment variables : 'GO_PLAY', use default value :  (milvus_benchmark.utils:443)
[2022-09-07 17:23:04,448] [    INFO] - [<function Tasks.query at 0x7f64e965f280>, <function Tasks.query at 0x7f64e965f280>, <function Tasks.query at 0x7f64e965f280>, <function Tasks.query at 0x7f64e965f280>, <function Tasks.query at 0x7f64e965f280>, <function Tasks.query at 0x7f64e965f280>, <function Tasks.query at 0x7f64e965f280>, <function Tasks.query at 0x7f64e965f280>, <function Tasks.load at 0x7f64e965f430>, <function Tasks.get at 0x7f64e965f790>, <function Tasks.get at 0x7f64e965f790>, <function Tasks.get at 0x7f64e965f790>, <function Tasks.get at 0x7f64e965f790>, <function Tasks.get at 0x7f64e965f790>, <function Tasks.get at 0x7f64e965f790>, <function Tasks.get at 0x7f64e965f790>, <function Tasks.get at 0x7f64e965f790>, <function Tasks.scene_test at 0x7f64e965f820>, <function Tasks.scene_test at 0x7f64e965f820>] (milvus_benchmark.runners.locust_user:70)
[2022-09-07 17:23:04,449] [    INFO] - {'query': {'top_k': 10, 'nq': 10, 'search_param': {'nprobe': 16}}, 'load': None, 'get': {'ids_length': 10}, 'scene_test': None} (milvus_benchmark.runners.locust_user:71)
[2022-09-07 17:23:04,718] [    INFO] -  Name                                                                              # reqs      # fails  |     Avg     Min     Max  Median  |   req/s failures/s (locust.stats_logger:724)
[2022-09-07 17:23:04,718] [    INFO] - ---------------------------------------------------------------------------------------------------------------------------------------------------------------- (locust.stats_logger:728)
[2022-09-07 17:23:04,718] [    INFO] - ---------------------------------------------------------------------------------------------------------------------------------------------------------------- (locust.stats_logger:732)
[2022-09-07 17:23:04,718] [    INFO] -  Aggregated                                                                             0     0(0.00%)  |       0       0       0       0  |    0.00    0.00 (locust.stats_logger:733)
[2022-09-07 17:23:04,718] [    INFO] -  (locust.stats_logger:734)
[2022-09-07 17:23:04,719] [    INFO] - Ramping to 20 users at a rate of 2.00 per second (locust.runners:332)
[2022-09-07 17:23:04,720] [   DEBUG] - Ramping to {"MyUser": 2} (2 total users) (locust.runners:341)
[2022-09-07 17:23:04,720] [   DEBUG] - Spawning additional {"MyUser": 2} ({"MyUser": 0} already running)... (locust.runners:206)
[2022-09-07 17:23:04,720] [   DEBUG] - 2 users spawned (locust.runners:220)
[2022-09-07 17:23:04,720] [   DEBUG] - All users of class MyUser spawned (locust.runners:221)
[2022-09-07 17:23:04,720] [   DEBUG] - 0 users have been stopped, 2 still running (locust.runners:281)
[2022-09-07 17:23:04,721] [   DEBUG] - [scene_test] Start scene test : scene_test_5764_317630 (milvus_benchmark.client:633)
[2022-09-07 17:23:04,735] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-09-07 17:23:04.720771', 'RPC error': '2022-09-07 17:23:04.735080'}> (pymilvus.decorators:95)
[2022-09-07 17:23:04,735] [    INFO] - Create collection: <scene_test_5764_317630> successfully (milvus_benchmark.client:158)
[2022-09-07 17:23:04,735] [   DEBUG] - Milvus create_collection run in 0.0136s (milvus_benchmark.client:57)
[2022-09-07 17:23:04,743] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-09-07 17:23:04.735530', 'RPC error': '2022-09-07 17:23:04.743472'}> (pymilvus.decorators:95)
[2022-09-07 17:23:04,747] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-09-07 17:23:04.743680', 'RPC error': '2022-09-07 17:23:04.747202'}> (pymilvus.decorators:95)
[2022-09-07 17:23:05,719] [   DEBUG] - Ramping to {"MyUser": 4} (4 total users) (locust.runners:341)
[2022-09-07 17:23:05,719] [   DEBUG] - Spawning additional {"MyUser": 2} ({"MyUser": 2} already running)... (locust.runners:206)
[2022-09-07 17:23:05,720] [   DEBUG] - 4 users spawned (locust.runners:220)
[2022-09-07 17:23:05,720] [   DEBUG] - All users of class MyUser spawned (locust.runners:221)
[2022-09-07 17:23:05,720] [   DEBUG] - 0 users have been stopped, 4 still running (locust.runners:281)
[2022-09-07 17:23:05,724] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-09-07 17:23:05.720271', 'RPC error': '2022-09-07 17:23:05.724226'}> (pymilvus.decorators:95)
[2022-09-07 17:23:05,726] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-09-07 17:23:05.720641', 'RPC error': '2022-09-07 17:23:05.726258'}> (pymilvus.decorators:95)
[2022-09-07 17:23:05,731] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-09-07 17:23:05.725974', 'RPC error': '2022-09-07 17:23:05.731272'}> (pymilvus.decorators:95)
[2022-09-07 17:23:05,736] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-09-07 17:23:05.726790', 'RPC error': '2022-09-07 17:23:05.736312'}> (pymilvus.decorators:95)
[2022-09-07 17:23:05,743] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-09-07 17:23:05.736214', 'RPC error': '2022-09-07 17:23:05.743378'}> (pymilvus.decorators:95)
[2022-09-07 17:23:05,745] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-09-07 17:23:05.736601', 'RPC error': '2022-09-07 17:23:05.745252'}> (pymilvus.decorators:95)
[2022-09-07 17:23:05,749] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-09-07 17:23:05.746500', 'RPC error': '2022-09-07 17:23:05.749656'}> (pymilvus.decorators:95)
[2022-09-07 17:23:05,752] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-09-07 17:23:05.750110', 'RPC error': '2022-09-07 17:23:05.752291'}> (pymilvus.decorators:95)
[2022-09-07 17:23:05,754] [   DEBUG] - Milvus get_info run in 0.0021s (milvus_benchmark.client:57)
[2022-09-07 17:23:05,754] [   DEBUG] - [scene_test] Start insert : scene_test_5764_317630 (milvus_benchmark.client:640)
[2022-09-07 17:23:05,819] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-09-07 17:23:05.753175', 'RPC error': '2022-09-07 17:23:05.819893'}> (pymilvus.decorators:95)
[2022-09-07 17:23:05,820] [   DEBUG] - Milvus insert run in 0.0654s (milvus_benchmark.client:57)
[2022-09-07 17:23:05,820] [   DEBUG] - [scene_test] Start flush : scene_test_5764_317630 (milvus_benchmark.client:642)
[2022-09-07 17:23:05,826] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-09-07 17:23:05.820678', 'RPC error': '2022-09-07 17:23:05.826062'}> (pymilvus.decorators:95)
[2022-09-07 17:23:06,720] [   DEBUG] - Ramping to {"MyUser": 6} (6 total users) (locust.runners:341)
[2022-09-07 17:23:06,721] [   DEBUG] - Spawning additional {"MyUser": 2} ({"MyUser": 4} already running)... (locust.runners:206)
[2022-09-07 17:23:06,721] [   DEBUG] - 6 users spawned (locust.runners:220)

Expected Behavior

No response

Steps To Reproduce

1. create an collection
2. build ivfsq8 index 
3. insert 100m data
4. build index again
5. load data

Milvus Log

No response

Anything else?

server-single-32c128m-redur0

{
	"config.yaml": "server:
		  server_tag: \"32c128m\"
		milvus:
		  deploy_mode: \"single\"
		server_resource:
		  dataCoordinator:
		    segment:
		      sealProportion: 0.23
		  common:
		    compaction: 
		      retentionDuration: 0
		"
}

jingkl avatar Sep 08 '22 06:09 jingkl

Could you please refer this script to export the whole Milvus logs for investigation?

weiliu1031 avatar Sep 09 '22 02:09 weiliu1031

server-instance fouram-test-no-clean-km8wb-1 server-configmap server-single-32c128m-redur-seal client-configmap client-random-locust-100m-ddl-r8-w2-1h-insert10w

fouram-test-no-clean-km8wb-1-etcd-0                               1/1     Running     0               3d16h   10.104.6.166   4am-node13   <none>           <none>
fouram-test-no-clean-km8wb-1-milvus-standalone-df59f865d-sm8mv    1/1     Running     0               3d16h   10.104.5.74    4am-node12   <none>           <none>
fouram-test-no-clean-km8wb-1-minio-64b7c5bbdc-5glgb               1/1     Running     0               3d16h   10.104.5.73    4am-node12   <none>           <none>

server-single-32c128m-redur-seal:

{
	"config.yaml": "server:
		  server_tag: \"32c128m\"
		milvus:
		  deploy_mode: \"single\"
		server_resource:
		  dataCoordinator:
		    segment:
		      sealProportion: 0.20
		  common:
		    compaction: 
		      retentionDuration: 0
		"
}

client log:

[2022-09-09 13:42:54,110] [   DEBUG] - Milvus load_collection run in 3000.5978s (milvus_benchmark.client:57)
[2022-09-09 13:42:54,111] [   DEBUG] - {'load_time': 3000.5986} (milvus_benchmark.runners.locust:462)
[2022-09-09 13:42:54,115] [   DEBUG] - Milvus load_collection_progress run in 0.0038s (milvus_benchmark.client:57)
[2022-09-09 13:42:54,115] [   DEBUG] - load_collection_progress: {'loading_progress': '0%', 'num_loaded_partitions': 0, 'not_loaded_partitions': ['_default']} (milvus_benchmark.runners.locust:464)
[2022-09-09 13:42:54,115] [    INFO] - Start run case (milvus_benchmark.main:88)
[2022-09-09 13:42:54,118] [   DEBUG] - Milvus get_info run in 0.0029s (milvus_benchmark.client:57)
[2022-09-09 13:42:54,120] [   DEBUG] - Milvus get_info run in 0.0021s (milvus_benchmark.client:57)
[2022-09-09 13:42:54,120] [    INFO] - {'index_field_name': 'float_vector', 'vector_field_name': 'float_vector', 'dimension': 128, 'collection_info': {'collection_name': 'sift_100m_128_l2', 'auto_id': False, 'description': '', 'fields': [{'field_id': 100, 'name': 'float_vector', 'description': '', 'type': 101, 'params': {'dim': 128}, 'is_primary': False, 'auto_id': False}, {'field_id': 101, 'name': 'id', 'description': '', 'type': 5, 'params': {}, 'is_primary': True, 'auto_id': False}], 'aliases': [], 'collection_id': 435873110825045313, 'consistency_level': 2}} (milvus_benchmark.runners.locust:64)
[2022-09-09 13:42:54,120] [    INFO] - {'tasks': {'query': {'weight': 8, 'params': {'top_k': 10, 'nq': 10, 'search_param': {'nprobe': 16}}}, 'load': {'weight': 1, 'params': None}, 'get': {'weight': 8, 'params': {'ids_length': 10}}, 'scene_test': {'weight': 2, 'params': None}}, 'types': [{'type': 'query', 'weight': 8, 'params': {'top_k': 10, 'nq': 10, 'search_param': {'nprobe': 16}}}, {'type': 'load', 'weight': 1}, {'type': 'get', 'weight': 8, 'params': {'ids_length': 10}}, {'type': 'scene_test', 'weight': 2}], 'connection_num': 1, 'clients_num': 20, 'spawn_rate': 2, 'during_time': 3600, 'op_info': {'index_field_name': 'float_vector', 'vector_field_name': 'float_vector', 'dimension': 128, 'collection_info': {'collection_name': 'sift_100m_128_l2', 'auto_id': False, 'description': '', 'fields': [{'field_id': 100, 'name': 'float_vector', 'description': '', 'type': 101, 'params': {'dim': 128}, 'is_primary': False, 'auto_id': False}, {'field_id': 101, 'name': 'id', 'description': '', 'type': 5, 'params': {}, 'is_primary': True, 'auto_id': False}], 'aliases': [], 'collection_id': 435873110825045313, 'consistency_level': 2}}} (milvus_benchmark.runners.locust:75)
[2022-09-09 13:42:54,121] [ WARNING] - [93m[WARNING] PyMilvus: class Milvus will be deprecated soon, please use Collection/utility instead[0m (pymilvus.decorators:19)
[2022-09-09 13:42:54,121] [ WARNING] - [get_env_variable] GO_PLAY failed to get environment variables : 'GO_PLAY', use default value :  (milvus_benchmark.utils:443)
[2022-09-09 13:42:54,121] [    INFO] - [<function Tasks.query at 0x7f63c027e310>, <function Tasks.query at 0x7f63c027e310>, <function Tasks.query at 0x7f63c027e310>, <function Tasks.query at 0x7f63c027e310>, <function Tasks.query at 0x7f63c027e310>, <function Tasks.query at 0x7f63c027e310>, <function Tasks.query at 0x7f63c027e310>, <function Tasks.query at 0x7f63c027e310>, <function Tasks.load at 0x7f63c027e4c0>, <function Tasks.get at 0x7f63c027e820>, <function Tasks.get at 0x7f63c027e820>, <function Tasks.get at 0x7f63c027e820>, <function Tasks.get at 0x7f63c027e820>, <function Tasks.get at 0x7f63c027e820>, <function Tasks.get at 0x7f63c027e820>, <function Tasks.get at 0x7f63c027e820>, <function Tasks.get at 0x7f63c027e820>, <function Tasks.scene_test at 0x7f63c027e8b0>, <function Tasks.scene_test at 0x7f63c027e8b0>] (milvus_benchmark.runners.locust_user:70)
[2022-09-09 13:42:54,122] [    INFO] - {'query': {'top_k': 10, 'nq': 10, 'search_param': {'nprobe': 16}}, 'load': None, 'get': {'ids_length': 10}, 'scene_test': None} (milvus_benchmark.runners.locust_user:71)
[2022-09-09 13:42:54,374] [    INFO] -  Name                                                                              # reqs      # fails  |     Avg     Min     Max  Median  |   req/s failures/s (locust.stats_logger:724)
[2022-09-09 13:42:54,374] [    INFO] - ---------------------------------------------------------------------------------------------------------------------------------------------------------------- (locust.stats_logger:728)
[2022-09-09 13:42:54,374] [    INFO] - ---------------------------------------------------------------------------------------------------------------------------------------------------------------- (locust.stats_logger:732)
[2022-09-09 13:42:54,374] [    INFO] -  Aggregated                                                                             0     0(0.00%)  |       0       0       0       0  |    0.00    0.00 (locust.stats_logger:733)
[2022-09-09 13:42:54,374] [    INFO] -  (locust.stats_logger:734)
[2022-09-09 13:42:54,374] [    INFO] - Ramping to 20 users at a rate of 2.00 per second (locust.runners:332)
[2022-09-09 13:42:54,376] [   DEBUG] - Ramping to {"MyUser": 2} (2 total users) (locust.runners:341)
[2022-09-09 13:42:54,376] [   DEBUG] - Spawning additional {"MyUser": 2} ({"MyUser": 0} already running)... (locust.runners:206)
[2022-09-09 13:42:54,376] [   DEBUG] - 2 users spawned (locust.runners:220)
[2022-09-09 13:42:54,376] [   DEBUG] - All users of class MyUser spawned (locust.runners:221)
[2022-09-09 13:42:54,376] [   DEBUG] - 0 users have been stopped, 2 still running (locust.runners:281)
[2022-09-09 13:42:54,382] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-09-09 13:42:54.376619', 'RPC error': '2022-09-09 13:42:54.382546'}> (pymilvus.decorators:95)
[2022-09-09 13:42:54,383] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-09-09 13:42:54.377244', 'RPC error': '2022-09-09 13:42:54.383457'}> (pymilvus.decorators:95)
[2022-09-09 13:42:54,383] [   DEBUG] - [scene_test] Start scene test : scene_test_5159_346144 (milvus_benchmark.client:633)
[2022-09-09 13:42:54,390] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-09-09 13:42:54.383322', 'RPC error': '2022-09-09 13:42:54.390819'}> (pymilvus.decorators:95)
[2022-09-09 13:42:54,390] [    INFO] - Create collection: <scene_test_5159_346144> successfully (milvus_benchmark.client:158)
[2022-09-09 13:42:54,391] [   DEBUG] - Milvus create_collection run in 0.0072s (milvus_benchmark.client:57)
[2022-09-09 13:42:55,375] [   DEBUG] - Ramping to {"MyUser": 4} (4 total users) (locust.runners:341)
[2022-09-09 13:42:55,376] [   DEBUG] - Spawning additional {"MyUser": 2} ({"MyUser": 2} already running)... (locust.runners:206)
[2022-09-09 13:42:55,376] [   DEBUG] - 4 users spawned (locust.runners:220)
[2022-09-09 13:42:55,376] [   DEBUG] - All users of class MyUser spawned (locust.runners:221)
[2022-09-09 13:42:55,376] [   DEBUG] - 0 users have been stopped, 4 still running (locust.runners:281)
[2022-09-09 13:42:55,380] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-09-09 13:42:55.376659', 'RPC error': '2022-09-09 13:42:55.380925'}> (pymilvus.decorators:95)
[2022-09-09 13:42:55,382] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-09-09 13:42:55.377049', 'RPC error': '2022-09-09 13:42:55.382214'}> (pymilvus.decorators:95)
[2022-09-09 13:42:55,385] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-09-09 13:42:55.382016', 'RPC error': '2022-09-09 13:42:55.385664'}> (pymilvus.decorators:95)
[2022-09-09 13:42:55,386] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-09-09 13:42:55.382534', 'RPC error': '2022-09-09 13:42:55.386887'}> (pymilvus.decorators:95)
[2022-09-09 13:42:55,390] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-09-09 13:42:55.386650', 'RPC error': '2022-09-09 13:42:55.390087'}> (pymilvus.decorators:95)
[2022-09-09 13:42:55,391] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-09-09 13:42:55.387524', 'RPC error': '2022-09-09 13:42:55.391813'}> (pymilvus.decorators:95)
[2022-09-09 13:42:55,395] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-09-09 13:42:55.390826', 'RPC error': '2022-09-09 13:42:55.395584'}> (pymilvus.decorators:95)
[2022-09-09 13:42:55,395] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-09-09 13:42:55.393039', 'RPC error': '2022-09-09 13:42:55.395737'}> (pymilvus.decorators:95)
[2022-09-09 13:42:55,398] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-09-09 13:42:55.395868', 'RPC error': '2022-09-09 13:42:55.398631'}> (pymilvus.decorators:95)
[2022-09-09 13:42:55,401] [   DEBUG] - Milvus get_info run in 0.0029s (milvus_benchmark.client:57)
[2022-09-09 13:42:55,401] [   DEBUG] - [scene_test] Start insert : scene_test_5159_346144 (milvus_benchmark.client:640)
[2022-09-09 13:42:55,401] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-09-09 13:42:55.396028', 'RPC error': '2022-09-09 13:42:55.401724'}> (pymilvus.decorators:95)
[2022-09-09 13:42:55,403] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-09-09 13:42:55.399135', 'RPC error': '2022-09-09 13:42:55.403310'}> (pymilvus.decorators:95)
[2022-09-09 13:42:55,457] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-09-09 13:42:55.401988', 'RPC error': '2022-09-09 13:42:55.457782'}> (pymilvus.decorators:95)
[2022-09-09 13:42:55,458] [   DEBUG] - Milvus insert run in 0.0565s (milvus_benchmark.client:57)
[2022-09-09 13:42:55,458] [   DEBUG] - [scene_test] Start flush : scene_test_5159_346144 (milvus_benchmark.client:642)
[2022-09-09 13:42:55,462] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-09-09 13:42:55.435690', 'RPC error': '2022-09-09 13:42:55.462743'}> (pymilvus.decorators:95)
[2022-09-09 13:42:55,465] [   DEBUG] - [scene_test] Start scene test : scene_test_4668_553823 (milvus_benchmark.client:633)
[2022-09-09 13:42:55,466] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-09-09 13:42:55.459173', 'RPC error': '2022-09-09 13:42:55.466934'}> (pymilvus.decorators:95)
[2022-09-09 13:42:55,471] [    INFO] - Create collection: <scene_test_4668_553823> successfully (milvus_benchmark.client:158)
[2022-09-09 13:42:55,471] [   DEBUG] - Milvus create_collection run in 0.0059s (milvus_benchmark.client:57)
[2022-09-09 13:42:55,473] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-09-09 13:42:55.471065', 'RPC error': '2022-09-09 13:42:55.473202'}> (pymilvus.decorators:95)
[2022-09-09 13:42:55,475] [   ERROR] - RPC error: [search], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when search)>, <Time:{'RPC start': '2022-09-09 13:42:55.473371', 'RPC error': '2022-09-09 13:42:55.475190'}> (pymilvus.decorators:95)
[2022-09-09 13:42:55,476] [   ERROR] - RPC error: [query], <MilvusException: (code=1, message=collection:sift_100m_128_l2 or partition:[] not loaded into memory when query)>, <Time:{'RPC start': '2022-09-09 13:42:55.475352', 'RPC error': '2022-09-09 13:42:55.476913'}> (pymilvus.decorators:95)
[2022-09-09 13:42:55,477] [   DEBUG] - [scene_test] Start scene test : scene_test_8474_505162 (milvus_benchmark.client:633)
[2022-09-09 13:42:55,480] [    INFO] - Create collection: <scene_test_8474_505162> successfully (milvus_benchmark.client:158)
[2022-09-09 13:42:55,480] [   DEBUG] - Milvus create_collection run in 0.0036s (milvus_benchmark.client:57)

Steps:

1. create an collection
2. build ivfsq8 index 
3. insert 100m data when insert per 10w
4. build index again
5. load data success but search raise load an error

jingkl avatar Sep 13 '22 04:09 jingkl

Could you please refer this script to export the whole Milvus logs for investigation?

milvus-log.tar.gz

yanliang567 avatar Sep 13 '22 09:09 yanliang567

/assign @weiliu1031 /unassign

yanliang567 avatar Sep 13 '22 09:09 yanliang567

release_name_prefix fouramf-rzbls deploy_config empty-file case_params empty-file other_params --milvus_tag=2.1.0-20220913-3c3ba55 -k test_search_time -s

fouramf-rzbls-90-3702-etcd-0                                      1/1     Running     0               5m50s   10.104.1.34    4am-node10   <none>           <none>
fouramf-rzbls-90-3702-etcd-1                                      1/1     Running     0               5m50s   10.104.6.125   4am-node13   <none>           <none>
fouramf-rzbls-90-3702-etcd-2                                      1/1     Running     0               5m49s   10.104.4.237   4am-node11   <none>           <none>
fouramf-rzbls-90-3702-milvus-datacoord-7449c69dcc-lckpq           1/1     Running     1 (2m19s ago)   5m50s   10.104.5.77    4am-node12   <none>           <none>
fouramf-rzbls-90-3702-milvus-datanode-7bd547845-f5s6p             1/1     Running     1 (109s ago)    5m50s   10.104.1.26    4am-node10   <none>           <none>
fouramf-rzbls-90-3702-milvus-indexcoord-5b9bcfc55-77jfp           1/1     Running     1 (2m18s ago)   5m50s   10.104.1.29    4am-node10   <none>           <none>
fouramf-rzbls-90-3702-milvus-indexnode-67c999d5dd-87lg8           1/1     Running     0               5m50s   10.104.1.30    4am-node10   <none>           <none>
fouramf-rzbls-90-3702-milvus-proxy-6964d7b687-rqt6b               1/1     Running     1 (2m18s ago)   5m50s   10.104.1.28    4am-node10   <none>           <none>
fouramf-rzbls-90-3702-milvus-querycoord-6f6884478d-lpzrm          1/1     Running     1 (109s ago)    5m50s   10.104.1.27    4am-node10   <none>           <none>
fouramf-rzbls-90-3702-milvus-querynode-84bc79d8d6-jrpjx           1/1     Running     0               5m50s   10.104.5.80    4am-node12   <none>           <none>
fouramf-rzbls-90-3702-milvus-rootcoord-7468d58474-8l679           1/1     Running     1 (2m19s ago)   5m50s   10.104.1.25    4am-node10   <none>           <none>
fouramf-rzbls-90-3702-minio-0                                     1/1     Running     0               5m50s   10.104.1.33    4am-node10   <none>           <none>
fouramf-rzbls-90-3702-minio-1                                     1/1     Running     0               5m50s   10.104.9.16    4am-node14   <none>           <none>
fouramf-rzbls-90-3702-minio-2                                     1/1     Running     0               5m50s   10.104.6.122   4am-node13   <none>           <none>
fouramf-rzbls-90-3702-minio-3                                     1/1     Running     0               5m50s   10.104.4.235   4am-node11   <none>           <none>
fouramf-rzbls-90-3702-pulsar-bookie-0                             1/1     Running     0               5m50s   10.104.9.19    4am-node14   <none>           <none>
fouramf-rzbls-90-3702-pulsar-bookie-1                             1/1     Running     0               5m49s   10.104.6.126   4am-node13   <none>           <none>
fouramf-rzbls-90-3702-pulsar-bookie-2                             1/1     Running     0               5m49s   10.104.4.238   4am-node11   <none>           <none>
fouramf-rzbls-90-3702-pulsar-bookie-init-2rncw                    0/1     Completed   0               5m50s   10.104.9.14    4am-node14   <none>           <none>
fouramf-rzbls-90-3702-pulsar-broker-0                             1/1     Running     0               5m50s   10.104.9.13    4am-node14   <none>           <none>
fouramf-rzbls-90-3702-pulsar-proxy-0                              1/1     Running     0               5m50s   10.104.5.78    4am-node12   <none>           <none>
fouramf-rzbls-90-3702-pulsar-pulsar-init-rbz5r                    0/1     Completed   0               5m50s   10.104.5.79    4am-node12   <none>           <none>
fouramf-rzbls-90-3702-pulsar-recovery-0                           1/1     Running     0               5m50s   10.104.1.24    4am-node10   <none>           <none>
fouramf-rzbls-90-3702-pulsar-zookeeper-0                          1/1     Running     0               5m50s   10.104.5.82    4am-node12   <none>           <none>
fouramf-rzbls-90-3702-pulsar-zookeeper-1                          1/1     Running     0               5m12s   10.104.9.21    4am-node14   <none>           <none>
fouramf-rzbls-90-3702-pulsar-zookeeper-2                          1/1     Running     0               4m32s   10.104.6.128   4am-node13   <none>           <none> 

log:

[2022-09-21 10:02:38 - DEBUG - fouram]: (api_response) : None  (api_request.py:24)
[2022-09-21 10:02:38 -  INFO - fouram]: [Time] Collection.load run in 202.4105s (api_request.py:29)
[2022-09-21 10:02:42 -  INFO - fouram]: [PerfTemplate] Actual parameters used: {'collection_params': {'other_fields': ['int64_1', 'int64_2', 'float_1', 'double_1']}, 'load_params': {}, 'search_params': {'nq': 1, 'param': {'metric_type': 'L2', 'params': {'nprobe': 8}}, 'top_k': 1, 'expr': {'float_1': {'GT': -1.0, 'LT': 5000000.0}}}, 'dataset_params': {'dataset_name': 'sift', 'dim': 128, 'dataset_size': 50000000, 'ni_per': 50000, 'metric_type': 'L2', 'req_run_counts': 10}, 'index_params': {'index_type': 'IVF_FLAT', 'index_param': {'nlist': 2048}}} (performance_template.py:51)
[2022-09-21 10:02:42 -  INFO - fouram]: [Base] Params of search: nq:1, anns_field:float_vector, param:{'metric_type': 'L2', 'params': {'nprobe': 8}}, limit:1, expr:"float_1 > -1.0 && float_1 < 5000000.0" (base.py:258)
[2022-09-21 10:02:42 - DEBUG - fouram]: (api_request)  : [Collection.search] args: [[[0.013725490196078431, 0.03725490196078431, 0.06862745098039216, 0.307843137254902, 0.32745098039215687, 0.060784313725490195, 0.041176470588235294, 0.03333333333333333, 0.3980392156862745, 0.4294117647058823, 0.08431372549019608, 0.03333333333333333, 0.013725490196078431, 0.00980392156862745, 0.0......, kwargs: {} (api_request.py:53)
[2022-09-21 10:02:42 - ERROR - fouram]: Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 92, in handler
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 74, in handler
    raise e
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 48, in handler
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 446, in search
    return self._execute_search_requests(requests, timeout, **_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 410, in _execute_search_requests
    raise pre_err
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 401, in _execute_search_requests
    raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=collection:fouram_tnQkem2i or partition:[] not loaded into memory when search)>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/src/fouram/client/util/api_request.py", line 21, in inner_wrapper
    res = func(*args, **kwargs)
  File "/src/fouram/client/util/api_request.py", line 57, in api_request
    return func(*arg, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 712, in search
    res = conn.search(self._name, data, anns_field, param, limit, expr,
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 95, in handler
    LOGGER.error(f"RPC error: [{inner_name}], {e}, <Time:{record_dict}>")
  File "/usr/lib/python3.8/logging/__init__.py", line 1463, in error
    self._log(ERROR, msg, args, **kwargs)
  File "/usr/lib/python3.8/logging/__init__.py", line 1577, in _log
    self.handle(record)
  File "/usr/lib/python3.8/logging/__init__.py", line 1587, in handle
    self.callHandlers(record)
  File "/usr/lib/python3.8/logging/__init__.py", line 1649, in callHandlers
    hdlr.handle(record)
  File "/usr/lib/python3.8/logging/__init__.py", line 950, in handle
    self.emit(record)
  File "/usr/local/lib/python3.8/dist-packages/_pytest/logging.py", line 331, in emit
    super().emit(record)
  File "/usr/lib/python3.8/logging/__init__.py", line 1089, in emit
    self.handleError(record)
  File "/usr/lib/python3.8/logging/__init__.py", line 1081, in emit
    msg = self.format(record)
  File "/usr/lib/python3.8/logging/__init__.py", line 925, in format
    return fmt.format(record)
  File "/usr/local/lib/python3.8/dist-packages/_pytest/logging.py", line 92, in format
    return super().format(record)
  File "/usr/lib/python3.8/logging/__init__.py", line 667, in format
    s = self.formatMessage(record)
  File "/usr/lib/python3.8/logging/__init__.py", line 636, in formatMessage
    return self._style.format(record)
  File "/usr/local/lib/python3.8/dist-packages/_pytest/logging.py", line 190, in format
    return self._fmt % record.__dict__
KeyError: 'fn'
 (api_request.py:35)
[2022-09-21 10:02:42 - ERROR - fouram]: (api_response) : 'fn' (api_request.py:36)
[2022-09-21 10:02:42 - ERROR - fouram]: [CheckFunc] Request check failed, response:'fn' (func_check.py:40)

jingkl avatar Sep 22 '22 11:09 jingkl

@weiliu1031 it reproduced on master-20221006-e1124765 https://argo-workflows.zilliz.cc/workflows/qa/fouramf-cron-1665158400?tab=workflow&nodeId=fouramf-cron-1665158400-1787986870&nodePanelView=summary

NAME                                                              READY   STATUS      RESTARTS   AGE     IP             NODE
 NOMINATED NODE   READINESS GATES
fouramf-cron-1665158400-51-5917-etcd-0                            1/1     Running     0          2m52s   10.104.9.204   4am-node14
 <none>           <none>
fouramf-cron-1665158400-51-5917-milvus-standalone-795cbf84jgplc   1/1     Running     0          2m52s   10.104.6.80    4am-node13
 <none>           <none>
fouramf-cron-1665158400-51-5917-minio-67766fb449-zqmzv            1/1     Running     0          2m52s   10.104.6.81    4am-node13
[2022-10-07 16:41:15,829 -  INFO - fouram]: [Base] Start load collection fouram_htNTW2mu, replica_number:1 (base.py:95)
[2022-10-07 16:49:43,670 -  INFO - fouram]: [Time] Collection.load run in 507.8403s (api_request.py:29)
[2022-10-07 16:49:49,398 -  INFO - fouram]: [PerfTemplate] Actual parameters used: {'collection_params': {'other_fields': ['int64_1', 'int64_2', 'float_1', 'double_1', 'varchar_1']}, 'load_params': {
}, 'search_params': {'nq': 1, 'param': {'metric_type': 'L2', 'params': {'nprobe': 8}}, 'top_k': 1, 'expr': 'float_1 > -1.0 && float_1 < 5000000.0'}, 'dataset_params': {'dataset_name': 'sift', 'dim':
128, 'dataset_size': 50000000, 'ni_per': 50000, 'metric_type': 'L2', 'req_run_counts': 10}, 'index_params': {'index_type': 'IVF_FLAT', 'index_param': {'nlist': 2048}}} (performance_template.py:57)
[2022-10-07 16:49:49,398 -  INFO - fouram]: [Base] Params of search: nq:1, anns_field:float_vector, param:{'metric_type': 'L2', 'params': {'nprobe': 8}}, limit:1, expr:"float_1 > -1.0 && float_1 < 50
00000.0" (base.py:261)
[2022-10-07 16:52:24,331 - ERROR - fouram]: Traceback (most recent call last):
  File "/src/fouram/client/util/api_request.py", line 21, in inner_wrapper
    res = func(*args, **kwargs)
  File "/src/fouram/client/util/api_request.py", line 57, in api_request
    return func(*arg, **kwargs)
   ... ...
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 472, in search
    return self._execute_search_requests(requests, timeout, **_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 436, in _execute_search_requests
    raise pre_err
 File "/usr/local/lib/python3.8/dist-packages/pymilvus/client/grpc_handler.py", line 427, in _execute_search_requests
    raise MilvusException(response.status.error_code, response.status.reason)
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=collection:fouram_htNTW2mu or partition:[] not loaded into memory when search)>
 (api_request.py:35)
[2022-10-07 16:52:24,332 - ERROR - fouram]: (api_response) : <MilvusException: (code=1, message=collection:fouram_htNTW2mu or partition:[] not loaded into memory when search)> (api_request.py:36)
[2022-10-07 16:52:24,332 - ERROR - fouram]: [CheckFunc] search request check failed, response:<MilvusException: (code=1, message=collection:fouram_htNTW2mu or partition:[] not loaded into memory when
 search)> (func_check.py:40)

yanliang567 avatar Oct 09 '22 07:10 yanliang567

[2022-10-07 16:52:24,331 - ERROR - fouram]: Traceback (most recent call last):

milvus restarts at 16:49:49, and after that query coord try to load the collection back, but the collection is still loading until the client submit query

weiliu1031 avatar Oct 10 '22 07:10 weiliu1031

/assign @jingkl

weiliu1031 avatar Oct 12 '22 02:10 weiliu1031

load issue not reproduced

yanliang567 avatar Oct 14 '22 02:10 yanliang567