[Bug]: [benchmark][cluster] When Milvus is inserted into and queried at the same time, the querynode memory gradually rises until OOM
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: 2.1.0-20220727-7169256c
- Deployment mode (standalone or cluster): cluster
- SDK version (e.g. pymilvus v2.0.0rc2): pymilvus 2.1.0dev103
- OS (Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
server-instance: fouram-6dgds-1, server-configmap: server-cluster-8c64m-compaction, client-configmap: client-random-locust-compaction-5h
fouram-6dgds-1-etcd-0 1/1 Running 0 13h 10.104.4.179 4am-node11 <none> <none>
fouram-6dgds-1-etcd-1 1/1 Running 0 13h 10.104.6.170 4am-node13 <none> <none>
fouram-6dgds-1-etcd-2 1/1 Running 0 13h 10.104.1.101 4am-node10 <none> <none>
fouram-6dgds-1-milvus-datacoord-68c7f8f8fc-t984h 1/1 Running 1 (12h ago) 13h 10.104.4.172 4am-node11 <none> <none>
fouram-6dgds-1-milvus-datanode-79fb9687c6-26pb6 1/1 Running 1 (12h ago) 13h 10.104.4.173 4am-node11 <none> <none>
fouram-6dgds-1-milvus-indexcoord-5cf68cff5d-jg969 1/1 Running 1 (12h ago) 13h 10.104.6.167 4am-node13 <none> <none>
fouram-6dgds-1-milvus-indexnode-6dd4566667-2wnhg 1/1 Running 0 13h 10.104.5.165 4am-node12 <none> <none>
fouram-6dgds-1-milvus-proxy-6b68c77696-x2426 1/1 Running 1 (12h ago) 13h 10.104.4.171 4am-node11 <none> <none>
fouram-6dgds-1-milvus-querycoord-6459dd997c-pq7r8 1/1 Running 1 (12h ago) 13h 10.104.5.168 4am-node12 <none> <none>
fouram-6dgds-1-milvus-querynode-b4fd78f45-24d2r 1/1 Running 8 (10h ago) 13h 10.104.5.171 4am-node12 <none> <none>
fouram-6dgds-1-milvus-rootcoord-6f5687c6fc-ccwwr 1/1 Running 0 13h 10.104.5.167 4am-node12 <none> <none>
fouram-6dgds-1-minio-0 1/1 Running 0 13h 10.104.4.177 4am-node11 <none> <none>
fouram-6dgds-1-minio-1 1/1 Running 0 13h 10.104.1.98 4am-node10 <none> <none>
fouram-6dgds-1-minio-2 1/1 Running 0 13h 10.104.6.175 4am-node13 <none> <none>
fouram-6dgds-1-minio-3 1/1 Running 0 13h 10.104.5.173 4am-node12 <none> <none>
fouram-6dgds-1-pulsar-bookie-0 1/1 Running 0 13h 10.104.6.174 4am-node13 <none> <none>
fouram-6dgds-1-pulsar-bookie-1 1/1 Running 0 13h 10.104.1.102 4am-node10 <none> <none>
fouram-6dgds-1-pulsar-bookie-2 1/1 Running 0 13h 10.104.4.182 4am-node11 <none> <none>
fouram-6dgds-1-pulsar-bookie-init-wxwtl 0/1 Completed 0 13h 10.104.5.169 4am-node12 <none> <none>
fouram-6dgds-1-pulsar-broker-0 1/1 Running 0 13h 10.104.6.168 4am-node13 <none> <none>
fouram-6dgds-1-pulsar-proxy-0 1/1 Running 0 13h 10.104.5.166 4am-node12 <none> <none>
fouram-6dgds-1-pulsar-pulsar-init-vj9ql 0/1 Completed 0 13h 10.104.5.170 4am-node12 <none> <none>
fouram-6dgds-1-pulsar-recovery-0 1/1 Running 0 13h 10.104.1.95 4am-node10 <none> <none>
fouram-6dgds-1-pulsar-zookeeper-0 1/1 Running 0 13h 10.104.4.178 4am-node11 <none> <none>
fouram-6dgds-1-pulsar-zookeeper-1 1/1 Running 0 13h 10.104.6.177 4am-node13 <none> <none>
fouram-6dgds-1-pulsar-zookeeper-2 1/1 Running 0 13h 10.104.1.104 4am-node10 <none> <none>
querynode memory (memory usage graph not included)
client log:
... <Time:{'RPC start': '2022-07-28 13:15:01.422718', 'RPC error': '2022-07-28 13:15:01.456803'}> (pymilvus.decorators:95)
[2022-07-28 13:15:01,462] [ DEBUG] - Milvus get_info run in 0.0208s (milvus_benchmark.client:56)
[2022-07-28 13:15:01,470] [ DEBUG] - [scene_insert_delete_flush] Start insert : sift_10w_128_l2 (milvus_benchmark.client:651)
[2022-07-28 13:15:01,472] [ ERROR] - RPC error: [search], <MilvusException: (code=1, message=Invalid shard leader)>, <Time:{'RPC start': '2022-07-28 13:15:01.422871', 'RPC error': '2022-07-28 13:15:01.472392'}> (pymilvus.decorators:95)
[2022-07-28 13:15:01,472] [ ERROR] - RPC error: [search], <MilvusException: (code=1, message=Invalid shard leader)>, <Time:{'RPC start': '2022-07-28 13:15:01.423086', 'RPC error': '2022-07-28 13:15:01.472789'}> (pymilvus.decorators:95)
[2022-07-28 13:15:01,472] [ ERROR] - RPC error: [search], <MilvusException: (code=1, message=Invalid shard leader)>, <Time:{'RPC start': '2022-07-28 13:15:01.422981', 'RPC error': '2022-07-28 13:15:01.472983'}> (pymilvus.decorators:95)
[2022-07-28 13:15:05,773] [ ERROR] - RPC error: [query], <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Query, QueryNode ID = 3, reason=query shard(channel) by-dev-rootcoord-dml_1_434900411607354689v1 does not exist)>, <Time:{'RPC start': '2022-07-28 13:15:01.443969', 'RPC error': '2022-07-28 13:15:05.773038'}> (pymilvus.decorators:95)
[2022-07-28 13:15:05,773] [ ERROR] - RPC error: [query], <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Query, QueryNode ID = 3, reason=query shard(channel) by-dev-rootcoord-dml_1_434900411607354689v1 does not exist)>, <Time:{'RPC start': '2022-07-28 13:15:01.444121', 'RPC error': '2022-07-28 13:15:05.773791'}> (pymilvus.decorators:95)
[2022-07-28 13:15:05,774] [ ERROR] - RPC error: [query], <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Query, QueryNode ID = 3, reason=query shard(channel) by-dev-rootcoord-dml_1_434900411607354689v1 does not exist)>, <Time:{'RPC start': '2022-07-28 13:15:01.445513', 'RPC error': '2022-07-28 13:15:05.774003'}> (pymilvus.decorators:95)
[2022-07-28 13:15:05,774] [ ERROR] - RPC error: [query], <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Query, QueryNode ID = 3, reason=query shard(channel) by-dev-rootcoord-dml_1_434900411607354689v1 does not exist)>, <Time:{'RPC start': '2022-07-28 13:15:01.445369', 'RPC error': '2022-07-28 13:15:05.774165'}> (pymilvus.decorators:95)
[2022-07-28 13:15:05,774] [ ERROR] - RPC error: [query], <MilvusException: (code=1, message=fail to search on all shard leaders, err=fail to Query, QueryNode ID = 3, reason=query shard(channel) by-dev-rootcoord-dml_0_434900411607354689v0 does not exist)>, <Time:{'RPC start': '2022-07-28 13:15:01.444540', 'RPC error': '2022-07-28 13:15:05.774319'}> (pymilvus.decorators:95)
[2022-07-28 13:15:05,774] [ ERROR] - RPC error: [search], <MilvusException: (code=1, message=Invalid shard leader)>, <Time:{'RPC start': '2022-07-28 13:15:01.457276', 'RPC error': '2022-07-28 13:15:05.774819'}> (pymilvus.decorators:95)
[2022-07-28 13:15:05,775] [ ERROR] - RPC error: [search], <MilvusException: (code=1, message=Invalid shard leader)>, <Time:{'RPC start': '2022-07-28 13:15:01.443467', 'RPC error': '2022-07-28 13:15:05.775050'}> (pymilvus.decorators:95)
[2022-07-28 13:15:05,775] [ ERROR] - RPC error: [search], <MilvusException: (code=1, message=Invalid shard leader)>, <Time:{'RPC start': '2022-07-28 13:15:01.444339', 'RPC error': '2022-07-28 13:15:05.775224'}> (pymilvus.decorators:95)
[2022-07-28 13:15:05,775] [ ERROR] - RPC error: [search], <MilvusException: (code=1, message=Invalid shard leader)>, <Time:{'RPC start': '2022-07-28 13:15:01.445701', 'RPC error': '2022-07-28 13:15:05.775389'}> (pymilvus.decorators:95)
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
No response
Anything else?
client-random-locust-compaction-5h:
{
  "config.yaml": "locust_random_performance:
    collections:
      -
        collection_name: sift_10w_128_l2
        ni_per: 50000
        # other_fields: int1,int2,float1,double1
        other_fields: float1
        build_index: true
        index_type: ivf_sq8
        index_param:
          nlist: 2048
        task:
          types:
            -
              type: query
              weight: 20
              params:
                top_k: 10
                nq: 10
                search_param:
                  nprobe: 16
                filters:
                  -
                    range: \"{'range': {'float1': {'GT': -1.0, 'LT': collection_size * 0.5}}}\"
            -
              type: load
              weight: 1
            -
              type: get
              weight: 10
              params:
                ids_length: 10
            -
              type: scene_insert_delete_flush
              weight: 1
          connection_num: 1
          clients_num: 20
          spawn_rate: 2
          # during_time: 84h
          during_time: 5h
  "
}
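For reference, the workload above mixes search/get/load traffic with a scene_insert_delete_flush task on the same collection. The following is a minimal pymilvus 2.1.x sketch of that concurrent insert + search pattern, not the milvus_benchmark code itself; the collection name suffix, field names, batch sizes, thread counts, and host/port are assumptions (delete/flush steps of the scene are omitted for brevity).

```python
# Minimal sketch of concurrent insert + search against one collection,
# assuming pymilvus 2.1.x. All names and sizes below are illustrative.
import random
import threading
import time

from pymilvus import (
    Collection, CollectionSchema, FieldSchema, DataType, connections,
)

DIM = 128

connections.connect(host="127.0.0.1", port="19530")  # placeholder endpoint

schema = CollectionSchema([
    FieldSchema("id", DataType.INT64, is_primary=True),
    FieldSchema("float1", DataType.FLOAT),
    FieldSchema("float_vector", DataType.FLOAT_VECTOR, dim=DIM),
])
collection = Collection("sift_10w_128_l2_repro", schema)
collection.create_index(
    "float_vector",
    {"index_type": "IVF_SQ8", "metric_type": "L2", "params": {"nlist": 2048}},
)
collection.load()

def insert_loop(stop: threading.Event, rows: int = 10000):
    # Keep inserting new batches while searches run in parallel.
    next_id = 0
    while not stop.is_set():
        ids = list(range(next_id, next_id + rows))
        next_id += rows
        collection.insert([
            ids,                                                           # id
            [random.random() for _ in range(rows)],                        # float1
            [[random.random() for _ in range(DIM)] for _ in range(rows)],  # vectors
        ])

def search_loop(stop: threading.Event):
    # Keep searching while inserts are going on.
    while not stop.is_set():
        collection.search(
            data=[[random.random() for _ in range(DIM)] for _ in range(10)],
            anns_field="float_vector",
            param={"metric_type": "L2", "params": {"nprobe": 16}},
            limit=10,
        )

stop = threading.Event()
threads = [threading.Thread(target=insert_loop, args=(stop,)),
           threading.Thread(target=search_loop, args=(stop,))]
for t in threads:
    t.start()
time.sleep(300)  # run for a while and watch querynode memory
stop.set()
for t in threads:
    t.join()
```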
From analyzing the log and the memory usage, we can see that the load action reads the raw vector field rather than the index files, so the memory usage is much larger than on the 2.1 branch. The next step is to find out why the index was not created as expected; see the sketch below for checking the index state.
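One way to confirm that on the benchmark collection is to check the index state before calling load(). A minimal sketch assuming pymilvus 2.1.x; the host/port are placeholders and the collection name is taken from the config above:

```python
# Check whether the IVF_SQ8 index exists and is fully built before load();
# if it does not, load() has to pull the raw vectors into querynode memory.
from pymilvus import Collection, connections, utility

connections.connect(host="127.0.0.1", port="19530")  # placeholder endpoint

name = "sift_10w_128_l2"
collection = Collection(name)

if not collection.has_index():
    print("no index on the collection: load() will read raw vectors into memory")
else:
    print("index params:", collection.index().params)
    # indexed_rows should catch up with total_rows once building has finished
    print("build progress:", utility.index_building_progress(name))
```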
@aoiasd please follow up on this issue.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.