[Bug]: [benchmark][cluster] indexNode OOM in VARCHAR scalars build default index scene
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: master-20240301-36d78e3d-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 2.4.0rc36
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
argo task: fouramf-multi-vector-kx5gb
server:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
inverted-scene-dql-default-etcd-0 1/1 Running 0 7h32m 10.104.25.151 4am-node30 <none> <none>
inverted-scene-dql-default-etcd-1 1/1 Running 0 7h32m 10.104.16.163 4am-node21 <none> <none>
inverted-scene-dql-default-etcd-2 1/1 Running 0 7h32m 10.104.19.238 4am-node28 <none> <none>
inverted-scene-dql-default-milvus-datacoord-5fcf4b8695-86frx 1/1 Running 0 7h32m 10.104.6.174 4am-node13 <none> <none>
inverted-scene-dql-default-milvus-datanode-57c7f6bb77-qprxw 1/1 Running 0 7h32m 10.104.6.173 4am-node13 <none> <none>
inverted-scene-dql-default-milvus-indexcoord-6f8fb88cb6-btlb9 1/1 Running 0 7h32m 10.104.32.14 4am-node39 <none> <none>
inverted-scene-dql-default-milvus-indexnode-84549545cc-mdzst 0/1 CrashLoopBackOff 68 (2m37s ago) 7h32m 10.104.32.13 4am-node39 <none> <none>
inverted-scene-dql-default-milvus-proxy-8dbcf7b58-54fwx 1/1 Running 1 (7h28m ago) 7h32m 10.104.32.12 4am-node39 <none> <none>
inverted-scene-dql-default-milvus-querycoord-54d7d994bf-lwrdx 1/1 Running 0 7h32m 10.104.6.171 4am-node13 <none> <none>
inverted-scene-dql-default-milvus-querynode-f9f58cf98-mnsjw 1/1 Running 0 7h32m 10.104.32.15 4am-node39 <none> <none>
inverted-scene-dql-default-milvus-rootcoord-b7d565f47-tfpcr 1/1 Running 0 7h32m 10.104.6.172 4am-node13 <none> <none>
inverted-scene-dql-default-minio-0 1/1 Running 0 7h32m 10.104.18.58 4am-node25 <none> <none>
inverted-scene-dql-default-minio-1 1/1 Running 0 7h32m 10.104.25.150 4am-node30 <none> <none>
inverted-scene-dql-default-minio-2 1/1 Running 0 7h32m 10.104.23.99 4am-node27 <none> <none>
inverted-scene-dql-default-minio-3 1/1 Running 0 7h32m 10.104.29.248 4am-node35 <none> <none>
inverted-scene-dql-default-pulsar-bookie-0 1/1 Running 0 7h32m 10.104.18.60 4am-node25 <none> <none>
inverted-scene-dql-default-pulsar-bookie-1 1/1 Running 0 7h32m 10.104.25.152 4am-node30 <none> <none>
inverted-scene-dql-default-pulsar-bookie-2 1/1 Running 0 7h32m 10.104.29.249 4am-node35 <none> <none>
inverted-scene-dql-default-pulsar-bookie-init-m4bxl 0/1 Completed 0 7h32m 10.104.18.52 4am-node25 <none> <none>
inverted-scene-dql-default-pulsar-broker-0 1/1 Running 0 7h32m 10.104.5.8 4am-node12 <none> <none>
inverted-scene-dql-default-pulsar-proxy-0 1/1 Running 0 7h32m 10.104.18.53 4am-node25 <none> <none>
inverted-scene-dql-default-pulsar-pulsar-init-cjzn7 0/1 Completed 0 7h32m 10.104.18.50 4am-node25 <none> <none>
inverted-scene-dql-default-pulsar-recovery-0 1/1 Running 0 7h32m 10.104.18.51 4am-node25 <none> <none>
inverted-scene-dql-default-pulsar-zookeeper-0 1/1 Running 0 7h32m 10.104.18.59 4am-node25 <none> <none>
inverted-scene-dql-default-pulsar-zookeeper-1 1/1 Running 0 7h32m 10.104.25.154 4am-node30 <none> <none>
inverted-scene-dql-default-pulsar-zookeeper-2 1/1 Running 0 7h31m 10.104.28.4 4am-node33 <none> <none>
kubectl describe pod inverted-scene-dql-default-milvus-indexnode-84549545cc-mdzst -n qa-milvus
client pod name: fouramf-multi-vector-kx5gb-141787295
Expected Behavior
No response
Steps To Reproduce
concurrent test and calculation of RT and QPS
:purpose: `varchar: different max_length`
verify a concurrent DQL scenario with 3 VARCHAR scalar fields and INVERTED index creation
:test steps:
1. create collection with fields:
   'float_vector': dim=3
   'varchar_1': max_length=256, varchar_filled=True
   'varchar_2': max_length=32768, varchar_filled=True
   'varchar_3': max_length=65535, varchar_filled=True
2. build indexes:
   IVF_FLAT: 'float_vector'
   DEFAULT index: 'varchar_1', 'varchar_2', 'varchar_3'
3. insert 300k rows <- indexNode OOM (see the sketch below)
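For illustration, here is a minimal pymilvus sketch that approximates the schema, indexes, and insert pattern described above. The collection name, connection address, and the `filled` data generator are assumptions for the sketch, not taken from the original benchmark code.

```python
import random
import string

from pymilvus import (
    Collection, CollectionSchema, DataType, FieldSchema, connections,
)

connections.connect(host="127.0.0.1", port="19530")  # assumed local proxy address

fields = [
    FieldSchema("id", DataType.INT64, is_primary=True),
    FieldSchema("float_vector", DataType.FLOAT_VECTOR, dim=3),
    FieldSchema("varchar_1", DataType.VARCHAR, max_length=256),
    FieldSchema("varchar_2", DataType.VARCHAR, max_length=32768),
    FieldSchema("varchar_3", DataType.VARCHAR, max_length=65535),
]
collection = Collection("varchar_oom_repro", CollectionSchema(fields), shards_num=2)

# IVF_FLAT on the vector field; no index_type for the VARCHAR fields, so the
# server picks the default scalar index for them.
collection.create_index(
    "float_vector",
    {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 1024}},
)
for name in ("varchar_1", "varchar_2", "varchar_3"):
    collection.create_index(name)

def filled(n: int) -> str:
    # varchar_filled=True: every row carries a string of exactly max_length chars.
    return "".join(random.choices(string.ascii_letters + string.digits, k=n))

# Insert 300k rows in batches of 50 (ni_per in the client config). Generating
# fully filled 65535-char strings for all rows is heavy; this only mirrors the
# shape of the benchmark workload.
batch = 50
for start in range(0, 300_000, batch):
    ids = list(range(start, start + batch))
    vectors = [[random.random() for _ in range(3)] for _ in ids]
    collection.insert([
        ids,
        vectors,
        [filled(256) for _ in ids],
        [filled(32768) for _ in ids],
        [filled(65535) for _ in ids],
    ])
collection.flush()
```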
Milvus Log
No response
Anything else?
server config:
queryNode:
  resources:
    limits:
      cpu: '8'
      memory: 64Gi
    requests:
      cpu: '8'
      memory: 32Gi
  replicas: 1
indexNode:
  resources:
    limits:
      cpu: '4.0'
      memory: 16Gi
    requests:
      cpu: '3.0'
      memory: 9Gi
  replicas: 1
dataNode:
  resources:
    limits:
      cpu: '2.0'
      memory: 4Gi
    requests:
      cpu: '2.0'
      memory: 3Gi
cluster:
  enabled: true
pulsar: {}
kafka: {}
minio:
  metrics:
    podMonitor:
      enabled: true
etcd:
  metrics:
    enabled: true
    podMonitor:
      enabled: true
metrics:
  serviceMonitor:
    enabled: true
log:
  level: debug
image:
  all:
    repository: harbor.milvus.io/milvus/milvus
    tag: master-20240301-36d78e3d-amd64
client config:
dataset_params:
  metric_type: L2
  dim: 3
  scalars_index:
    - varchar_1
    - varchar_2
    - varchar_3
  scalars_params:
    varchar_1:
      params:
        max_length: 256
      other_params:
        varchar_filled: true
    varchar_2:
      params:
        max_length: 32768
      other_params:
        varchar_filled: true
    varchar_3:
      params:
        max_length: 65535
      other_params:
        varchar_filled: true
  dataset_name: local
  dataset_size: 300000
  ni_per: 50
collection_params:
  other_fields:
    - varchar_1
    - varchar_2
    - varchar_3
  shards_num: 2
index_params:
  index_type: IVF_FLAT
  index_param:
    nlist: 1024
concurrent_params:
  concurrent_number:
    - 50
  during_time: 1h
  interval: 20
concurrent_tasks:
  - type: search
    weight: 1
    params:
      nq: 1000
      top_k: 10
      search_param:
        nprobe: 32
      expr: ' varchar_1 like "a%" && varchar_2 like "A%" && varchar_3 like "0%" && id > 0 '
      timeout: 60
      random_data: true
  - type: query
    weight: 1
    params:
      expr: id > -1 &&
      output_fields:
        - float_vector
      timeout: 60
      random_data: true
      random_count: 10
      random_range:
        - 0
        - 2500000
      field_name: id
      field_type: int64
Reproduced on my local PC.
Building a Trie index with 100,000 rows, each of length 65535, the peak memory is almost 14.5GB.
I also profiled the memory allocations using heaptrack:
It indicates that reading binlogs from remote storage contributes most of the peak memory.
That looks odd, because ideally each binlog's size is fixed.
- We should check the original size of each file the payload writer produces; by default it should be less than 64MB.
- If it is 64MB, then the memory consumed should depend on the read concurrency. The concurrency is calculated as 10 (coefficient) * number of CPUs, so on a 4-core CPU it might hold at most 40 * 64MB of data, which is around 2.5GB. The indexNode has 16GB of memory, so that should be OK?
Changing the file size to something smaller or reducing the coefficient might help.
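For clarity, here is a back-of-envelope version of the estimate above; the coefficient, CPU count, and file size are the values assumed in that comment, not measured:

```python
# Rough peak memory held by concurrent binlog reads, per the estimate above.
coefficient = 10            # concurrency coefficient assumed in the comment
num_cpus = 4                # indexNode CPU limit in this test (limits.cpu: '4.0')
binlog_file_size_mb = 64    # default max binlog file size assumed above

peak_read_mb = coefficient * num_cpus * binlog_file_size_mb
print(f"~{peak_read_mb} MB ≈ {peak_read_mb / 1024:.1f} GB held by concurrent reads")
# -> ~2560 MB ≈ 2.5 GB, well under the 16Gi indexNode limit on its own
```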
Marisa-trie doesn't support building the index in a streaming fashion, so we need to read all the binlogs into memory. As a result, the peak memory is currently double the total binlog size.
Sorry, I made a mistake: I used the same data for every row when trying to reproduce this issue on my local PC. If the data is almost identical, the trie index in fact won't occupy much memory, since the entries share a common prefix. So when building the trie, in theory, the peak memory can be up to three times the total binlog size.
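To put the local reproduction in perspective, a rough check, assuming the 100,000 fully filled 65535-character rows end up stored essentially uncompressed in the binlogs (an assumption; binlog encoding and compression are not accounted for here):

```python
rows = 100_000
bytes_per_row = 65_535                         # max_length, fully filled
raw_gib = rows * bytes_per_row / 1024**3
print(f"raw varchar data ≈ {raw_gib:.1f} GiB")   # ≈ 6.1 GiB
print(f"2x estimate ≈ {2 * raw_gib:.1f} GiB")    # ≈ 12.2 GiB
print(f"3x worst case ≈ {3 * raw_gib:.1f} GiB")  # ≈ 18.3 GiB, above the 16Gi indexNode limit
# The observed 14.5 GB peak falls between the 2x and 3x estimates.
```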
But the indexNode should be big, right? @wangting0128 how large is the indexNode in your test?
Probably because the segment size was changed to 1GB in 2.4?
16Gi
In this test, the memory resource of the indexNode is 16Gi.
I'm running several comparative tests; the test scenarios are as follows:
- Increase the indexNode memory and verify how much memory is used to build the index.
- Using the image from before the segment size change, run the same test scenario and verify how much memory is used to build the index (segment size change PR: enhance: Set segment.maxSize param to 1024M #30139).
The comparison results will be posted here once they are available.
The verification process is blocked by a new issue, #31168; verification will continue after that issue is fixed.
We can't solve this issue until the segment size can really be controlled by dataCoord.segment.maxSize. By default, the max size of a segment is 1G; however, in our case, the size of just one varchar column can reach 3G, which already far exceeds the segment size.
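As a rough illustration of that gap (the per-segment row count below is purely hypothetical, chosen only to show the scale):

```python
segment_max_bytes = 1 * 1024**3      # dataCoord.segment.maxSize = 1G
bytes_per_varchar = 65_535           # one fully filled varchar_3 value

# If segment size were tracked accurately, only ~16k such rows would fit:
print(segment_max_bytes // bytes_per_varchar)          # -> 16384

# But if row-count estimation lets a segment accumulate, say, 50k such rows
# (hypothetical figure), a single varchar column alone already holds ~3 GiB:
rows_in_segment = 50_000
print(rows_in_segment * bytes_per_varchar / 1024**3)   # -> ~3.05 GiB
```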
@longjiquan Maybe we should change the flush/compaction size control from estimation to report? This might be a big project, but it is worth doing in the future.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.