
[Bug]: [benchmark][cluster] indexNode OOM in VARCHAR scalars build default index scene

Open wangting0128 opened this issue 11 months ago • 9 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version:master-20240301-36d78e3d-amd64
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka): pulsar   
- SDK version(e.g. pymilvus v2.0.0rc2):2.4.0rc36
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: fouramf-multi-vector-kx5gb

server:

NAME                                                              READY   STATUS             RESTARTS          AGE     IP              NODE         NOMINATED NODE   READINESS GATES
inverted-scene-dql-default-etcd-0                                 1/1     Running            0                 7h32m   10.104.25.151   4am-node30   <none>           <none>
inverted-scene-dql-default-etcd-1                                 1/1     Running            0                 7h32m   10.104.16.163   4am-node21   <none>           <none>
inverted-scene-dql-default-etcd-2                                 1/1     Running            0                 7h32m   10.104.19.238   4am-node28   <none>           <none>
inverted-scene-dql-default-milvus-datacoord-5fcf4b8695-86frx      1/1     Running            0                 7h32m   10.104.6.174    4am-node13   <none>           <none>
inverted-scene-dql-default-milvus-datanode-57c7f6bb77-qprxw       1/1     Running            0                 7h32m   10.104.6.173    4am-node13   <none>           <none>
inverted-scene-dql-default-milvus-indexcoord-6f8fb88cb6-btlb9     1/1     Running            0                 7h32m   10.104.32.14    4am-node39   <none>           <none>
inverted-scene-dql-default-milvus-indexnode-84549545cc-mdzst      0/1     CrashLoopBackOff   68 (2m37s ago)    7h32m   10.104.32.13    4am-node39   <none>           <none>
inverted-scene-dql-default-milvus-proxy-8dbcf7b58-54fwx           1/1     Running            1 (7h28m ago)     7h32m   10.104.32.12    4am-node39   <none>           <none>
inverted-scene-dql-default-milvus-querycoord-54d7d994bf-lwrdx     1/1     Running            0                 7h32m   10.104.6.171    4am-node13   <none>           <none>
inverted-scene-dql-default-milvus-querynode-f9f58cf98-mnsjw       1/1     Running            0                 7h32m   10.104.32.15    4am-node39   <none>           <none>
inverted-scene-dql-default-milvus-rootcoord-b7d565f47-tfpcr       1/1     Running            0                 7h32m   10.104.6.172    4am-node13   <none>           <none>
inverted-scene-dql-default-minio-0                                1/1     Running            0                 7h32m   10.104.18.58    4am-node25   <none>           <none>
inverted-scene-dql-default-minio-1                                1/1     Running            0                 7h32m   10.104.25.150   4am-node30   <none>           <none>
inverted-scene-dql-default-minio-2                                1/1     Running            0                 7h32m   10.104.23.99    4am-node27   <none>           <none>
inverted-scene-dql-default-minio-3                                1/1     Running            0                 7h32m   10.104.29.248   4am-node35   <none>           <none>
inverted-scene-dql-default-pulsar-bookie-0                        1/1     Running            0                 7h32m   10.104.18.60    4am-node25   <none>           <none>
inverted-scene-dql-default-pulsar-bookie-1                        1/1     Running            0                 7h32m   10.104.25.152   4am-node30   <none>           <none>
inverted-scene-dql-default-pulsar-bookie-2                        1/1     Running            0                 7h32m   10.104.29.249   4am-node35   <none>           <none>
inverted-scene-dql-default-pulsar-bookie-init-m4bxl               0/1     Completed          0                 7h32m   10.104.18.52    4am-node25   <none>           <none>
inverted-scene-dql-default-pulsar-broker-0                        1/1     Running            0                 7h32m   10.104.5.8      4am-node12   <none>           <none>
inverted-scene-dql-default-pulsar-proxy-0                         1/1     Running            0                 7h32m   10.104.18.53    4am-node25   <none>           <none>
inverted-scene-dql-default-pulsar-pulsar-init-cjzn7               0/1     Completed          0                 7h32m   10.104.18.50    4am-node25   <none>           <none>
inverted-scene-dql-default-pulsar-recovery-0                      1/1     Running            0                 7h32m   10.104.18.51    4am-node25   <none>           <none>
inverted-scene-dql-default-pulsar-zookeeper-0                     1/1     Running            0                 7h32m   10.104.18.59    4am-node25   <none>           <none>
inverted-scene-dql-default-pulsar-zookeeper-1                     1/1     Running            0                 7h32m   10.104.25.154   4am-node30   <none>           <none>
inverted-scene-dql-default-pulsar-zookeeper-2                     1/1     Running            0                 7h31m   10.104.28.4     4am-node33   <none>           <none>

kubectl describe pod inverted-scene-dql-default-milvus-indexnode-84549545cc-mdzst -n qa-milvus (output attached as a screenshot)

client pod name: fouramf-multi-vector-kx5gb-141787295

Expected Behavior

No response

Steps To Reproduce

concurrent test and calculation of RT and QPS

        :purpose:  `varchar: different max_length`
            verify a concurrent DQL scenario that has 3 VARCHAR scalar fields and creates an INVERTED index

        :test steps:
            1. create collection with fields:
                'float_vector': 3dim,
                'varchar_1': max_length=256, varchar_filled=True
                'varchar_2': max_length=32768, varchar_filled=True
                'varchar_3': max_length=65535, varchar_filled=True
            2. build indexes:
                IVF_FLAT: 'float_vector'
                DEFAULT index: 'varchar_1', 'varchar_2', 'varchar_3'
            3. insert 300k data <- indexNode OOM (a minimal repro sketch follows below)
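
For reference, a minimal pymilvus sketch of these steps (the collection name, primary key, connection endpoint and random filler are illustrative, and the actual fouram benchmark harness differs; `create_index` with no parameters is used here on the assumption that it requests the DEFAULT scalar index):

import random
import string

from pymilvus import (
    Collection, CollectionSchema, DataType, FieldSchema, connections,
)

connections.connect(host="127.0.0.1", port="19530")   # placeholder endpoint

# step 1: collection with one 3-dim vector and three filled VARCHAR fields
fields = [
    FieldSchema("id", DataType.INT64, is_primary=True),
    FieldSchema("float_vector", DataType.FLOAT_VECTOR, dim=3),
    FieldSchema("varchar_1", DataType.VARCHAR, max_length=256),
    FieldSchema("varchar_2", DataType.VARCHAR, max_length=32768),
    FieldSchema("varchar_3", DataType.VARCHAR, max_length=65535),
]
collection = Collection("varchar_default_index_repro",
                        CollectionSchema(fields), shards_num=2)

# step 2: IVF_FLAT on the vector, DEFAULT scalar index on the varchar fields
collection.create_index("float_vector",
                        {"index_type": "IVF_FLAT", "metric_type": "L2",
                         "params": {"nlist": 1024}})
for name in ("varchar_1", "varchar_2", "varchar_3"):
    collection.create_index(name)   # no params -> server picks the default scalar index

def filled(length):
    """Random string padded to the field's max_length (varchar_filled=True)."""
    return "".join(random.choices(string.ascii_letters + string.digits, k=length))

# step 3: insert 300k rows, ni_per=50 rows per batch (slow in pure Python, but enough
# to grow segments until index building starts on the indexNode)
batch = 50
for start in range(0, 300_000, batch):
    collection.insert([
        list(range(start, start + batch)),                             # id
        [[random.random() for _ in range(3)] for _ in range(batch)],   # float_vector
        [filled(256) for _ in range(batch)],                           # varchar_1
        [filled(32_768) for _ in range(batch)],                        # varchar_2
        [filled(65_535) for _ in range(batch)],                        # varchar_3
    ])
collection.flush()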

Milvus Log

No response

Anything else?

server config:

queryNode:
  resources:
    limits:
      cpu: '8'
      memory: 64Gi
    requests:
      cpu: '8'
      memory: 32Gi
  replicas: 1
indexNode:
  resources:
    limits:
      cpu: '4.0'
      memory: 16Gi
    requests:
      cpu: '3.0'
      memory: 9Gi
  replicas: 1
dataNode:
  resources:
    limits:
      cpu: '2.0'
      memory: 4Gi
    requests:
      cpu: '2.0'
      memory: 3Gi
cluster:
  enabled: true
pulsar: {}
kafka: {}
minio:
  metrics:
    podMonitor:
      enabled: true
etcd:
  metrics:
    enabled: true
    podMonitor:
      enabled: true
metrics:
  serviceMonitor:
    enabled: true
log:
  level: debug
image:
  all:
    repository: harbor.milvus.io/milvus/milvus
    tag: master-20240301-36d78e3d-amd64

client config:

dataset_params:
  metric_type: L2
  dim: 3
  scalars_index:
    - varchar_1
    - varchar_2
    - varchar_3
  scalars_params:
    varchar_1:
      params:
        max_length: 256
      other_params:
        varchar_filled: true
    varchar_2:
      params:
        max_length: 32768
      other_params:
        varchar_filled: true
    varchar_3:
      params:
        max_length: 65535
      other_params:
        varchar_filled: true
  dataset_name: local
  dataset_size: 300000
  ni_per: 50
collection_params:
  other_fields:
    - varchar_1
    - varchar_2
    - varchar_3
  shards_num: 2
index_params:
  index_type: IVF_FLAT
  index_param:
    nlist: 1024
concurrent_params:
  concurrent_number:
    - 50
  during_time: 1h
  interval: 20
concurrent_tasks:
  - type: search
    weight: 1
    params:
      nq: 1000
      top_k: 10
      search_param:
        nprobe: 32
      expr: ' varchar_1 like "a%" && varchar_2 like "A%" && varchar_3 like "0%" && id > 0 '
      timeout: 60
      random_data: true
  - type: query
    weight: 1
    params:
      expr: id > -1 &&
      output_fields:
        - float_vector
      timeout: 60
      random_data: true
      random_count: 10
      random_range:
        - 0
        - 2500000
      field_name: id
      field_type: int64
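
For illustration, one search request as defined under concurrent_tasks would look roughly like this in pymilvus, assuming the `collection` from the sketch above has been loaded (the query task is omitted because its expression is completed at runtime by the harness):

import random

# nq=1000 random 3-dim vectors, top_k=10, nprobe=32, scalar filter taken from the config
vectors = [[random.random() for _ in range(3)] for _ in range(1000)]
results = collection.search(
    data=vectors,
    anns_field="float_vector",
    param={"metric_type": "L2", "params": {"nprobe": 32}},
    limit=10,
    expr='varchar_1 like "a%" && varchar_2 like "A%" && varchar_3 like "0%" && id > 0',
    timeout=60,
)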

wangting0128 avatar Mar 01 '24 11:03 wangting0128

Reproduced on my local PC. Building a Trie index with 100,000 rows, each 65,535 bytes long, the peak memory is almost 14.5 GB (screenshot attached).

I also profiled the memory allocations using heaptrack (screenshot attached). It indicates that reading binlogs from remote storage contributes the most to peak memory.

longjiquan avatar Mar 06 '24 09:03 longjiquan
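
For scale, a back-of-the-envelope check of that local test (assuming every row is filled to exactly 65,535 bytes):

rows, row_len = 100_000, 65_535
raw_bytes = rows * row_len
print(raw_bytes / 2**30)             # ~6.1 GiB of raw string data
print(14.5 / (raw_bytes / 2**30))    # the observed 14.5 GB peak is roughly 2.4x the raw input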

That looks weird, because ideally each binlog's size is fixed.

  1. We should check the original size of each file when the payload writer writes it; by default it should be less than 64MB.
  2. If it is 64MB, then the memory consumed should depend only on the read concurrency, which is calculated as 10 (coefficient) * number of CPUs. If you are running on 4 CPU cores, it might hold at most 40 * 64MB of data, which is around 2.5GB (spelled out below). The indexNode has 16GB of memory, which should be ok?

Changing the file size to something smaller, or reducing the coefficient, might help.

xiaofan-luan avatar Mar 06 '24 19:03 xiaofan-luan
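
Spelling out that estimate (the 64 MB binlog size and the coefficient of 10 are taken from the comment above, not verified against the code):

cpus = 4
coefficient = 10
binlog_mb = 64
concurrent_reads = coefficient * cpus     # 40 binlogs read concurrently
peak_mb = concurrent_reads * binlog_mb    # 2560 MB, i.e. about 2.5 GB
print(concurrent_reads, peak_mb)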

Marisa-trie doesn't support building the index in a streaming fashion, so we need to read all binlogs into memory. As a result, the peak memory is now double the total binlog size.

longjiquan avatar Mar 07 '24 02:03 longjiquan

Sorry, I made a mistake: I used identical data when trying to reproduce this issue on my local PC. If the data is almost the same, the Trie index actually won't occupy much memory, since the entries share a common prefix. So when building the Trie, in theory the peak memory can be three times the total binlog size.

longjiquan avatar Mar 07 '24 09:03 longjiquan

But the indexNode should be big enough, right? @wangting0128 how large is the indexNode in your test?

Probably because we changed the segment size to 1GB in 2.4?

xiaofan-luan avatar Mar 07 '24 19:03 xiaofan-luan

16Gi

In this test, the memory limit of the indexNode is 16Gi.

I'm running several comparative tests; the scenarios are as follows:

  1. Increase the indexNode memory and verify how much memory is used to build the index.
  2. Using the image from before the segment size change (PR #30139, "enhance: Set segment.maxSize param to 1024M"), run the same test scenario and verify how much memory is used to build the index.

The comparison results will be posted here once they are available.

wangting0128 avatar Mar 08 '24 02:03 wangting0128

The verification process is blocked by a new issue, #31168; verification will continue after that issue is fixed.

wangting0128 avatar Mar 11 '24 03:03 wangting0128

We can't solve this issue until segment size is actually controlled by dataCoord.segment.maxSize. By default, the max size of a segment is 1G; in our case, however, a single varchar column alone can reach 3G, which already far exceeds the segment size limit.

longjiquan avatar Mar 20 '24 08:03 longjiquan
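
To put numbers on the mismatch (a rough estimate that counts only the three varchar columns and ignores per-row overhead):

row_bytes = 256 + 32_768 + 65_535     # ~96 KiB of varchar data per filled row
segment_cap_bytes = 1024 * 2**20      # dataCoord.segment.maxSize = 1024 MB
print(segment_cap_bytes // row_bytes) # ~10.9k rows would fit if segments were sized by actual data
print(300_000 * 65_535 / 2**30)       # varchar_3 alone across 300k rows: ~18.3 GiB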

@longjiquan Maybe we should change the flush/compaction size control from estimation to reporting of actual written sizes? This might be a big project, but it's worth doing in the future.

xiaofan-luan avatar Mar 22 '24 00:03 xiaofan-luan

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Jun 10 '24 06:06 stale[bot]