
[Bug]: [benchmark] milvus insert data datanode memory rise

Open elstic opened this issue 1 year ago • 17 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version:2.2.0-20230410-d845175f
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

In the 1b-data continuous search test scenario, datanode memory previously stayed under 1.4G while inserting the 1b dataset; now, at the same insert frequency, datanode memory use exceeds 3G and is still growing.

image: 2.1.0-20220726-1b33c731 datanode memory usage (709m + 733m combined does not exceed 1.4G) :
image image

image: 2.2.0-20230410-d845175f

qtp-1b-test-lbcyr-etcd-0                                          1/1     Running     0               6m23s   10.104.5.110    4am-node12   <none>           <none>
qtp-1b-test-lbcyr-etcd-1                                          1/1     Running     0               6m22s   10.104.6.235    4am-node13   <none>           <none>
qtp-1b-test-lbcyr-etcd-2                                          1/1     Running     0               6m22s   10.104.4.48     4am-node11   <none>           <none>
qtp-1b-test-lbcyr-milvus-datacoord-5b489d5f57-dnvw7               1/1     Running     0               6m23s   10.104.4.31     4am-node11   <none>           <none>
qtp-1b-test-lbcyr-milvus-datanode-5888986546-9kkjt                1/1     Running     1 (2m22s ago)   6m23s   10.104.6.228    4am-node13   <none>           <none>
qtp-1b-test-lbcyr-milvus-indexcoord-6bd8dd4d7-jrjj2               1/1     Running     1 (2m22s ago)   6m23s   10.104.4.33     4am-node11   <none>           <none>
qtp-1b-test-lbcyr-milvus-indexnode-58b84db4c8-ctldx               1/1     Running     0               6m23s   10.104.9.122    4am-node14   <none>           <none>
qtp-1b-test-lbcyr-milvus-proxy-7b9c67b545-st8ch                   1/1     Running     1 (2m22s ago)   6m23s   10.104.5.101    4am-node12   <none>           <none>
qtp-1b-test-lbcyr-milvus-querycoord-695fb8f5b-nfmws               1/1     Running     1 (2m22s ago)   6m23s   10.104.4.35     4am-node11   <none>           <none>
qtp-1b-test-lbcyr-milvus-querynode-8574948fc4-kn8jz               1/1     Running     0               6m23s   10.104.6.227    4am-node13   <none>           <none>
qtp-1b-test-lbcyr-milvus-querynode-8574948fc4-l24r2               1/1     Running     0               6m23s   10.104.9.119    4am-node14   <none>           <none>
qtp-1b-test-lbcyr-milvus-querynode-8574948fc4-p5cbs               1/1     Running     0               6m23s   10.104.4.37     4am-node11   <none>           <none>
qtp-1b-test-lbcyr-milvus-querynode-8574948fc4-p5rjt               1/1     Running     0               6m23s   10.104.5.100    4am-node12   <none>           <none>
qtp-1b-test-lbcyr-milvus-querynode-8574948fc4-vr7vc               1/1     Running     0               6m23s   10.104.4.40     4am-node11   <none>           <none>
qtp-1b-test-lbcyr-milvus-querynode-8574948fc4-xf8cn               1/1     Running     0               6m23s   10.104.1.120    4am-node10   <none>           <none>
qtp-1b-test-lbcyr-milvus-rootcoord-5fb7645c68-mnpcb               1/1     Running     1 (2m22s ago)   6m23s   10.104.4.34     4am-node11   <none>           <none>
qtp-1b-test-lbcyr-minio-0                                         1/1     Running     0               6m23s   10.104.5.109    4am-node12   <none>           <none>
qtp-1b-test-lbcyr-minio-1                                         1/1     Running     0               6m23s   10.104.6.234    4am-node13   <none>           <none>
qtp-1b-test-lbcyr-minio-2                                         1/1     Running     0               6m22s   10.104.4.46     4am-node11   <none>           <none>
qtp-1b-test-lbcyr-minio-3                                         1/1     Running     0               6m22s   10.104.9.124    4am-node14   <none>           <none>
qtp-1b-test-lbcyr-pulsar-bookie-0                                 1/1     Running     0               6m23s   10.104.5.107    4am-node12   <none>           <none>
qtp-1b-test-lbcyr-pulsar-bookie-1                                 1/1     Running     0               6m23s   10.104.6.233    4am-node13   <none>           <none>
qtp-1b-test-lbcyr-pulsar-bookie-2                                 1/1     Running     0               6m22s   10.104.4.45     4am-node11   <none>           <none>
qtp-1b-test-lbcyr-pulsar-bookie-init-qk8wj                        0/1     Completed   0               6m23s   10.104.4.38     4am-node11   <none>           <none>
qtp-1b-test-lbcyr-pulsar-broker-0                                 1/1     Running     0               6m23s   10.104.4.36     4am-node11   <none>           <none>
qtp-1b-test-lbcyr-pulsar-proxy-0                                  1/1     Running     0               6m23s   10.104.5.99     4am-node12   <none>           <none>
qtp-1b-test-lbcyr-pulsar-pulsar-init-q6n6t                        0/1     Completed   0               6m23s   10.104.4.32     4am-node11   <none>           <none>
qtp-1b-test-lbcyr-pulsar-recovery-0                               1/1     Running     0               6m23s   10.104.4.39     4am-node11   <none>           <none>
qtp-1b-test-lbcyr-pulsar-zookeeper-0                              1/1     Running     0               6m23s   10.104.5.106    4am-node12   <none>           <none>
qtp-1b-test-lbcyr-pulsar-zookeeper-1                              1/1     Running     0               5m44s   10.104.6.237    4am-node13   <none>           <none>
qtp-1b-test-lbcyr-pulsar-zookeeper-2                              1/1     Running     0               5m2s    10.104.4.54     4am-node11   <none>           <none>

datanode memory usage: image

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

elstic avatar Apr 11 '23 08:04 elstic

Can we change the pod limit to 2G, increase the ingest concurrency, and see if it OOMs?

xiaofan-luan avatar Apr 11 '23 18:04 xiaofan-luan

/assign @elstic please retry as suggested above. Also, did this run with datanode.memory.forceSyncEnable=True?

yanliang567 avatar Apr 12 '23 01:04 yanliang567

Can we change the pod limit to 2G, increase the ingest concurrency, and see if it OOMs?

/assign @elstic please retry as suggested above. Also, did this run with datanode.memory.forceSyncEnable=True?

I will retry and comment the results here. @yanliang567 Yes, forceSyncEnable is true on images starting with 2.2.0.

elstic avatar Apr 12 '23 02:04 elstic

Can we change the pod limit to 2G, increase the ingest concurrency, and see if it OOMs?

With the limit set to 2G, no OOM occurs; inserts are just denied. Using image: 2.2.0-20230412-51f5a128

image

client error log :

[2023-04-13 15:45:28,238 -  INFO - fouram]: [Base] Start inserting, ids: 69450000 - 69499999, data size: 1,000,000,000 (base.py:157)
[2023-04-13 15:45:29,277 -  INFO - fouram]: [Time] Collection.insert run in 1.0378s (api_request.py:41)
[2023-04-13 15:45:29,279 -  INFO - fouram]: [Base] Number of vectors in the collection(fouram_4sfUyz8B): 69400000 (base.py:305)
[2023-04-13 15:45:30,629 -  INFO - fouram]: [Base] Start inserting, ids: 69500000 - 69549999, data size: 1,000,000,000 (base.py:157)
[2023-04-13 15:45:31,350 - ERROR - fouram]: RPC error: [batch_insert], <MilvusException: (code=53, message=deny to write, reason: memory quota exhausted, please allocate more resources, req: /milvus.proto.milvus.MilvusService/Insert)>, <Time:{'RPC start': '2023-04-13 15:45:30.652223', 'RPC error': '2023-04-13 15:45:31.350141'}> (decorators.py:108)
[2023-04-13 15:45:31,504 - ERROR - fouram]: Traceback (most recent call last):
  File "/src/fouram/client/util/api_request.py", line 33, in inner_wrapper
    res = func(*args, **kwargs)
  File "/src/fouram/client/util/api_request.py", line 70, in api_request
    return func(*arg, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/orm/collection.py", line 430, in insert
    res = conn.batch_insert(self._name, entities, partition_name,
  File "/usr/local/lib/python3.8/dist-packages/pymilvus/decorators.py", line 109, in handler
    raise e


pymilvus.exceptions.MilvusException: <MilvusException: (code=53, message=deny to write, reason: memory quota exhausted, please allocate more resources, req: /milvus.proto.milvus.MilvusService/Insert)>
 (api_request.py:48)
[2023-04-13 15:45:31,505 - ERROR - fouram]: (api_response) : <MilvusException: (code=53, message=deny to write, reason: memory quota exhausted, please allocate more resources, req: /milvus.proto.milvus.MilvusService/Insert)> (api_request.py:49)
[2023-04-13 15:45:31,505 - ERROR - fouram]: [CheckFunc] insert request check failed, response:<MilvusException: (code=53, message=deny to write, reason: memory quota exhausted, please allocate more resources, req: /milvus.proto.milvus.MilvusService/Insert)> (func_check.py:49)
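A client hitting the deny-to-write error above can back off and retry once the datanode has had time to flush. A minimal sketch, using a hypothetical `QuotaExceededError` stand-in (a real client would catch pymilvus' `MilvusException` and check the message for "memory quota exhausted"):

```python
import time

# Hypothetical stand-in for the server-side quota error; real code would catch
# pymilvus.exceptions.MilvusException and check for "memory quota exhausted".
class QuotaExceededError(Exception):
    pass

def insert_with_backoff(do_insert, max_retries=5, base_delay=0.01):
    """Retry an insert callable with exponential backoff while the server
    denies writes because its memory quota is exhausted."""
    for attempt in range(max_retries):
        try:
            return do_insert()
        except QuotaExceededError:
            # Give the datanode time to flush before retrying.
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"insert still denied after {max_retries} retries")

# Toy insert that is denied twice before succeeding, to exercise the helper.
attempts = {"n": 0}
def flaky_insert():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise QuotaExceededError("deny to write: memory quota exhausted")
    return "ok"
```

Here `flaky_insert` fails twice, so `insert_with_backoff(flaky_insert)` succeeds on the third attempt.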

server:

fouramf-9m29m-3-9553-etcd-0                                       1/1     Running     0               6m4s    10.104.6.112   4am-node13   <none>           <none>
fouramf-9m29m-3-9553-etcd-1                                       1/1     Running     0               6m4s    10.104.9.38    4am-node14   <none>           <none>
fouramf-9m29m-3-9553-etcd-2                                       1/1     Running     0               6m4s    10.104.5.58    4am-node12   <none>           <none>
fouramf-9m29m-3-9553-milvus-datacoord-559f6ff7f5-wppvc            1/1     Running     1 (2m3s ago)    6m4s    10.104.6.99    4am-node13   <none>           <none>
fouramf-9m29m-3-9553-milvus-datanode-599d7f56f4-2hqrp             1/1     Running     1 (2m2s ago)    6m4s    10.104.6.103   4am-node13   <none>           <none>
fouramf-9m29m-3-9553-milvus-indexcoord-57658f74-wsd79             1/1     Running     1 (2m3s ago)    6m4s    10.104.6.98    4am-node13   <none>           <none>
fouramf-9m29m-3-9553-milvus-indexnode-5b47f64857-q4v7d            1/1     Running     0               6m4s    10.104.1.171   4am-node10   <none>           <none>
fouramf-9m29m-3-9553-milvus-proxy-6fdb748466-kjtsw                1/1     Running     1 (2m2s ago)    6m4s    10.104.1.172   4am-node10   <none>           <none>
fouramf-9m29m-3-9553-milvus-querycoord-65b45474f-9dfb4            1/1     Running     1 (2m2s ago)    6m4s    10.104.1.169   4am-node10   <none>           <none>
fouramf-9m29m-3-9553-milvus-querynode-846997cfbf-2d8x4            1/1     Running     0               6m4s    10.104.9.35    4am-node14   <none>           <none>
fouramf-9m29m-3-9553-milvus-querynode-846997cfbf-44qzk            1/1     Running     0               6m4s    10.104.6.106   4am-node13   <none>           <none>
fouramf-9m29m-3-9553-milvus-querynode-846997cfbf-fdbqb            1/1     Running     0               6m4s    10.104.5.55    4am-node12   <none>           <none>
fouramf-9m29m-3-9553-milvus-querynode-846997cfbf-h8cwd            0/1     Pending     0               6m4s    <none>         <none>       <none>           <none>
fouramf-9m29m-3-9553-milvus-querynode-846997cfbf-l7zll            1/1     Running     0               6m4s    10.104.1.174   4am-node10   <none>           <none>
fouramf-9m29m-3-9553-milvus-querynode-846997cfbf-r2vtb            1/1     Running     0               6m4s    10.104.4.211   4am-node11   <none>           <none>
fouramf-9m29m-3-9553-milvus-rootcoord-69c697f96c-fpqg9            1/1     Running     1 (2m2s ago)    6m4s    10.104.6.104   4am-node13   <none>           <none>
fouramf-9m29m-3-9553-minio-0                                      1/1     Running     0               6m4s    10.104.6.116   4am-node13   <none>           <none>
fouramf-9m29m-3-9553-minio-1                                      1/1     Running     0               6m3s    10.104.1.183   4am-node10   <none>           <none>
fouramf-9m29m-3-9553-minio-2                                      1/1     Running     0               6m3s    10.104.5.60    4am-node12   <none>           <none>
fouramf-9m29m-3-9553-minio-3                                      1/1     Running     0               6m3s    10.104.9.42    4am-node14   <none>           <none>
fouramf-9m29m-3-9553-pulsar-bookie-0                              1/1     Running     0               6m4s    10.104.6.114   4am-node13   <none>           <none>
fouramf-9m29m-3-9553-pulsar-bookie-1                              1/1     Running     0               6m4s    10.104.9.40    4am-node14   <none>           <none>
fouramf-9m29m-3-9553-pulsar-bookie-2                              1/1     Running     0               6m3s    10.104.1.184   4am-node10   <none>           <none>
fouramf-9m29m-3-9553-pulsar-bookie-init-rxn5h                     0/1     Completed   0               6m4s    10.104.6.101   4am-node13   <none>           <none>
fouramf-9m29m-3-9553-pulsar-broker-0                              1/1     Running     0               6m4s    10.104.6.97    4am-node13   <none>           <none>
fouramf-9m29m-3-9553-pulsar-proxy-0                               1/1     Running     0               6m4s    10.104.1.173   4am-node10   <none>           <none>
fouramf-9m29m-3-9553-pulsar-pulsar-init-ln7hh                     0/1     Completed   0               6m4s    10.104.6.102   4am-node13   <none>           <none>
fouramf-9m29m-3-9553-pulsar-recovery-0                            1/1     Running     0               6m4s    10.104.6.105   4am-node13   <none>           <none>
fouramf-9m29m-3-9553-pulsar-zookeeper-0                           1/1     Running     0               6m4s    10.104.6.113   4am-node13   <none>           <none>
fouramf-9m29m-3-9553-pulsar-zookeeper-1                           1/1     Running     0               5m6s    10.104.1.186   4am-node10   <none>           <none>
fouramf-9m29m-3-9553-pulsar-zookeeper-2                           1/1     Running     0               4m17s   10.104.4.215   4am-node11   <none>           <none> (base.py:173)

NAME                                                              READY   STATUS      RESTARTS        AGE     IP             NODE         NOMINATED NODE   READINESS GATES
fouramf-9m29m-3-9553-etcd-0                                       1/1     Running     0               57m     10.104.6.112   4am-node13   <none>           <none>
fouramf-9m29m-3-9553-etcd-1                                       1/1     Running     0               57m     10.104.9.38    4am-node14   <none>           <none>
fouramf-9m29m-3-9553-etcd-2                                       1/1     Running     0               57m     10.104.5.58    4am-node12   <none>           <none>
fouramf-9m29m-3-9553-milvus-datacoord-559f6ff7f5-wppvc            1/1     Running     1 (53m ago)     57m     10.104.6.99    4am-node13   <none>           <none>
fouramf-9m29m-3-9553-milvus-datanode-599d7f56f4-2hqrp             0/1     Running     2 (9s ago)      57m     10.104.6.103   4am-node13   <none>           <none>
fouramf-9m29m-3-9553-milvus-indexcoord-57658f74-wsd79             1/1     Running     1 (53m ago)     57m     10.104.6.98    4am-node13   <none>           <none>
fouramf-9m29m-3-9553-milvus-indexnode-5b47f64857-q4v7d            1/1     Running     0               57m     10.104.1.171   4am-node10   <none>           <none>
fouramf-9m29m-3-9553-milvus-proxy-6fdb748466-kjtsw                1/1     Running     1 (53m ago)     57m     10.104.1.172   4am-node10   <none>           <none>
fouramf-9m29m-3-9553-milvus-querycoord-65b45474f-9dfb4            1/1     Running     1 (53m ago)     57m     10.104.1.169   4am-node10   <none>           <none>
fouramf-9m29m-3-9553-milvus-querynode-846997cfbf-2d8x4            1/1     Running     0               57m     10.104.9.35    4am-node14   <none>           <none>
fouramf-9m29m-3-9553-milvus-querynode-846997cfbf-44qzk            1/1     Running     0               57m     10.104.6.106   4am-node13   <none>           <none>
fouramf-9m29m-3-9553-milvus-querynode-846997cfbf-fdbqb            1/1     Running     0               57m     10.104.5.55    4am-node12   <none>           <none>
fouramf-9m29m-3-9553-milvus-querynode-846997cfbf-h8cwd            1/1     Running     0               57m     10.104.5.62    4am-node12   <none>           <none>
fouramf-9m29m-3-9553-milvus-querynode-846997cfbf-l7zll            1/1     Running     0               57m     10.104.1.174   4am-node10   <none>           <none>
fouramf-9m29m-3-9553-milvus-querynode-846997cfbf-r2vtb            1/1     Running     0               57m     10.104.4.211   4am-node11   <none>           <none>
fouramf-9m29m-3-9553-milvus-rootcoord-69c697f96c-fpqg9            1/1     Running     1 (53m ago)     57m     10.104.6.104   4am-node13   <none>           <none>
fouramf-9m29m-3-9553-minio-0                                      1/1     Running     0               57m     10.104.6.116   4am-node13   <none>           <none>
fouramf-9m29m-3-9553-minio-1                                      1/1     Running     0               57m     10.104.1.183   4am-node10   <none>           <none>
fouramf-9m29m-3-9553-minio-2                                      1/1     Running     0               57m     10.104.5.60    4am-node12   <none>           <none>
fouramf-9m29m-3-9553-minio-3                                      1/1     Running     0               57m     10.104.9.42    4am-node14   <none>           <none>
fouramf-9m29m-3-9553-pulsar-bookie-0                              1/1     Running     0               57m     10.104.6.114   4am-node13   <none>           <none>
fouramf-9m29m-3-9553-pulsar-bookie-1                              1/1     Running     0               57m     10.104.9.40    4am-node14   <none>           <none>
fouramf-9m29m-3-9553-pulsar-bookie-2                              1/1     Running     0               57m     10.104.1.184   4am-node10   <none>           <none>
fouramf-9m29m-3-9553-pulsar-bookie-init-rxn5h                     0/1     Completed   0               57m     10.104.6.101   4am-node13   <none>           <none>
fouramf-9m29m-3-9553-pulsar-broker-0                              1/1     Running     0               57m     10.104.6.97    4am-node13   <none>           <none>
fouramf-9m29m-3-9553-pulsar-proxy-0                               1/1     Running     0               57m     10.104.1.173   4am-node10   <none>           <none>
fouramf-9m29m-3-9553-pulsar-pulsar-init-ln7hh                     0/1     Completed   0               57m     10.104.6.102   4am-node13   <none>           <none>
fouramf-9m29m-3-9553-pulsar-recovery-0                            1/1     Running     0               57m     10.104.6.105   4am-node13   <none>           <none>
fouramf-9m29m-3-9553-pulsar-zookeeper-0                           1/1     Running     0               57m     10.104.6.113   4am-node13   <none>           <none>
fouramf-9m29m-3-9553-pulsar-zookeeper-1                           1/1     Running     0               56m     10.104.1.186   4am-node10   <none>           <none>
fouramf-9m29m-3-9553-pulsar-zookeeper-2                           1/1     Running     0               55m     10.104.4.215   4am-node11   <none>           <none>

elstic avatar Apr 14 '23 02:04 elstic

/assign @jiaoew1991 /unassign @elstic @yanliang567

yanliang567 avatar Apr 14 '23 02:04 yanliang567

/assign @bigsheeper

Could you take a look at why the force flush is not happening?

xiaofan-luan avatar Apr 14 '23 16:04 xiaofan-luan

working on it

bigsheeper avatar Apr 15 '23 11:04 bigsheeper

Test 1 Using pprof to analyze DataNode memory. Result:

  • About 50% memory consumed by insertBuffer;
  • About 50% memory consumed by msgstream receive buffer;

pprof001(1)

bigsheeper avatar Apr 27 '23 06:04 bigsheeper

Test 2 Setting msgstream receive buffer size from 1024 to 10, then using pprof to analyze. Result:

  • About 90% memory consumed by insertBuffer;

pprof001

InsertBuffer consumed about 2GB; however, DataNode memory usage is about 5GB, so I guess C/C++ code consumed a lot of memory that we still haven't identified.

image

bigsheeper avatar Apr 27 '23 06:04 bigsheeper

Test 3 I assume that the arrow payload writer in C++ consumed a lot of memory, so I added a log to print the arrow memory pool size. Result:

  • Arrow memory pool consumed only about 80MB.

image

image

I'll try jeprof/heaptrack to analyze the C/C++ memory.

bigsheeper avatar Apr 27 '23 06:04 bigsheeper

Test 4 I'm attempting to replicate this issue on my local machine, and I have observed that DataNode's memory usage increases gradually over time. I used heaptrack to analyze it. Result:

  • There is an unusual cmalloc memory consumption in the FlameGraph.

image

heaptrack.milvus.175568.gz

bigsheeper avatar Apr 28 '23 08:04 bigsheeper

https://github.com/milvus-io/milvus/pull/23138 upgraded arrow and may resolve this issue; please help verify @elstic

bigsheeper avatar May 24 '23 13:05 bigsheeper

/assign @elstic

bigsheeper avatar May 24 '23 13:05 bigsheeper

/assign @elstic

After verification, the datanode memory rise still exists. Using image: 2.2.0-20230525-ef1a671d, argo task: fouramf-4vkjr

elstic avatar May 26 '23 02:05 elstic

/unassign @bigsheeper

elstic avatar May 26 '23 02:05 elstic

https://github.com/milvus-io/milvus/pull/24656 switches to the Go payload writer; the datanode OOM issue may be resolved.

bigsheeper avatar Jun 08 '23 11:06 bigsheeper

/assign @elstic please help verify @elstic

bigsheeper avatar Jun 08 '23 11:06 bigsheeper

This issue still exists.

image: 2.2.0-20230608-a03ebcff

image

elstic avatar Jun 12 '23 02:06 elstic

Is this as expected? The datanode doesn't auto flush until it reaches a certain threshold.

xiaofan-luan avatar Jun 12 '23 05:06 xiaofan-luan

Is this as expected? The datanode doesn't auto flush until it reaches a certain threshold.

This is not expected; previously, when we inserted data with 20 concurrent clients, datanode memory did not rise.

elstic avatar Jun 12 '23 06:06 elstic

The datanode should reserve pod total memory * 0.5; anything below this is as expected.

xiaofan-luan avatar Jun 12 '23 06:06 xiaofan-luan

We only flush under memory pressure. If the pod memory limit is large, Milvus will try to use 50% of the memory as a data cache.

xiaofan-luan avatar Jun 12 '23 06:06 xiaofan-luan

If your datanode pod has 8G of memory, we would expect it to use a little more than 4GB.

xiaofan-luan avatar Jun 12 '23 06:06 xiaofan-luan
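The flush behavior described in the comments above (the datanode buffers inserts and flushes once usage crosses roughly half the pod limit) can be modeled with a toy sketch; the 0.5 watermark, pod size, and insert sizes here are illustrative assumptions, not Milvus internals:

```python
POD_LIMIT_MB = 8 * 1024          # e.g. an 8 GiB datanode pod (illustrative)
WATERMARK = 0.5                  # flush once the buffer passes half the limit

class ToyInsertBuffer:
    """Toy model of a datanode insert buffer with a high-watermark flush."""
    def __init__(self, limit_mb, watermark):
        self.limit_mb = limit_mb
        self.watermark = watermark
        self.buffered_mb = 0
        self.flushes = 0

    def insert(self, size_mb):
        # Buffer the write, then flush if we crossed the watermark.
        self.buffered_mb += size_mb
        if self.buffered_mb >= self.limit_mb * self.watermark:
            self.flush()

    def flush(self):
        self.buffered_mb = 0
        self.flushes += 1

buf = ToyInsertBuffer(POD_LIMIT_MB, WATERMARK)
for _ in range(100):             # 100 inserts of 128 MB each
    buf.insert(128)
```

With these numbers the buffer flushes every 32 inserts (at 4096 MB), producing the sawtooth rise-and-drop memory pattern discussed in the thread; a curve that only rises, with no drops, suggests the flush never fires.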

This issue still exists. The datanode restarts after its memory reaches the limit.

case: test_concurrent_locust_100m_hnsw_ddl_dql_filter_output_kafka_cluster, argo task: fouramf-t24hx, duration: 192h

memory usage : image

datanode memory usage: image

Steps:

1. create a collection or use an existing collection
2. build an HNSW index on the vector column
3. insert 100 million vectors
4. flush the collection
5. build an index on the vector column with the same parameters
6. count the total number of rows
7. load the collection
8. execute concurrent search, query, load, and scene_test
   (scene_test steps:
   1) create a collection 2) insert 3000 pieces of data 3) flush the collection
   4) create an index 5) drop the collection)
9. step 8 lasts for 192h
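The scene_test sub-steps in step 8 can be sketched with a hypothetical in-memory stub in place of a real pymilvus client (`StubMilvus` and its method names are invented for illustration, not the real API):

```python
class StubMilvus:
    """Hypothetical in-memory stand-in recording the calls scene_test makes."""
    def __init__(self):
        self.collections = {}
        self.calls = []

    def create_collection(self, name):
        self.collections[name] = {"rows": 0, "indexed": False}
        self.calls.append("create")

    def insert(self, name, n_rows):
        self.collections[name]["rows"] += n_rows
        self.calls.append("insert")

    def flush(self, name):
        self.calls.append("flush")

    def create_index(self, name):
        self.collections[name]["indexed"] = True
        self.calls.append("index")

    def drop_collection(self, name):
        del self.collections[name]
        self.calls.append("drop")

def scene_test(client, name="scene_tmp"):
    # 1) create  2) insert 3000 rows  3) flush  4) index  5) drop
    client.create_collection(name)
    client.insert(name, 3000)
    client.flush(name)
    client.create_index(name)
    client.drop_collection(name)

client = StubMilvus()
scene_test(client)
```

Each iteration exercises the full create/insert/flush/index/drop lifecycle, which runs concurrently with search and query for the whole 192h test.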

server:

fouramf-t24hx-30-7593-etcd-0                                      1/1     Running                       0                  8d      10.104.17.130   4am-node23   <none>           <none>
fouramf-t24hx-30-7593-etcd-1                                      1/1     Running                       0                  8d      10.104.4.163    4am-node11   <none>           <none>
fouramf-t24hx-30-7593-etcd-2                                      1/1     Running                       0                  8d      10.104.14.213   4am-node18   <none>           <none>
fouramf-t24hx-30-7593-kafka-0                                     1/1     Running                       1 (8d ago)         8d      10.104.14.212   4am-node18   <none>           <none>
fouramf-t24hx-30-7593-kafka-1                                     1/1     Running                       0                  8d      10.104.1.104    4am-node10   <none>           <none>
fouramf-t24hx-30-7593-kafka-2                                     1/1     Running                       0                  8d      10.104.13.175   4am-node16   <none>           <none>
fouramf-t24hx-30-7593-milvus-datacoord-6d67b686b5-xnx96           1/1     Running                       0                  8d      10.104.19.52    4am-node28   <none>           <none>
fouramf-t24hx-30-7593-milvus-datanode-7c469b8bdc-vstfh            1/1     Running                       2 (2d2h ago)       8d      10.104.19.53    4am-node28   <none>           <none>
fouramf-t24hx-30-7593-milvus-indexcoord-6c847cf9f8-4r6mm          1/1     Running                       0                  8d      10.104.4.160    4am-node11   <none>           <none>
fouramf-t24hx-30-7593-milvus-indexnode-86bdff87b7-nqrgn           1/1     Running                       0                  8d      10.104.17.124   4am-node23   <none>           <none>
fouramf-t24hx-30-7593-milvus-proxy-6958c8fcb4-q7m2v               1/1     Running                       0                  8d      10.104.4.158    4am-node11   <none>           <none>
fouramf-t24hx-30-7593-milvus-querycoord-7595cdff77-p9w8d          1/1     Running                       0                  8d      10.104.19.54    4am-node28   <none>           <none>
fouramf-t24hx-30-7593-milvus-querynode-7f6d9d6bf5-p6tk6           1/1     Running                       0                  8d      10.104.19.55    4am-node28   <none>           <none>
fouramf-t24hx-30-7593-milvus-querynode-7f6d9d6bf5-vt4gq           1/1     Running                       0                  8d      10.104.4.161    4am-node11   <none>           <none>
fouramf-t24hx-30-7593-milvus-rootcoord-7b4cb8fd4c-kz976           1/1     Running                       0                  8d      10.104.19.51    4am-node28   <none>           <none>
fouramf-t24hx-30-7593-minio-0                                     1/1     Running                       0                  8d      10.104.17.132   4am-node23   <none>           <none>
fouramf-t24hx-30-7593-minio-1                                     1/1     Running                       0                  8d      10.104.4.165    4am-node11   <none>           <none>
fouramf-t24hx-30-7593-minio-2                                     1/1     Running                       0                  8d      10.104.5.186    4am-node12   <none>           <none>
fouramf-t24hx-30-7593-minio-3                                     1/1     Running                       0                  8d      10.104.21.223   4am-node24   <none> 

elstic avatar Oct 08 '23 06:10 elstic

I didn't really understand. Datanode memory is only a few gigabytes. It rises because the datanode accumulates data in memory, and decreases when a flush happens.

xiaofan-luan avatar Oct 08 '23 08:10 xiaofan-luan

3G of memory seems very reasonable. You can tune the datanode params to decrease memory usage.

xiaofan-luan avatar Oct 08 '23 08:10 xiaofan-luan

I didn't really understand. Datanode memory is only a few gigabytes. It rises because the datanode accumulates data in memory, and decreases when a flush happens.

Actually, the memory usage decreased not because of a flush, but because the datanode OOMed and restarted. image

yanliang567 avatar Oct 08 '23 08:10 yanliang567

I didn't really understand. Datanode memory is only a few gigabytes. It rises because the datanode accumulates data in memory, and decreases when a flush happens.

Actually, the memory usage decreased not because of a flush, but because the datanode OOMed and restarted. image

How much memory does datanode have?

xiaofan-luan avatar Oct 08 '23 08:10 xiaofan-luan

The datanode starts to flush at 50% of its allocated memory.

xiaofan-luan avatar Oct 08 '23 08:10 xiaofan-luan

I didn't really understand. Datanode memory is only a few gigabytes. It rises because the datanode accumulates data in memory, and decreases when a flush happens.

Actually, the memory usage decreased not because of a flush, but because the datanode OOMed and restarted. image

How much memory does datanode have?

dataNode.resources.limits.cpu=2.0,dataNode.resources.limits.memory=4Gi,dataNode.resources.requests.cpu=2.0,dataNode.resources.requests.memory=3Gi
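Given these limits and the "starts to flush at 50% of allocated memory" behavior mentioned earlier in the thread, a back-of-the-envelope check of where flushing should begin (the 0.5 factor is taken from that comment; whether Milvus uses the limit or the request as the base is an assumption here):

```python
limit_gib = 4.0      # dataNode.resources.limits.memory=4Gi
watermark = 0.5      # datanode reportedly starts flushing at ~50% of memory

flush_trigger_gib = limit_gib * watermark
print(flush_trigger_gib)  # prints 2.0 -> flushing expected around 2 GiB
```

So with a 4Gi limit, sustained growth well past ~2 GiB without a drop would suggest the flush is not keeping up with the insert rate.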

elstic avatar Oct 08 '23 09:10 elstic