[Bug]: [benchmark][cluster] Milvus datanode memory grows suddenly during insert
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version:master-20220617-074ec306
- Deployment mode(standalone or cluster):cluster
- SDK version(e.g. pymilvus v2.0.0rc2):2.1.0dev78
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
argo
test-etcd-no-clean-zmhf6
server-configmap
server-cluster-8c64m
client-configmap
client-random-locust-100m-ddl-r8-w2
server:
test-etcd-no-clean-zmhf6-1-0 1/1 Running 0 4m12s 10.97.17.140 qa-node014.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-1 1/1 Running 0 4m12s 10.97.16.218 qa-node013.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-2 1/1 Running 0 4m12s 10.97.17.145 qa-node014.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-milvus-datacoord-fd959869f-fsdp7 1/1 Running 0 4m12s 10.97.5.217 qa-node003.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-milvus-datanode-f58fd88c4-r82w5 1/1 Running 0 4m12s 10.97.20.196 qa-node018.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-milvus-indexcoord-585468c64-gb4wh 1/1 Running 0 4m12s 10.97.5.216 qa-node003.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-milvus-indexnode-84965bc4bf-kssm4 1/1 Running 0 4m12s 10.97.11.27 qa-node009.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-milvus-proxy-574875d9fd-h9lxp 1/1 Running 0 4m12s 10.97.5.215 qa-node003.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-milvus-querycoord-858d8c9c95-wfpdg 1/1 Running 0 4m12s 10.97.4.163 qa-node002.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-milvus-querynode-7fb886d44c-pwm4p 1/1 Running 0 4m12s 10.97.17.138 qa-node014.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-milvus-rootcoord-7cbb97f4f5-8sfnl 1/1 Running 0 4m12s 10.97.4.162 qa-node002.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-minio-0 1/1 Running 0 4m12s 10.97.19.174 qa-node016.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-minio-1 1/1 Running 0 4m12s 10.97.19.175 qa-node016.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-minio-2 1/1 Running 0 4m11s 10.97.19.178 qa-node016.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-minio-3 1/1 Running 0 4m11s 10.97.19.180 qa-node016.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-pulsar-bookie-0 1/1 Running 0 4m12s 10.97.19.176 qa-node016.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-pulsar-bookie-1 1/1 Running 0 4m12s 10.97.17.144 qa-node014.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-pulsar-bookie-2 1/1 Running 0 4m11s 10.97.16.221 qa-node013.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-pulsar-bookie-init-hnmk2 0/1 Completed 0 4m12s 10.97.17.137 qa-node014.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-pulsar-broker-0 1/1 Running 0 4m12s 10.97.10.94 qa-node008.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-pulsar-proxy-0 1/1 Running 0 4m12s 10.97.19.169 qa-node016.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-pulsar-pulsar-init-5mbnm 0/1 Completed 0 4m12s 10.97.11.26 qa-node009.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-pulsar-recovery-0 1/1 Running 0 4m12s 10.97.20.195 qa-node018.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-pulsar-zookeeper-0 1/1 Running 0 4m12s 10.97.10.96 qa-node008.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-pulsar-zookeeper-1 1/1 Running 0 3m32s 10.97.19.182 qa-node016.zilliz.local <none> <none>
test-etcd-no-clean-zmhf6-1-pulsar-zookeeper-2 1/1 Running 0 3m 10.97.11.29 qa-node009.zilliz.local <none> <none>
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
No response
Anything else?
No response
server-instance test-etcd-no-clean-p9vrm-1 server-configmap server-cluster-8c64m client-configmap client-random-locust-100m-ddl-r8-w2-1h
master-20220620-f123d657 2.1.0dev78
test-etcd-no-clean-p9vrm-1-0 1/1 Running 6 72m 10.97.17.160 qa-node014.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-1 1/1 Running 4 72m 10.97.16.209 qa-node013.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-2 1/1 Running 0 72m 10.97.17.161 qa-node014.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-milvus-datacoord-7dbdc67476-29pdk 1/1 Running 12 72m 10.97.10.161 qa-node008.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-milvus-datanode-88b998d5c-2mbrd 1/1 Running 13 72m 10.97.17.157 qa-node014.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-milvus-indexcoord-7d6985579f-9q2vk 1/1 Running 13 72m 10.97.12.57 qa-node015.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-milvus-indexnode-9496f574b-njp6t 1/1 Running 10 72m 10.97.20.220 qa-node018.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-milvus-proxy-589494fd79-b94wt 1/1 Running 13 72m 10.97.10.160 qa-node008.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-milvus-querycoord-58b755795b-mdjwc 1/1 Running 12 72m 10.97.11.220 qa-node009.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-milvus-querynode-594b594d44-6gvd9 1/1 Running 10 72m 10.97.11.221 qa-node009.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-milvus-rootcoord-69c77fcf9b-pckmk 1/1 Running 12 72m 10.97.12.58 qa-node015.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-minio-0 1/1 Running 0 72m 10.97.19.73 qa-node016.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-minio-1 1/1 Running 0 72m 10.97.19.78 qa-node016.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-minio-2 1/1 Running 0 72m 10.97.19.75 qa-node016.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-minio-3 1/1 Running 0 72m 10.97.19.97 qa-node016.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-pulsar-bookie-0 1/1 Running 0 72m 10.97.10.176 qa-node008.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-pulsar-bookie-1 1/1 Running 0 72m 10.97.19.79 qa-node016.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-pulsar-bookie-2 1/1 Running 0 72m 10.97.10.181 qa-node008.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-pulsar-bookie-init-64897 0/1 Completed 0 72m 10.97.12.55 qa-node015.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-pulsar-broker-0 1/1 Running 0 72m 10.97.12.56 qa-node015.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-pulsar-proxy-0 1/1 Running 0 72m 10.97.19.64 qa-node016.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-pulsar-pulsar-init-nffqv 0/1 Completed 0 72m 10.97.10.162 qa-node008.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-pulsar-recovery-0 1/1 Running 0 72m 10.97.12.54 qa-node015.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-pulsar-zookeeper-0 1/1 Running 0 72m 10.97.10.175 qa-node008.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-pulsar-zookeeper-1 1/1 Running 0 58m 10.97.10.183 qa-node008.zilliz.local <none> <none>
test-etcd-no-clean-p9vrm-1-pulsar-zookeeper-2 1/1 Running 0 16m 10.97.5.25 qa-node003.zilliz.local <none> <none>

I have found that each memory peak coincides exactly with a compaction:
Compaction log:
Jun 20, 2022 @ 19:37:26.900 [2022/06/20 11:37:26.900 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434038386687475714] ["timeout in seconds"=180]
Jun 20, 2022 @ 19:36:29.081 [2022/06/20 11:36:29.081 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434038371535290370] ["timeout in seconds"=180]
Jun 20, 2022 @ 19:29:49.278 [2022/06/20 11:29:49.277 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434038266717011970] ["timeout in seconds"=180]
Jun 20, 2022 @ 19:29:45.824 [2022/06/20 11:29:45.824 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434038265812615170] ["timeout in seconds"=180]
Jun 20, 2022 @ 19:23:04.028 [2022/06/20 11:23:04.028 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434038160496263169] ["timeout in seconds"=180]
Jun 20, 2022 @ 19:23:00.650 [2022/06/20 11:23:00.650 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434038159605235714] ["timeout in seconds"=180]
Jun 20, 2022 @ 19:16:26.379 [2022/06/20 11:16:26.379 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434038056254701570] ["timeout in seconds"=180]
Jun 20, 2022 @ 19:16:23.030 [2022/06/20 11:16:23.030 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434038055376519170] ["timeout in seconds"=180]
Jun 20, 2022 @ 19:09:39.579 [2022/06/20 11:09:39.579 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434037949614522369] ["timeout in seconds"=180]
Jun 20, 2022 @ 19:09:28.634 [2022/06/20 11:09:28.634 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434037946744307714] ["timeout in seconds"=180]
Jun 20, 2022 @ 19:02:30.435 [2022/06/20 11:02:30.435 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434037837115686914] ["timeout in seconds"=180]
Jun 20, 2022 @ 19:02:23.626 [2022/06/20 11:02:23.626 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434037835319738370] ["timeout in seconds"=180]
Jun 20, 2022 @ 18:55:48.090 [2022/06/20 10:55:48.090 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434037731641786372] ["timeout in seconds"=180]
Jun 20, 2022 @ 18:55:48.089 [2022/06/20 10:55:48.089 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434037731641786370] ["timeout in seconds"=180]
Jun 20, 2022 @ 18:49:04.882 [2022/06/20 10:49:04.882 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434037625945587715] ["timeout in seconds"=180]
Jun 20, 2022 @ 18:49:04.881 [2022/06/20 10:49:04.881 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434037625945587713] ["timeout in seconds"=180]
Jun 20, 2022 @ 18:41:48.291 [2022/06/20 10:41:48.291 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434037511493255170] ["timeout in seconds"=180]
Jun 20, 2022 @ 18:41:48.291 [2022/06/20 10:41:48.291 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434037511493255172] ["timeout in seconds"=180]
Jun 20, 2022 @ 18:33:53.690 [2022/06/20 10:33:53.690 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434037387079974916] ["timeout in seconds"=180]
Jun 20, 2022 @ 18:33:53.689 [2022/06/20 10:33:53.689 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434037387079974914] ["timeout in seconds"=180]
Jun 20, 2022 @ 18:26:01.345 [2022/06/20 10:26:01.344 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434037263243149314] ["timeout in seconds"=180]
Jun 20, 2022 @ 18:26:00.226 [2022/06/20 10:26:00.226 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434037262954528774] ["timeout in seconds"=180]
Jun 20, 2022 @ 18:18:11.283 [2022/06/20 10:18:11.283 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434037140035207170] ["timeout in seconds"=180]
Jun 20, 2022 @ 18:18:11.283 [2022/06/20 10:18:11.283 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434037140035207172] ["timeout in seconds"=180]
Jun 20, 2022 @ 18:11:26.900 [2022/06/20 10:11:26.900 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434037034024435719] ["timeout in seconds"=180]
Jun 20, 2022 @ 18:10:30.628 [2022/06/20 10:10:30.628 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434037019278573569] ["timeout in seconds"=180]
Jun 20, 2022 @ 18:02:43.087 [2022/06/20 10:02:43.087 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434036896713408517] ["timeout in seconds"=180]
Jun 20, 2022 @ 18:02:43.080 [2022/06/20 10:02:43.080 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434036896713408514] ["timeout in seconds"=180]
Jun 20, 2022 @ 17:54:42.080 [2022/06/20 09:54:42.080 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434036770621882371] ["timeout in seconds"=180]
Jun 20, 2022 @ 17:54:42.078 [2022/06/20 09:54:42.078 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434036770608775170] ["timeout in seconds"=180]
Jun 20, 2022 @ 17:46:56.773 [2022/06/20 09:46:56.773 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434036648633171970] ["timeout in seconds"=180]
Jun 20, 2022 @ 17:46:55.431 [2022/06/20 09:46:55.431 +00:00] [DEBUG] [compactor.go:350] ["compaction start"] [planID=434036648292646914] ["timeout in seconds"=180]
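To correlate these peaks with the memory graph, the compaction start times can be pulled out of the datanode log programmatically. A minimal sketch, assuming log lines in the exact format shown above (bracketed UTC timestamp plus a `planID` field):

```python
import re
from datetime import datetime

# Matches the bracketed UTC timestamp and planID in datanode log lines,
# e.g. [2022/06/20 11:37:26.900 +00:00] ... [planID=434038386687475714]
LINE_RE = re.compile(
    r"\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \+00:00\].*?\[planID=(\d+)\]"
)

def compaction_starts(lines):
    """Yield (utc_datetime, plan_id) for each 'compaction start' line."""
    for line in lines:
        if "compaction start" not in line:
            continue
        m = LINE_RE.search(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S.%f")
            yield ts, int(m.group(2))
```

Feeding the datanode log through this and overlaying the timestamps on the memory chart makes the peak/compaction correlation easy to check visually.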
434038265812615170
Good catch. We probably want to log what's inside each compaction plan. Meanwhile we will quickly go through the compaction path to see whether anything can be improved to avoid memory copies.
We might have an issue calculating which segments are the right ones to compact.
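The segment-selection concern can be illustrated with a simplified, hypothetical heuristic (this is not Milvus's actual policy): only merge segments whose combined size fits a memory budget, so a single compaction never has to hold more than the budget in RAM.

```python
def pick_segments_for_compaction(segment_sizes, budget_bytes):
    """Greedily pick small segments to merge while the running total
    stays within budget_bytes. Returns indices of chosen segments.
    Hypothetical heuristic for illustration only."""
    order = sorted(range(len(segment_sizes)), key=lambda i: segment_sizes[i])
    chosen, total = [], 0
    for i in order:
        if total + segment_sizes[i] > budget_bytes:
            break
        chosen.append(i)
        total += segment_sizes[i]
    # Merging fewer than two segments is pointless.
    return chosen if len(chosen) >= 2 else []
```

If the real selection logic lacks such a cap (or miscalculates sizes), a plan can pull in more data than the node can comfortably buffer, which would match the observed peaks.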
/assign @xiaofan-luan
/assign @jingkl
Please help with verification. All indexnode/datanode OOMs should be fixed by #17689.
argo test-etcd-no-clean-qrlvn-1 server-configmap server-cluster-8c64m client-configmap client-random-locust-100m-ddl-r8-w2-12h master-20220622-6fdf88f4 pymilvus 2.1.0dev78
benchmark-tag-no-clean-56dcc-1-etcd-0 1/1 Running 0 5m59s 10.97.17.236 qa-node014.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-etcd-1 1/1 Running 0 5m59s 10.97.16.92 qa-node013.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-etcd-2 1/1 Running 0 5m59s 10.97.16.102 qa-node013.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-milvus-datacoord-56cfc46df96cxg9 1/1 Running 0 5m59s 10.97.3.192 qa-node001.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-milvus-datanode-57d79c494b-rpklt 1/1 Running 1 5m59s 10.97.17.228 qa-node014.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-milvus-indexcoord-65c855f49l7v6c 1/1 Running 0 5m59s 10.97.17.219 qa-node014.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-milvus-indexnode-685c969fbfmmcgw 1/1 Running 0 5m59s 10.97.17.218 qa-node014.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-milvus-proxy-6c99589c4b-jhjrd 1/1 Running 1 5m59s 10.97.17.233 qa-node014.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-milvus-querycoord-67df8f96bjhbxq 1/1 Running 1 5m59s 10.97.17.226 qa-node014.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-milvus-querynode-6f86c549674xt4b 1/1 Running 0 5m59s 10.97.17.230 qa-node014.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-milvus-rootcoord-6c5446867-dh562 1/1 Running 1 5m59s 10.97.17.221 qa-node014.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-minio-0 1/1 Running 0 5m59s 10.97.19.204 qa-node016.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-minio-1 1/1 Running 0 5m59s 10.97.19.201 qa-node016.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-minio-2 1/1 Running 0 5m59s 10.97.19.206 qa-node016.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-minio-3 1/1 Running 0 5m58s 10.97.19.210 qa-node016.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-pulsar-bookie-0 1/1 Running 0 5m59s 10.97.5.195 qa-node003.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-pulsar-bookie-1 1/1 Running 0 5m58s 10.97.16.104 qa-node013.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-pulsar-bookie-2 1/1 Running 0 5m58s 10.97.20.112 qa-node018.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-pulsar-bookie-init-bfkjp 0/1 Completed 0 5m59s 10.97.17.232 qa-node014.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-pulsar-broker-0 1/1 Running 0 5m59s 10.97.17.234 qa-node014.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-pulsar-proxy-0 1/1 Running 0 5m59s 10.97.17.231 qa-node014.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-pulsar-pulsar-init-2kt2c 0/1 Completed 0 5m59s 10.97.17.229 qa-node014.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-pulsar-recovery-0 1/1 Running 0 5m59s 10.97.17.227 qa-node014.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-pulsar-zookeeper-0 1/1 Running 0 5m59s 10.97.3.194 qa-node001.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-pulsar-zookeeper-1 1/1 Running 0 5m20s 10.97.12.47 qa-node015.zilliz.local <none> <none>
benchmark-tag-no-clean-56dcc-1-pulsar-zookeeper-2 1/1 Running 0 4m51s 10.97.9.158 qa-node007.zilliz.local <none> <none>
As the graph shows, datanode memory usage no longer OOMs; this is 12 hours of datanode memory usage for 100 million entities.
argo test-etcd-no-clean-n5wpb-1 server-configmap server-cluster-8c64m-kafka client-configmap client-random-locust-100m-ddl-r8-w2
master-20220622-b4f21259 pymilvus 2.1.0dev78
test-etcd-no-clean-n5wpb-1-0 1/1 Running 0 3m31s 10.97.17.254 qa-node014.zilliz.local <none> <none>
test-etcd-no-clean-n5wpb-1-1 1/1 Running 0 3m31s 10.97.16.143 qa-node013.zilliz.local <none> <none>
test-etcd-no-clean-n5wpb-1-2 1/1 Running 0 3m31s 10.97.17.2 qa-node014.zilliz.local <none> <none>
test-etcd-no-clean-n5wpb-1-kafka-0 1/1 Running 2 3m31s 10.97.19.211 qa-node016.zilliz.local <none> <none>
test-etcd-no-clean-n5wpb-1-kafka-1 1/1 Running 2 3m31s 10.97.18.247 qa-node017.zilliz.local <none> <none>
test-etcd-no-clean-n5wpb-1-kafka-2 1/1 Running 1 3m31s 10.97.4.195 qa-node002.zilliz.local <none> <none>
test-etcd-no-clean-n5wpb-1-milvus-datacoord-5b54ccbbfd-9cghr 1/1 Running 0 3m31s 10.97.11.61 qa-node009.zilliz.local <none> <none>
test-etcd-no-clean-n5wpb-1-milvus-datanode-6cc89595b9-xsprv 1/1 Running 0 3m31s 10.97.16.141 qa-node013.zilliz.local <none> <none>
test-etcd-no-clean-n5wpb-1-milvus-indexcoord-7954b98675-g76c7 1/1 Running 0 3m32s 10.97.3.134 qa-node001.zilliz.local <none> <none>
test-etcd-no-clean-n5wpb-1-milvus-indexnode-746f6b9bf6-bdstb 1/1 Running 0 3m31s 10.97.17.251 qa-node014.zilliz.local <none> <none>
test-etcd-no-clean-n5wpb-1-milvus-proxy-58577857d9-2qhns 1/1 Running 0 3m32s 10.97.18.245 qa-node017.zilliz.local <none> <none>
test-etcd-no-clean-n5wpb-1-milvus-querycoord-5dcd77bc68-mlgqz 1/1 Running 0 3m32s 10.97.12.85 qa-node015.zilliz.local <none> <none>
test-etcd-no-clean-n5wpb-1-milvus-querynode-675fcf996b-r92j5 1/1 Running 0 3m32s 10.97.20.216 qa-node018.zilliz.local <none> <none>
test-etcd-no-clean-n5wpb-1-milvus-rootcoord-59fd64679d-77p94 1/1 Running 0 3m32s 10.97.18.246 qa-node017.zilliz.local <none> <none>
test-etcd-no-clean-n5wpb-1-minio-0 1/1 Running 0 3m31s 10.97.12.89 qa-node015.zilliz.local <none> <none>
test-etcd-no-clean-n5wpb-1-minio-1 1/1 Running 0 3m31s 10.97.19.227 qa-node016.zilliz.local <none> <none>
test-etcd-no-clean-n5wpb-1-minio-2 1/1 Running 0 3m31s 10.97.19.225 qa-node016.zilliz.local <none> <none>
test-etcd-no-clean-n5wpb-1-minio-3 1/1 Running 0 3m31s 10.97.19.229 qa-node016.zilliz.local <none> <none>
test-etcd-no-clean-n5wpb-1-zookeeper-0 1/1 Running 0 3m31s 10.97.3.135 qa-node001.zilliz.local <none> <none>
test-etcd-no-clean-n5wpb-1-zookeeper-1 1/1 Running 0 3m31s 10.97.12.88 qa-node015.zilliz.local <none> <none>
test-etcd-no-clean-n5wpb-1-zookeeper-2 1/1 Running 0 3m31s 10.97.19.213 qa-node016.zilliz.local <none> <none>
However, in the following scenario the datanode's memory grows gradually, up to about 2.08 GB.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen
.
2.1.0-20220726-1b33c731 pymilvus 2.1.0dev103
server-instance fouram-tag-no-clean-dn9tn-1 server-configmap server-cluster-8c64m-kafka client-configmap client-random-locust-100m-ddl-r8-w2-100h
fouram-tag-no-clean-dn9tn-1-etcd-0 1/1 Running 0 2m43s 10.104.5.15 4am-node12 <none> <none>
fouram-tag-no-clean-dn9tn-1-etcd-1 1/1 Running 0 2m43s 10.104.4.66 4am-node11 <none> <none>
fouram-tag-no-clean-dn9tn-1-etcd-2 1/1 Running 0 2m43s 10.104.9.62 4am-node14 <none> <none>
fouram-tag-no-clean-dn9tn-1-kafka-0 1/1 Running 1 (2m30s ago) 2m43s 10.104.9.59 4am-node14 <none> <none>
fouram-tag-no-clean-dn9tn-1-kafka-1 1/1 Running 1 (2m30s ago) 2m43s 10.104.4.64 4am-node11 <none> <none>
fouram-tag-no-clean-dn9tn-1-kafka-2 1/1 Running 1 (2m31s ago) 2m43s 10.104.6.67 4am-node13 <none> <none>
fouram-tag-no-clean-dn9tn-1-milvus-datacoord-6d864d76d5-bpmq9 1/1 Running 0 2m43s 10.104.1.4 4am-node10 <none> <none>
fouram-tag-no-clean-dn9tn-1-milvus-datanode-5864ddd55b-hhq9l 1/1 Running 0 2m43s 10.104.6.66 4am-node13 <none> <none>
fouram-tag-no-clean-dn9tn-1-milvus-indexcoord-66497695c6-jlsn4 1/1 Running 0 2m43s 10.104.9.56 4am-node14 <none> <none>
fouram-tag-no-clean-dn9tn-1-milvus-indexnode-7c8bc6f69-zh9nx 1/1 Running 0 2m43s 10.104.1.5 4am-node10 <none> <none>
fouram-tag-no-clean-dn9tn-1-milvus-proxy-6ff77f88d6-zt84x 1/1 Running 0 2m43s 10.104.1.2 4am-node10 <none> <none>
fouram-tag-no-clean-dn9tn-1-milvus-querycoord-999d776b7-mj9b2 1/1 Running 0 2m43s 10.104.1.3 4am-node10 <none> <none>
fouram-tag-no-clean-dn9tn-1-milvus-querynode-84589847bc-n6s9x 1/1 Running 0 2m43s 10.104.5.11 4am-node12 <none> <none>
fouram-tag-no-clean-dn9tn-1-milvus-rootcoord-8f4ccc977-qcvtq 1/1 Running 0 2m43s 10.104.9.55 4am-node14 <none> <none>
fouram-tag-no-clean-dn9tn-1-minio-0 1/1 Running 0 2m43s 10.104.5.14 4am-node12 <none> <none>
fouram-tag-no-clean-dn9tn-1-minio-1 1/1 Running 0 2m43s 10.104.9.61 4am-node14 <none> <none>
fouram-tag-no-clean-dn9tn-1-minio-2 1/1 Running 0 2m43s 10.104.6.70 4am-node13 <none> <none>
fouram-tag-no-clean-dn9tn-1-minio-3 1/1 Running 0 2m43s 10.104.4.70 4am-node11 <none> <none>
fouram-tag-no-clean-dn9tn-1-zookeeper-0 1/1 Running 0 2m43s 10.104.9.58 4am-node14 <none> <none>
fouram-tag-no-clean-dn9tn-1-zookeeper-1 1/1 Running 0 2m43s 10.104.4.63 4am-node11 <none> <none>
fouram-tag-no-clean-dn9tn-1-zookeeper-2 1/1 Running 0 2m43s 10.104.6.68 4am-node13 <none> <none>
datanode memory:
The datanode's memory still keeps growing
This issue will be kept open
/unassign /assign @wayblink
@wayblink any progress on this one?
Still working on it. Current status: this case only occurs in a Kafka cluster, and the Go memory usage is actually not that large, so it is probably related to Kafka CGO allocations rather than compaction. We are reproducing the test with heaptrack and will analyze it later.
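One way to support the CGO hypothesis is to compare the Go runtime's reported heap against the kernel's view of the process: if `runtime.MemStats.HeapAlloc` stays far below RSS, the gap points at non-Go allocations (e.g. librdkafka). A minimal sketch of the RSS side, parsing `/proc/<pid>/status` (the pid is an assumption):

```python
def rss_kib(status_text):
    """Extract VmRSS (in KiB) from the contents of /proc/<pid>/status.
    If the Go runtime's HeapAlloc is far below this figure, the
    difference points at non-Go (e.g. CGO / librdkafka) allocations."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like: "VmRSS:  2182148 kB"
            return int(line.split()[1])
    return None

# Usage on a live datanode process (pid 1234 is a placeholder):
# with open("/proc/1234/status") as f:
#     print(rss_kib(f.read()))
```

heaptrack then attributes the non-Go portion to concrete malloc call sites, which is what the reproduction above is after.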