matrixone [Bug]: create hnsw index cn pod killed

Is there an existing issue for the same bug?

[x] I have checked the existing issues.

Branch Name

main

Commit ID

f9c6677b6

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

Setup: 2 CN replicas (8G memory, 8 CPU cores), SIFT dataset with 128 dimensions and 1 million rows. SQL used to create the HNSW index: CREATE INDEX hnsw USING hnsw on ann.items_sift (embedding) M = 8 EF_CONSTRUCTION = 200 EF_SEARCH = 64 OP_TYPE 'vector_l2_ops';

CPU usage exceeded the 8-core limit while memory stayed below 8G: https://shanghai.idc.matrixorigin.cn:30001/d/cluster-detail-namespaced/cluster-detail-namespaced?orgId=1&from=1741672944203&to=1741673448313

profile： https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22K9V%22:%7B%22datasource%22:%22pyroscope%22,%22queries%22:%5B%7B%22groupBy%22:%5B%5D,%22labelSelector%22:%22%7Bnamespace%3D%5C%22mo-hnsw-test%5C%22,pod%3D%5C%22stability-regression-dis-tp-cn-h4h5k%5C%22%7D%22,%22queryType%22:%22both%22,%22refId%22:%22A%22,%22profileTypeId%22:%22process_cpu:cpu:nanoseconds:cpu:nanoseconds%22,%22datasource%22:%7B%22type%22:%22grafana-pyroscope-datasource%22,%22uid%22:%22pyroscope%22%7D%7D%5D,%22range%22:%7B%22from%22:%221741673094453%22,%22to%22:%221741673198866%22%7D%7D%7D&schemaVersion=1&orgId=1

The problem is intermittent, but it reproduces fairly often.

Expected Behavior

No response

Steps to Reproduce

1. Deploy 2 CN replicas (8G memory, 8 CPU cores).
2. Load the SIFT dataset (128 dimensions, 1 million rows).
3. Create the HNSW index.

Additional information

No response

Mar 11 '25 06:03 heni02

I tested the scenario above again, this time with the following configuration: set hnsw_threads_build=4 and the CN requests/limits values.

The CN pod was still killed.

https://shanghai.idc.matrixorigin.cn:30001/d/85a562078cdf77779eaa1add43ccec1e/kubernetes-compute-resources-namespace-pods?orgId=1&var-datasource=prometheus&var-cluster=&var-namespace=mo-hnsw-test&from=1741679496324&to=1741680732357

Mar 11 '25 08:03 heni02

The cause is probably that the pod reaches its memory limit.

Mar 11 '25 14:03 cpegeric

This should be fixed. From "top", I can see that the maximum memory use is 4G.

Mar 11 '25 14:03 cpegeric

Commit 6c625cc16, 8 CPU cores, 8G memory: the CN still hits OOM when creating the index. SQL: CREATE INDEX hnsw USING hnsw on ann.items_sift (embedding) M = 8 EF_CONSTRUCTION = 200 EF_SEARCH = 64 OP_TYPE 'vector_l2_ops' https://shanghai.idc.matrixorigin.cn:30001/d/cluster-detail-namespaced/cluster-detail-namespaced?orgId=1&from=1742369643658&to=1742370810740

profile: https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22mkB%22:%7B%22datasource%22:%22pyroscope%22,%22queries%22:%5B%7B%22groupBy%22:%5B%5D,%22labelSelector%22:%22%7Bnamespace%3D%5C%22mo-hnsw-test%5C%22,pod%3D%5C%22stability-regression-dis-tp-cn-vdf8c%5C%22%7D%22,%22queryType%22:%22both%22,%22refId%22:%22A%22,%22profileTypeId%22:%22memory:alloc_objects:count:space:bytes%22,%22datasource%22:%7B%22type%22:%22grafana-pyroscope-datasource%22,%22uid%22:%22pyroscope%22%7D%7D%5D,%22range%22:%7B%22from%22:%221742370158518%22,%22to%22:%221742370705000%22%7D%7D%7D&schemaVersion=1&orgId=1

Mar 19 '25 08:03 heni02

I found high memory usage when saving the index file to the database.

The generated SQL uses many load_file() calls, each reading one chunk of data by offset and size. I only put 10000 data chunks in each SQL statement. With more chunks, a 2G limit is reached. My guess is that INSERT caches the data before actually inserting it.

If I decrease the number of chunks to something smaller, such as 1000 chunks per SQL statement, no OOM happens. I can make a temporary fix with a smaller number of chunks.

With 10000 chunks per SQL statement, OOM occurs.

Please pass this to the query team to take a look. Should we really keep all the data in memory before saving it to disk?

Generated SQL:

INSERT INTO db.__secondary_index VALUES ('localhost:1:0:0', 0, load_file(cast('file:///var/folders/gm/g90yr85n3tz2vh0sdbtlps180000gn/T/hnsw1938057483?offset=0&size=65536' as datalink)), 0), ('localhost:1:0:0', 1, load_file(cast('file:///var/folders/gm/g90yr85n3tz2vh0sdbtlps180000gn/T/hnsw1938057483?offset=65536&size=65536' as datalink)), 0), ('localhost:1:0:0', 2, load_file(cast('file:///var/folders/gm/g90yr85n3tz2vh0sdbtlps180000gn/T/hnsw1938057483?offset=131072&size=65536' as datalink)), 0), ('localhost:1:0:0', 3, load_file(cast('file:///var/folders/gm/g90yr85n3tz2vh0sdbtlps180000gn/T/hnsw1938057483?offset=196608&size=65536' as datalink)), 0), ('localhost:1:0:0', 4, load_file(cast('file:///var/folders/gm/g90yr85n3tz2vh0sdbtlps180000gn/T/hnsw1938057483?offset=262144&size=65536' as datalink)), 0), ('localhost:1:0:0', 5, load_file(cast('file:///var/folders/gm/g90yr85n3tz2vh0sdbtlps180000gn/T/hnsw1938057483?offset=327680&size=65536' as datalink)), 0), ('localhost:1:0:0', 6, load_file(cast('file:///var/folders/gm/g90yr85n3tz2vh0sdbtlps180000gn/T/hnsw1938057483?offset=393216&size=65536' as datalink)), 0), ('localhost:1:0:0', 7, load_file(cast('file:///var/folders/gm/g90yr85n3tz2vh0sdbtlps180000gn/T/hnsw1938057483?offset=458752&size=65536' as datalink)), 0),....
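For anyone who wants to experiment with the batch size outside the server, here is a minimal Python sketch that produces statements of the same shape as the SQL above. It is not the MatrixOne implementation: the function name chunk_insert_statements and its parameters are hypothetical, the column layout is copied from the statement above without knowing what the trailing 0 means, and shortening the final chunk to the remaining file size is an assumption.

```python
from typing import Iterator

CHUNK_SIZE = 65536  # bytes per load_file() chunk, taken from the SQL above


def chunk_insert_statements(
    table: str,
    index_id: str,
    file_url: str,
    file_size: int,
    chunks_per_stmt: int,
    chunk_size: int = CHUNK_SIZE,
) -> Iterator[str]:
    """Yield INSERT statements covering at most chunks_per_stmt chunks each.

    Hypothetical helper for illustration only; it mirrors the shape of the
    generated SQL in this thread, not the actual generator.
    """
    n_chunks = (file_size + chunk_size - 1) // chunk_size  # ceiling division
    for start in range(0, n_chunks, chunks_per_stmt):
        rows = []
        for chunk_id in range(start, min(start + chunks_per_stmt, n_chunks)):
            offset = chunk_id * chunk_size
            # Assumption: the last chunk is shortened to the remaining bytes.
            size = min(chunk_size, file_size - offset)
            rows.append(
                f"('{index_id}', {chunk_id}, "
                f"load_file(cast('{file_url}?offset={offset}&size={size}' "
                f"as datalink)), 0)"
            )
        yield f"INSERT INTO {table} VALUES " + ", ".join(rows)
```

Calling it with chunks_per_stmt=1000 instead of 10000 yields ten times as many statements, each carrying a tenth of the chunks, which is the knob being tuned in the temporary fix.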

Mar 19 '25 12:03 cpegeric

heap profile inuse_space

alloc_space

Mar 19 '25 19:03 cpegeric

Please verify and close.

Apr 27 '25 08:04 fengttt

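The test resources are currently tied up long-term with stability and fault-injection tests, so verification is moved to the next version.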

May 30 '25 06:05 heni02

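Verified on main, commit 81d0a0f9a: the CN is still killed. The symptom is the same: the CN CPU limit is 8 cores, actual usage exceeds 8 cores, and memory usage is 3.25G.

Configuration: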

Aug 07 '25 06:08 heni02

3.0-dev release version 6bb57b13d: with both a 1-CN and a 2-CN configuration (8 cores/8G), create hnsw index hits OOM, consistent with the earlier symptom. https://shanghai.idc.matrixorigin.cn:30001/d/cluster-detail-namespaced/cluster-detail-namespaced?orgId=1&var-namespace=mo-hnsw-test&var-account=All&var-interval=$__auto_interval_interval&var-cluster=.%2A&var-loki=loki&from=1756692922345&to=1756700122345 https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%229qa%22:%7B%22datasource%22:%22pyroscope%22,%22queries%22:%5B%7B%22groupBy%22:%5B%5D,%22labelSelector%22:%22%7Bnamespace%3D%5C%22mo-hnsw-test%5C%22,pod%3D%5C%22stability-regression-dis-tp-cn-jwtjg%5C%22%7D%22,%22queryType%22:%22both%22,%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22grafana-pyroscope-datasource%22,%22uid%22:%22pyroscope%22%7D,%22profileTypeId%22:%22process_cpu:cpu:nanoseconds:cpu:nanoseconds%22%7D%5D,%22range%22:%7B%22from%22:%221756696345000%22,%22to%22:%221756696717000%22%7D%7D%7D&schemaVersion=1&orgId=1

Sep 01 '25 03:09 heni02

Queries also hit OOM. cc @iamlinjunhong: is it the same cause? https://github.com/matrixorigin/matrixone/issues/22465

Sep 03 '25 10:09 heni02

Queries also hit OOM. cc @iamlinjunhong: is it the same cause? #22465

That one is a query; it is not the same cause.

Sep 04 '25 07:09 iamlinjunhong

Before create index hnsw ran, the process was already GC-ing heavily, which means other SQL was running; mo was using more than 1G of memory: 11:12:15.348 gc 4441 @240653.114s 0%: 0.090+76+0.097 ms clock, 0.72+0.35/150/350+0.78 ms cpu, 1169->1219->459 MB, 1290 MB goal, 0 MB stacks, 0 MB globals, 8 P

While create index hnsw ran, heap memory reached 5G, mo was using more than 5G, and then it hit OOM: 11:15:38.324 gc 4451 @240856.060s 0%: 0.079+106+0.058 ms clock, 0.63+349/211/166+0.47 ms cpu, 5030->5030->2164 MB, 5030 MB goal, 0 MB stacks, 0 MB globals, 8 P
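To compare many of these Go gctrace lines without reading them by eye, a small parser can help. The sketch below is only an illustration: the names GCSample and parse_gctrace are made up here, and it relies on the documented gctrace layout in which the three numbers in "A->B->C MB" are the heap size at GC start, the heap size at GC end, and the live heap.

```python
import re
from typing import NamedTuple, Optional

# Matches the parts of a gctrace line used in this thread; the leading
# wall-clock timestamp added by the log collector is ignored.
GC_LINE = re.compile(
    r"gc (?P<num>\d+) @(?P<elapsed>[\d.]+)s .*?"
    r"(?P<start>\d+)->(?P<end>\d+)->(?P<live>\d+) MB, "
    r"(?P<goal>\d+) MB goal"
)


class GCSample(NamedTuple):
    num: int            # GC cycle number
    elapsed_s: float    # seconds since process start
    heap_start_mb: int  # heap size when the cycle started
    heap_end_mb: int    # heap size when the cycle ended
    heap_live_mb: int   # live heap after the cycle
    goal_mb: int        # heap goal for the cycle


def parse_gctrace(line: str) -> Optional[GCSample]:
    """Return the heap figures from one gctrace line, or None if no match."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    return GCSample(
        num=int(m["num"]),
        elapsed_s=float(m["elapsed"]),
        heap_start_mb=int(m["start"]),
        heap_end_mb=int(m["end"]),
        heap_live_mb=int(m["live"]),
        goal_mb=int(m["goal"]),
    )
```

Applied to the two lines above, it gives a heap of 1169 MB at the start of cycle 4441 and 5030 MB at the start of cycle 4451, with the live heap rising from 459 MB to 2164 MB.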

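The worst memory growth is in value_scan. create index hnsw generates and executes SELECT * FROM ann.t3 AS src CROSS APPLY hnsw_create(...);. hnsw_create computes the index, writes it to a local file, and then builds an INSERT ... VALUES statement. INSERT ... VALUES has to call value_scan to evaluate the expression (load_file(data_link)) and then copy the result, so the data is effectively doubled.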

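The solution I can think of for now: turn hnsw_create into a blocking operator and insert the index through INSERT ... SELECT, which avoids the extra value_scan evaluation and copy. It is a large change, though.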

eric suggested reducing the insert batch from 10000 rows to 2000 rows. Peak memory did drop, from 3.5G to 1741.65MB, and execution time did not change.
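As a rough sanity check on why the batch size matters, the raw payload of one insert batch can be estimated. This is a back-of-the-envelope sketch, not a measurement: it assumes every row carries one 65536-byte chunk (the size seen in the generated SQL earlier in this thread), and the copies parameter is a hypothetical way to model the evaluate-then-copy doubling described above. Measured peaks also include everything else the process holds, so these figures are only a floor.

```python
CHUNK_SIZE = 65536  # bytes per row, assumed from the generated SQL in this thread


def batch_payload_mib(
    rows_per_batch: int, copies: int = 1, chunk_size: int = CHUNK_SIZE
) -> float:
    """Raw chunk bytes held for one insert batch, in MiB.

    copies=2 models holding both the evaluated value and its copy;
    this is an assumption for illustration, not a profiled figure.
    """
    return rows_per_batch * chunk_size * copies / (1024 * 1024)


for rows in (10000, 2000, 1000):
    print(rows, batch_payload_mib(rows), batch_payload_mib(rows, copies=2))
```

Under these assumptions, a 10000-row batch holds 625 MiB of chunk data (1250 MiB if held twice), while a 2000-row batch holds 125 MiB (250 MiB if held twice).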

Sep 04 '25 07:09 iamlinjunhong