[Bug]: create hnsw index cn pod killed
Is there an existing issue for the same bug?
- [x] I have checked the existing issues.
Branch Name
main
Commit ID
f9c6677b6
Other Environment Information
- Hardware parameters:
- OS type:
- Others:
Actual Behavior
2 CN replicas, mem 8G / cpu 8 cores, SIFT 128-dim, 1 million rows. Create HNSW index SQL: CREATE INDEX hnsw USING hnsw on ann.items_sift (embedding) M = 8 EF_CONSTRUCTION = 200 EF_SEARCH = 64 OP_TYPE 'vector_l2_ops';
CPU usage exceeded the 8-core limit while memory stayed below 8G:
https://shanghai.idc.matrixorigin.cn:30001/d/cluster-detail-namespaced/cluster-detail-namespaced?orgId=1&from=1741672944203&to=1741673448313
profile: https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22K9V%22:%7B%22datasource%22:%22pyroscope%22,%22queries%22:%5B%7B%22groupBy%22:%5B%5D,%22labelSelector%22:%22%7Bnamespace%3D%5C%22mo-hnsw-test%5C%22,pod%3D%5C%22stability-regression-dis-tp-cn-h4h5k%5C%22%7D%22,%22queryType%22:%22both%22,%22refId%22:%22A%22,%22profileTypeId%22:%22process_cpu:cpu:nanoseconds:cpu:nanoseconds%22,%22datasource%22:%7B%22type%22:%22grafana-pyroscope-datasource%22,%22uid%22:%22pyroscope%22%7D%7D%5D,%22range%22:%7B%22from%22:%221741673094453%22,%22to%22:%221741673198866%22%7D%7D%7D&schemaVersion=1&orgId=1
The problem is intermittent, but it reproduces with fairly high probability.
Expected Behavior
No response
Steps to Reproduce
1. 2 CN replicas, mem 8G / cpu 8 cores
2. Load SIFT 128-dim, 1 million rows
3. Create the HNSW index
Additional information
No response
Tested the above scenario again, this time with the following configuration:
set hnsw_threads_build=4 and the CN requests/limits values
The CN pod was still killed:
https://shanghai.idc.matrixorigin.cn:30001/d/85a562078cdf77779eaa1add43ccec1e/kubernetes-compute-resources-namespace-pods?orgId=1&var-datasource=prometheus&var-cluster=&var-namespace=mo-hnsw-test&from=1741679496324&to=1741680732357
The cause is probably the pod reaching its memory limit.
This should be fixed now. The max memory use I can see in "top" is 4G.
commit:6c625cc16
cpu 8 cores, mem 8G: the CN OOMs on create index
sql: CREATE INDEX hnsw USING hnsw on ann.items_sift (embedding) M = 8 EF_CONSTRUCTION = 200 EF_SEARCH = 64 OP_TYPE 'vector_l2_ops'
https://shanghai.idc.matrixorigin.cn:30001/d/cluster-detail-namespaced/cluster-detail-namespaced?orgId=1&from=1742369643658&to=1742370810740
profile: https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22mkB%22:%7B%22datasource%22:%22pyroscope%22,%22queries%22:%5B%7B%22groupBy%22:%5B%5D,%22labelSelector%22:%22%7Bnamespace%3D%5C%22mo-hnsw-test%5C%22,pod%3D%5C%22stability-regression-dis-tp-cn-vdf8c%5C%22%7D%22,%22queryType%22:%22both%22,%22refId%22:%22A%22,%22profileTypeId%22:%22memory:alloc_objects:count:space:bytes%22,%22datasource%22:%7B%22type%22:%22grafana-pyroscope-datasource%22,%22uid%22:%22pyroscope%22%7D%7D%5D,%22range%22:%7B%22from%22:%221742370158518%22,%22to%22:%221742370705000%22%7D%7D%7D&schemaVersion=1&orgId=1
I found high memory usage when saving the index file to the database.
The generated SQL uses many load_file() calls to read chunks of data by offset and size. I only put 10000 data chunks in each SQL statement. With more chunks, the 2G limit is reached. I guess INSERT caches the data before the actual insert.
If I decrease the number of chunks to something smaller, like 1000 chunks per SQL statement, no OOM happens. As a temporary fix I can use a smaller number of chunks.
With 10000 chunks per SQL statement, it OOMs.
Please pass this to the query team to take a look. We should not keep all the data in memory before saving it to disk, should we?
Generated SQL:
INSERT INTO db.__secondary_index VALUES ('localhost:1:0:0', 0, load_file(cast('file:///var/folders/gm/g90yr85n3tz2vh0sdbtlps180000gn/T/hnsw1938057483?offset=0&size=65536' as datalink)), 0), ('localhost:1:0:0', 1, load_file(cast('file:///var/folders/gm/g90yr85n3tz2vh0sdbtlps180000gn/T/hnsw1938057483?offset=65536&size=65536' as datalink)), 0), ('localhost:1:0:0', 2, load_file(cast('file:///var/folders/gm/g90yr85n3tz2vh0sdbtlps180000gn/T/hnsw1938057483?offset=131072&size=65536' as datalink)), 0), ('localhost:1:0:0', 3, load_file(cast('file:///var/folders/gm/g90yr85n3tz2vh0sdbtlps180000gn/T/hnsw1938057483?offset=196608&size=65536' as datalink)), 0), ('localhost:1:0:0', 4, load_file(cast('file:///var/folders/gm/g90yr85n3tz2vh0sdbtlps180000gn/T/hnsw1938057483?offset=262144&size=65536' as datalink)), 0), ('localhost:1:0:0', 5, load_file(cast('file:///var/folders/gm/g90yr85n3tz2vh0sdbtlps180000gn/T/hnsw1938057483?offset=327680&size=65536' as datalink)), 0), ('localhost:1:0:0', 6, load_file(cast('file:///var/folders/gm/g90yr85n3tz2vh0sdbtlps180000gn/T/hnsw1938057483?offset=393216&size=65536' as datalink)), 0), ('localhost:1:0:0', 7, load_file(cast('file:///var/folders/gm/g90yr85n3tz2vh0sdbtlps180000gn/T/hnsw1938057483?offset=458752&size=65536' as datalink)), 0),....
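To make the batching concrete, here is a rough Python sketch of how statements like the one above could be generated with a configurable number of chunks per statement. It is not the MatrixOne code (which is Go); the function name `build_insert_statements` and the shortened last chunk are assumptions for illustration. Only the 65536-byte chunk size and the row shape come from the SQL above.

```python
CHUNK_SIZE = 65536  # bytes per chunk, taken from the generated SQL above


def build_insert_statements(table, index_id, path, file_size, chunks_per_sql):
    """Yield INSERT statements, each covering at most chunks_per_sql chunks.

    Hypothetical helper: the real generator lives in MatrixOne's Go code.
    """
    n_chunks = (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE  # ceiling division
    for start in range(0, n_chunks, chunks_per_sql):
        rows = []
        for chunk_id in range(start, min(start + chunks_per_sql, n_chunks)):
            offset = chunk_id * CHUNK_SIZE
            # Assumption: the final chunk is shortened to the bytes that remain.
            size = min(CHUNK_SIZE, file_size - offset)
            url = f"file://{path}?offset={offset}&size={size}"
            rows.append(
                f"('{index_id}', {chunk_id}, "
                f"load_file(cast('{url}' as datalink)), 0)"
            )
        yield f"INSERT INTO {table} VALUES " + ", ".join(rows)
```

Lowering `chunks_per_sql` (for example from 10000 to 1000) produces more statements, each of which references less file data at once.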
heap profile
inuse_space
alloc_space
Please verify and close.
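The resources are currently tied up long-term with stability and fault-injection tests, so verification is moved to the next version.
Verified on main commit:81d0a0f9a: the CN is still killed. The symptoms are the same: the CN CPU limit is 8 cores but actual usage exceeds 8 cores, and memory usage is 3.25G.
Configuration:
3.0-dev release version: 6bb57b13d
With both 1 CN and 2 CNs (8 cores/8G), create hnsw index OOMs, consistent with the earlier behavior.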
https://shanghai.idc.matrixorigin.cn:30001/d/cluster-detail-namespaced/cluster-detail-namespaced?orgId=1&var-namespace=mo-hnsw-test&var-account=All&var-interval=$__auto_interval_interval&var-cluster=.%2A&var-loki=loki&from=1756692922345&to=1756700122345
https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%229qa%22:%7B%22datasource%22:%22pyroscope%22,%22queries%22:%5B%7B%22groupBy%22:%5B%5D,%22labelSelector%22:%22%7Bnamespace%3D%5C%22mo-hnsw-test%5C%22,pod%3D%5C%22stability-regression-dis-tp-cn-jwtjg%5C%22%7D%22,%22queryType%22:%22both%22,%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22grafana-pyroscope-datasource%22,%22uid%22:%22pyroscope%22%7D,%22profileTypeId%22:%22process_cpu:cpu:nanoseconds:cpu:nanoseconds%22%7D%5D,%22range%22:%7B%22from%22:%221756696345000%22,%22to%22:%221756696717000%22%7D%7D%7D&schemaVersion=1&orgId=1
Queries also OOM. cc @iamlinjunhong: is this the same cause? https://github.com/matrixorigin/matrixone/issues/22465
Before create index hnsw ran, GC was already running heavily, which means other SQL was running and mo was using more than 1G of memory: 11:12:15.348 gc 4441 @240653.114s 0%: 0.090+76+0.097 ms clock, 0.72+0.35/150/350+0.78 ms cpu, 1169->1219->459 MB, 1290 MB goal, 0 MB stacks, 0 MB globals, 8 P
While create index hnsw ran, the heap reached 5G, mo was using more than 5G of memory, and then it OOMed: 11:15:38.324 gc 4451 @240856.060s 0%: 0.079+106+0.058 ms clock, 0.63+349/211/166+0.47 ms cpu, 5030->5030->2164 MB, 5030 MB goal, 0 MB stacks, 0 MB globals, 8 P
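For reading the two gctrace lines above: in Go's GODEBUG=gctrace=1 output, the `A->B->C MB` field is the heap size at GC start, the heap size at GC end, and the live heap, followed by the goal heap size. A small Python sketch that pulls those numbers out of a log line (the helper name `parse_gctrace` is hypothetical, not part of any MatrixOne tooling):

```python
import re

# Matches the parts of a Go gctrace line that matter for heap sizing.
GCTRACE_RE = re.compile(
    r"gc (?P<gc>\d+) @(?P<elapsed_s>[\d.]+)s .*?"
    r"(?P<heap_start_mb>\d+)->(?P<heap_end_mb>\d+)->(?P<live_mb>\d+) MB, "
    r"(?P<goal_mb>\d+) MB goal"
)


def parse_gctrace(line):
    """Return heap figures from one gctrace line, or None if it is not one."""
    m = GCTRACE_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    elapsed = float(fields.pop("elapsed_s"))
    result = {name: int(value) for name, value in fields.items()}
    result["elapsed_s"] = elapsed
    return result
```

Applied to the second line above, this gives a heap of 5030 MB at GC start and 2164 MB live after the cycle, against 1169 MB and 459 MB in the first line.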
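The worst memory growth is in value_scan. create index hnsw generates and executes SELECT * FROM ann.t3 AS src CROSS APPLY hnsw_create(...);. hnsw_create computes the index and writes it to a local file, then builds the INSERT ... VALUES SQL. INSERT VALUES has to call value_scan to evaluate the expression (load_file(data_link)) and then copy the result, which effectively doubles the data.
The solution I can think of for now: turn hnsw_create into a blocking operator and insert the index through insert select, which avoids value_scan's extra evaluation and copy. It is a big change, though.
eric suggested changing the insert batch from 10000 rows to 2000 rows. Peak memory did drop, from 3.5g to 1741.65MB, and execution time did not change.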
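A back-of-the-envelope check of why the batch size matters, as a Python sketch. It assumes every row carries one 65536-byte chunk (the size in the generated SQL earlier in this thread) and that the evaluate-then-copy path holds two copies of the chunk data; both are simplifications, and the function name `payload_mib` is made up for illustration.

```python
CHUNK_SIZE = 65536      # bytes per load_file() chunk, from the generated SQL
MIB = 1024 * 1024


def payload_mib(rows_per_insert, copies=1):
    """Chunk data referenced by one INSERT, in MiB, times the number of copies."""
    return rows_per_insert * CHUNK_SIZE * copies / MIB


for rows in (10000, 2000, 1000):
    print(
        f"{rows} rows: {payload_mib(rows):.1f} MiB of chunk data, "
        f"{payload_mib(rows, copies=2):.1f} MiB with one extra copy"
    )
```

Under these assumptions, one 10000-row statement carries 625 MiB of chunk data (1250 MiB with the copy), while a 2000-row statement carries 125 MiB (250 MiB with the copy). This only bounds the chunk data of a single statement and does not by itself account for the measured peaks of 3.5g and 1741.65MB.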