matrixone icon indicating copy to clipboard operation
matrixone copied to clipboard

[Bug]: select count(*) from fulltext index table oom.

Open Ariznawlll opened this issue 11 months ago • 11 comments

Is there an existing issue for the same bug?

  • [x] I have checked the existing issues.

Branch Name

main

Commit ID

06d0ed2

Other Environment Information

- Hardware parameters:
- OS type:
- Others:

Actual Behavior

Image

profile.zip

Expected Behavior

No response

Steps to Reproduce

CREATE TABLE `test` (
  `id` int NOT NULL,
  `title` varchar(255) DEFAULT NULL,
  `content` text DEFAULT NULL,
  PRIMARY KEY (`id`),
 FULLTEXT `f01`(`content`)
) ;
load data local infile '/root/weilu/zhwiki-latest-pages-articles-multistream6.xml-p7096380p8596379.csv' into table test fields terminated by ',' enclosed by '"' escaped by '';

select name, index_table_name from  mo_catalog.mo_indexes where name = 'f01';

select count(*), word from `__mo_index_secondary_01951c1d-1589-756a-a9dc-671e85e2fae4` group by word;


the load file is huge, please contact me privately.

Additional information

No response

Ariznawlll avatar Feb 19 '25 06:02 Ariznawlll

this is wrong SQL. better verify again.

cpegeric avatar Feb 19 '25 09:02 cpegeric

错误的sql也不应该导致oom

the correct sql is:select count(*) as word_count, word from __mo_index_secondary_01951cd8-ceb6-7119-953c-50c4244bf512 group by word order by word_count desc limit 200;

Ariznawlll avatar Feb 19 '25 09:02 Ariznawlll

repro:

CREATE TABLE test ( id int NOT NULL, title varchar(255) DEFAULT NULL, content text DEFAULT NULL, PRIMARY KEY (id), FULLTEXT f01(content) ) ; load data local infile '/root/weilu/zhwiki-latest-pages-articles-multistream6.xml-p7096380p8596379.csv' into table test fields terminated by ',' enclosed by '"' escaped by '';

select name, index_table_name from mo_catalog.mo_indexes where name = 'f01';

select count(*), word from __mo_index_secondary_0195227b-e5a4-79f1-970f-3bf68bd1dda5 group by word;

Image

Ariznawlll avatar Feb 20 '25 10:02 Ariznawlll

偶发,OOM是由于shuffle算子造成。先想办法切成多个batch

ouyuanning avatar Mar 05 '25 07:03 ouyuanning

报bug对应的SQL,目前应该好了。但是还有2个可能要再处理的。 1、shuffle算子也要切batch(bug对应SQL使用的是shuffleV2) 2、bug对应的sql,走的range shuffle,但是对应的结果90%以上都在一个桶里,不知道是否编译时候选择错了?似乎哈希分桶会更好

ouyuanning avatar Mar 07 '25 03:03 ouyuanning

inner join目前也会OOM

https://shanghai.idc.matrixorigin.cn:30001/explore?panes=%7B%22GYP%22:%7B%22datasource%22:%22loki%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22expr%22:%22%7Bhost%3D%5C%2210-222-1-129%5C%22%7D%20%7C%3D%20%60mpool%20memory%20allocation%20exceed%60%22,%22queryType%22:%22range%22,%22datasource%22:%7B%22type%22:%22loki%22,%22uid%22:%22loki%22%7D,%22editorMode%22:%22builder%22%7D%5D,%22range%22:%7B%22from%22:%22now-1h%22,%22to%22:%22now%22%7D%7D%7D&schemaVersion=1&orgId=1

ouyuanning avatar Mar 12 '25 08:03 ouyuanning

这个语句能复现OOM

SELECT id from db1.test where match(content) against ('第八届理事会' in natural language mode) limit 100;

fulltext有执行这条SQL

WITH 
kw0 AS (SELECT doc_id, pos FROM `db1`.`__mo_index_secondary_01958975-b1b2-789b-a4f4-c5a82cadee07` WHERE word = '第八届'), 
kw1 AS (SELECT doc_id, pos FROM `db1`.`__mo_index_secondary_01958975-b1b2-789b-a4f4-c5a82cadee07` WHERE word = '八届理'), 
kw2 AS (SELECT doc_id, pos FROM `db1`.`__mo_index_secondary_01958975-b1b2-789b-a4f4-c5a82cadee07` WHERE word = '届理事'), 
kw3 AS (SELECT doc_id, pos FROM `db1`.`__mo_index_secondary_01958975-b1b2-789b-a4f4-c5a82cadee07` WHERE word = '理事会'), 
kw4 AS (SELECT doc_id, pos FROM `db1`.`__mo_index_secondary_01958975-b1b2-789b-a4f4-c5a82cadee07` WHERE prefix_eq(word,'事会')), 
kw5 AS (SELECT doc_id, pos FROM `db1`.`__mo_index_secondary_01958975-b1b2-789b-a4f4-c5a82cadee07` WHERE prefix_eq(word,'会')) 
SELECT kw0.doc_id, CAST(0 as int) FROM kw0, kw1, kw2, kw3, kw4, kw5 WHERE kw0.doc_id = kw1.doc_id AND kw1.pos - kw0.pos = 3 AND kw0.doc_id = kw2.doc_id AND kw2.pos - kw0.pos = 6 AND kw0.doc_id = kw3.doc_id AND kw3.pos - kw0.pos = 9 AND kw0.doc_id = kw4.doc_id AND kw4.pos - kw0.pos = 12 AND kw0.doc_id = kw5.doc_id AND kw5.pos - kw0.pos = 15

这里会OOM。最后的join,拿到的hashbuild有上亿行的数据。

不过直接执行 explain analyze 又会很快完成,且hashbuild的数据在千级别。应该是哪个地方有bug。

ouyuanning avatar Mar 12 '25 10:03 ouyuanning

SELECT id from db1.test where match(content) against ('第八届理事会' in natural language mode) limit 100;

这个的问题有2个 1、内部执行器调用的SQL,getStatsCache有问题,导致走了AP模式 2、AP模式下,不太确定什么问题,可能是runtimefilter?导致build hash的结果非常大。超过2G限制。

1的部分,这个PR处理:https://github.com/matrixorigin/matrixone/pull/21518 2的部分,还要再看看,不过1处理后,目前的用例应该不会OOM了

ouyuanning avatar Mar 13 '25 08:03 ouyuanning

2的部分等碰到相关问题再做处理吧。

ouyuanning avatar Mar 14 '25 07:03 ouyuanning

Should continue investigating the build hash 2G bug.

fengttt avatar Apr 27 '25 08:04 fengttt

please verify again.

cpegeric avatar Jun 12 '25 10:06 cpegeric

CREATE TABLE test ( id int NOT NULL, title varchar(255) DEFAULT NULL, content text DEFAULT NULL, PRIMARY KEY (id), FULLTEXT f01(content) ) ; load data local infile '/root/weilu/zhwiki-latest-pages-articles-multistream6.xml-p7096380p8596379.csv' into table test fields terminated by ',' enclosed by '"' escaped by '';

select name, index_table_name from mo_catalog.mo_indexes where name = 'f01';

select count(*), word from __mo_index_secondary_0198085e-3722-7bdb-a57f-1241541f5874 group by word;

Image

SELECT id from test where match(content) against ('第八届理事会' in natural language mode) limit 100; Image

Ariznawlll avatar Jul 14 '25 10:07 Ariznawlll