[Bug]: concurrency performance of full-text index is poor.
Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
Branch Name
main
Commit ID
f7093cc
Other Environment Information
- Hardware parameters:
- OS type:
- Others:
machine: 64c 256G(129)
Actual Behavior
结论:1. 以下测试,并发50 qps达到极限;测试期间mem使用率低于50%.
数据集:200
M,4579025行
测试时长:5min
50并发memory: https://shanghai.idc.matrixorigin.cn:30001/d/rYdddlPWk/node-exporter-full?orgId=1&var-datasource=prometheus-standalone&var-job=agents&var-node=10.222.1.129:9100&var-diskdevices=%5Ba-z%5D%2B%7Cnvme%5B0-9%5D%2Bn%5B0-9%5D%2B%7Cmmcblk%5B0-9%5D%2B&from=1736924899000&to=1736925199000
50并发memory: https://shanghai.idc.matrixorigin.cn:30001/d/rYdddlPWk/node-exporter-full?orgId=1&var-datasource=prometheus-standalone&var-job=agents&var-node=10.222.1.129:9100&var-diskdevices=%5Ba-z%5D%2B%7Cnvme%5B0-9%5D%2Bn%5B0-9%5D%2B%7Cmmcblk%5B0-9%5D%2B&from=1736927368000&to=1736927668000
数据集:1.16G,24202046行
测试时长:5min
50并发memory: https://shanghai.idc.matrixorigin.cn:30001/d/rYdddlPWk/node-exporter-full?orgId=1&var-datasource=prometheus-standalone&var-job=agents&var-node=10.222.1.129:9100&var-diskdevices=%5Ba-z%5D%2B%7Cnvme%5B0-9%5D%2Bn%5B0-9%5D%2B%7Cmmcblk%5B0-9%5D%2B&from=1736930233000&to=1736930533000
测试步骤:
steps:
git clone https://github.com/matrixorigin/mo-load.git
cd mo-load
mkdir fulltext
cd fulltext
vi case.yml
name: "fulltext_test"
runMins: 5
runTimes: 0
transaction:
- name: "fulltext_test"
terminal: 10
scripts:
- sql: "SELECT col3 from test.fulltext_test where match(col3) against ('+{str1}');"
vi replace.yml
replace:
- name: str1
type: file
path: /root/weilu/mo-load/replace/output_test.txt
dtype: "varchar"
单索引列单关键词检索(200M):
mysql -h 127.0.0.1 -P 6001 -udump -p111 --local-infile
set experimental_fulltext_index=1;
create database test;
use test;
CREATE TABLE `fulltext_test` (
`col1` bigint DEFAULT NULL,
`col2` int NOT NULL,
`col3` varchar(200) DEFAULT NULL,
`col4` varchar(200) DEFAULT NULL,
PRIMARY KEY (`col2`),
FULLTEXT(`col3`) WITH PARSER ngram
);
load data local infile '/Users/ariznawl/Downloads/zhwiki-20241201-pages-articles-multistream-index.txt' into table test.fulltext_test fields terminated by ':' ESCAPED BY '\t' lines terminated by '\n';
双索引列单关键词检索(200M):
mysql -h 127.0.0.1 -P 6001 -udump -p111 --local-infile
set experimental_fulltext_index=1;
create database test;
use test;
CREATE TABLE `fulltext_test` (
`col1` bigint DEFAULT NULL,
`col2` int NOT NULL,
`col3` varchar(200) DEFAULT NULL,
`col4` varchar(200) DEFAULT NULL,
PRIMARY KEY (`col2`),
FULLTEXT(`col3`,`col4`) WITH PARSER ngram
);
load data local infile '/Users/ariznawl/Downloads/zhwiki-20241201-pages-articles-multistream-index.txt' into table test.fulltext_test fields terminated by ':' ESCAPED BY '\t' lines terminated by '\n';
单索引列单关键词检索(1.16G):
mysql -h 127.0.0.1 -P 6001 -udump -p111 --local-infile
set experimental_fulltext_index=1;
create database test;
use test;
CREATE TABLE `fulltext_test` (
`col1` bigint DEFAULT NULL,
`col2` int NOT NULL,
`col3` varchar(200) DEFAULT NULL,
PRIMARY KEY (`col2`),
FULLTEXT(`col3`) WITH PARSER ngram
);
load data local infile '/Users/ariznawl/Downloads/enwiki-20241201-pages-articles-multistream-index.txt' into table test.fulltext_test fields terminated by ':' ESCAPED BY '\t' lines terminated by '\n';
cd mo-load
./run.sh -c cases/fulltext -b test -t 10/50/100/500/1000
Expected Behavior
No response
Steps to Reproduce
detailed in actual behavior
注:boolean mode下全文索引并发性能测试,对标数据库ES并发性能测试进行中...
Additional information
No response
commit:3742ad8875529237f5ef508b5126ed1d6cbf910f
测试步骤:
mysql> show create table test;
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| test | CREATE TABLE `test` (
`id` int NOT NULL,
`title` varchar(255) DEFAULT NULL,
`content` text DEFAULT NULL,
PRIMARY KEY (`id`),
FULLTEXT `f01`(`content`)
) |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
mysql> load data local infile '/root/weilu/zhwiki-latest-pages-articles-multistream6.xml-p7096380p8596379.csv' into table test fields terminated by ',' enclosed by '"' escaped by '';
Query OK, 561662 rows affected (49.31 sec)
mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
| 561662 |
+----------+
1 row in set (0.01 sec)
根据全文索引名称查找全文索引表名
mysql> select name, index_table_name from mo_catalog.mo_indexes where name = 'f01';
+------+-----------------------------------------------------------+
| name | index_table_name |
+------+-----------------------------------------------------------+
| f01 | __mo_index_secondary_01951cd8-ceb6-7119-953c-50c4244bf512 |
+------+-----------------------------------------------------------+
1 row in set (0.02 sec)
查找索引计数,取前200个作为keywords文件内容
mysql> select count(*) as word_count,word from `__mo_index_secondary_01951c1d-1589-756a-a9dc-671e85e2fae4` where word like '%是%' group by word order by word_count desc limit 200 ;
+------------+-----------+
| word_count | word |
+------------+-----------+
| 76444 | 是 |
| 24569 | 一用是 |
| 24564 | 是書生 |
| 24561 | 用是書 |
| 18777 | 是位於 |
| 10235 | 是一种 |
| 9254 | 是一個 |
| 8942 | 是一个 |
| 8699 | 可能是 |
| 8090 | 是一名 |
| 7021 | 是一位 |
| 6174 | 是法国 |
| 5826 | 而不是 |
| 5443 | 是一座 |
| 5394 | 的是 |
| 5354 | 是日本 |
| 5218 | 是中国 |
| 4696 | 是在 |
| 4416 | 尤其是 |
| 4199 | 认为是 |
| 4140 | 是一部 |
| 4045 | 是一種 |
| 3970 | 此站是 |
| 3673 | 主要是 |
| 3537 | 認為是 |
| 3466 | 是因為 |
| 3311 | 但是 |
| 3250 | 是美国 |
.......
创建keywords.txt文档
使用mo-load工具测试并发:
git clone https://github.com/matrixorigin/mo-load.git
cd mo-load
mkdir fulltext
cd fulltext
vi case.yml
name: "fulltext_test"
runMins: 5
runTimes: 0
transaction:
- name: "fulltext_test"
terminal: 10
scripts:
- sql: "SELECT * from test.test where match(content) against ('{str1}' in natural language mode) order by id desc limit 100;"
vi replace.yml
replace:
- name: str1
type: file
path: /root/weilu/mo-load/replace/temp_fulltext_output.txt
dtype: "varchar"
cd mo-load
mkdir replace
cd replace
将文件temp_fulltext_output.txt放入该目录
cd mo-load
./run.sh -c cases/fulltext -b test -t 10 (-t后指定测试并发数)
测试结果:
heap and malloc profile: 10并发: heap_0219_10.zip malloc_0219_10.zip
We have to give the stats start from 1 wiki files to all wiki files. We want to see the limit of our system.
数据量800M
table schema:
CREATE TABLE `test` (
`id` int NOT NULL,
`title` varchar(255) DEFAULT NULL,
`content` text DEFAULT NULL,
PRIMARY KEY (`id`),
FULLTEXT `f01`(`content`)
)
并发测试sql: SELECT * from test.test where match(content) against ('+string1 +string2' in boolean mode) limit 100;
并发测试sql: SELECT * from test.test where match(content) against ('+string1 string2' in boolean mode) limit 100;
the performance is still poor
we have also fixed the issue https://github.com/matrixorigin/matrixone/issues/21678 https://github.com/matrixorigin/matrixone/issues/21774
@Ariznawlll we try again and see any improvement.