matrixone icon indicating copy to clipboard operation
matrixone copied to clipboard

[Bug]: concurrency performance of full-text index is poor.

Open Ariznawlll opened this issue 1 year ago • 4 comments

Is there an existing issue for the same bug?

  • [X] I have checked the existing issues.

Branch Name

main

Commit ID

f7093cc

Other Environment Information

- Hardware parameters:
- OS type:
- Others:
machine: 64c 256G(129)

Actual Behavior

结论:1. 以下测试,并发50 qps达到极限;测试期间mem使用率低于50%.

数据集:200
M,4579025行 测试时长:5min image 50并发memory: https://shanghai.idc.matrixorigin.cn:30001/d/rYdddlPWk/node-exporter-full?orgId=1&var-datasource=prometheus-standalone&var-job=agents&var-node=10.222.1.129:9100&var-diskdevices=%5Ba-z%5D%2B%7Cnvme%5B0-9%5D%2Bn%5B0-9%5D%2B%7Cmmcblk%5B0-9%5D%2B&from=1736924899000&to=1736925199000

image

50并发memory: https://shanghai.idc.matrixorigin.cn:30001/d/rYdddlPWk/node-exporter-full?orgId=1&var-datasource=prometheus-standalone&var-job=agents&var-node=10.222.1.129:9100&var-diskdevices=%5Ba-z%5D%2B%7Cnvme%5B0-9%5D%2Bn%5B0-9%5D%2B%7Cmmcblk%5B0-9%5D%2B&from=1736927368000&to=1736927668000

数据集:1.16G,24202046行 测试时长:5min image

50并发memory: https://shanghai.idc.matrixorigin.cn:30001/d/rYdddlPWk/node-exporter-full?orgId=1&var-datasource=prometheus-standalone&var-job=agents&var-node=10.222.1.129:9100&var-diskdevices=%5Ba-z%5D%2B%7Cnvme%5B0-9%5D%2Bn%5B0-9%5D%2B%7Cmmcblk%5B0-9%5D%2B&from=1736930233000&to=1736930533000

测试步骤:
steps:
git clone https://github.com/matrixorigin/mo-load.git
cd mo-load
mkdir fulltext
cd fulltext

vi case.yml
name: "fulltext_test"
runMins: 5
runTimes: 0
transaction:
  - name: "fulltext_test"
    terminal: 10
    scripts:
      - sql: "SELECT col3 from test.fulltext_test where match(col3) against ('+{str1}');"

vi replace.yml
replace:
  - name: str1
    type: file
    path: /root/weilu/mo-load/replace/output_test.txt
    dtype: "varchar"

单索引列单关键词检索(200M):
mysql -h 127.0.0.1 -P 6001 -udump -p111 --local-infile
set experimental_fulltext_index=1;
create database test;
use test;
CREATE TABLE `fulltext_test` (
  `col1` bigint DEFAULT NULL,
  `col2` int NOT NULL,
  `col3` varchar(200) DEFAULT NULL,
  `col4` varchar(200) DEFAULT NULL,
  PRIMARY KEY (`col2`),
 FULLTEXT(`col3`) WITH PARSER ngram
);
load data local infile '/Users/ariznawl/Downloads/zhwiki-20241201-pages-articles-multistream-index.txt' into table test.fulltext_test fields terminated by ':' ESCAPED BY '\t' lines terminated by '\n';


双索引列单关键词检索(200M):
mysql -h 127.0.0.1 -P 6001 -udump -p111 --local-infile
set experimental_fulltext_index=1;
create database test;
use test;
CREATE TABLE `fulltext_test` (
  `col1` bigint DEFAULT NULL,
  `col2` int NOT NULL,
  `col3` varchar(200) DEFAULT NULL,
  `col4` varchar(200) DEFAULT NULL,
  PRIMARY KEY (`col2`),
 FULLTEXT(`col3`,`col4`) WITH PARSER ngram
);
load data local infile '/Users/ariznawl/Downloads/zhwiki-20241201-pages-articles-multistream-index.txt' into table test.fulltext_test fields terminated by ':' ESCAPED BY '\t' lines terminated by '\n';

单索引列单关键词检索(1.16G):
mysql -h 127.0.0.1 -P 6001 -udump -p111 --local-infile
set experimental_fulltext_index=1;
create database test;
use test;
CREATE TABLE `fulltext_test` (
  `col1` bigint DEFAULT NULL,
  `col2` int NOT NULL,
  `col3` varchar(200) DEFAULT NULL,
  PRIMARY KEY (`col2`),
 FULLTEXT(`col3`) WITH PARSER ngram
);
load data local infile '/Users/ariznawl/Downloads/enwiki-20241201-pages-articles-multistream-index.txt' into table test.fulltext_test fields terminated by ':' ESCAPED BY '\t' lines terminated by '\n';

cd mo-load
./run.sh -c cases/fulltext -b test -t 10/50/100/500/1000

Expected Behavior

No response

Steps to Reproduce

detailed in actual behavior

注:boolean mode下全文索引并发性能测试,对标数据库ES并发性能测试进行中...

Additional information

No response

Ariznawlll avatar Jan 15 '25 11:01 Ariznawlll

commit:3742ad8875529237f5ef508b5126ed1d6cbf910f

测试步骤:

mysql> show create table test;
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                      |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| test  | CREATE TABLE `test` (
  `id` int NOT NULL,
  `title` varchar(255) DEFAULT NULL,
  `content` text DEFAULT NULL,
  PRIMARY KEY (`id`),
 FULLTEXT `f01`(`content`)
) |
+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

mysql> load data local infile '/root/weilu/zhwiki-latest-pages-articles-multistream6.xml-p7096380p8596379.csv' into table test fields terminated by ',' enclosed by '"' escaped by '';
Query OK, 561662 rows affected (49.31 sec)

mysql> select count(*) from test;
+----------+
| count(*) |
+----------+
|   561662 |
+----------+
1 row in set (0.01 sec)

根据全文索引名称查找全文索引表名
mysql> select name, index_table_name from  mo_catalog.mo_indexes where name = 'f01';
+------+-----------------------------------------------------------+
| name | index_table_name                                          |
+------+-----------------------------------------------------------+
| f01  | __mo_index_secondary_01951cd8-ceb6-7119-953c-50c4244bf512 |
+------+-----------------------------------------------------------+
1 row in set (0.02 sec)

查找索引计数,取前200个作为keywords文件内容
mysql> select count(*) as word_count,word from `__mo_index_secondary_01951c1d-1589-756a-a9dc-671e85e2fae4` where word like '%是%' group by word order by word_count desc limit 200 ;
+------------+-----------+
| word_count | word      |
+------------+-----------+
|      76444 | 是        |
|      24569 | 一用是    |
|      24564 | 是書生    |
|      24561 | 用是書    |
|      18777 | 是位於    |
|      10235 | 是一种    |
|       9254 | 是一個    |
|       8942 | 是一个    |
|       8699 | 可能是    |
|       8090 | 是一名    |
|       7021 | 是一位    |
|       6174 | 是法国    |
|       5826 | 而不是    |
|       5443 | 是一座    |
|       5394 | 的是      |
|       5354 | 是日本    |
|       5218 | 是中国    |
|       4696 | 是在      |
|       4416 | 尤其是    |
|       4199 | 认为是    |
|       4140 | 是一部    |
|       4045 | 是一種    |
|       3970 | 此站是    |
|       3673 | 主要是    |
|       3537 | 認為是    |
|       3466 | 是因為    |
|       3311 | 但是      |
|       3250 | 是美国    |
.......

创建keywords.txt文档

使用mo-load工具测试并发:
git clone https://github.com/matrixorigin/mo-load.git
cd mo-load
mkdir fulltext
cd fulltext
vi case.yml
name: "fulltext_test"
runMins: 5
runTimes: 0
transaction:
  - name: "fulltext_test"
    terminal: 10
    scripts:
      - sql: "SELECT * from test.test where match(content) against ('{str1}' in natural language mode) order by id desc limit 100;"

vi replace.yml
replace:
  - name: str1
    type: file
    path: /root/weilu/mo-load/replace/temp_fulltext_output.txt
    dtype: "varchar"


cd mo-load
mkdir replace
cd replace
将文件temp_fulltext_output.txt放入该目录

cd mo-load
./run.sh -c cases/fulltext -b test -t 10 (-t后指定测试并发数)

测试结果: Image

heap and malloc profile: 10并发: heap_0219_10.zip malloc_0219_10.zip

50并发: heap_0219_50.zip malloc_0219_50.zip

100并发: heap_0219_100.zip malloc_0219_100.zip

Ariznawlll avatar Feb 20 '25 06:02 Ariznawlll

We have to give the stats start from 1 wiki files to all wiki files. We want to see the limit of our system.

cpegeric avatar Feb 20 '25 09:02 cpegeric

数据量800M
table schema:
CREATE TABLE `test` (
  `id` int NOT NULL,
  `title` varchar(255) DEFAULT NULL,
  `content` text DEFAULT NULL,
  PRIMARY KEY (`id`),
 FULLTEXT `f01`(`content`)
)

并发测试sql: SELECT * from test.test where match(content) against ('+string1 +string2' in boolean mode) limit 100;

Image

并发测试sql: SELECT * from test.test where match(content) against ('+string1 string2' in boolean mode) limit 100;

Image

the performance is still poor

Ariznawlll avatar Mar 25 '25 09:03 Ariznawlll

we have also fixed the issue https://github.com/matrixorigin/matrixone/issues/21678 https://github.com/matrixorigin/matrixone/issues/21774

@Ariznawlll we try again and see any improvement.

cpegeric avatar May 13 '25 08:05 cpegeric