matrixone [Feature Request]: Secondary index Key support

Is there an existing issue for the same feature request?

[X] I have checked the existing issues.

Is your feature request related to a problem?

we have showcase SQL using a lot of secondary index Key and Unique Key:

CREATE TABLE IF NOT EXISTS `t_code_rule` (
  `code_id` bigint(20) NOT NULL AUTO_INCREMENT,
  `code_no` varchar(50) NOT NULL,
  `org_no` varchar(50) NOT NULL,
  `org_name` varchar(50) NOT NULL,
  `ancestors` varchar(255) NOT NULL,
  `code_rule_no` varchar(50) NOT NULL,
  `code_rule_name` varchar(50) NOT NULL,
  `code_name` varchar(50) NOT NULL,
  `code_type` int(11) NOT NULL DEFAULT '0',
  `split` varchar(50) DEFAULT NULL,
  `remark` varchar(255),
  `create_time` datetime NOT NULL,
  `create_user` varchar(50) DEFAULT NULL,
  `last_update_time` datetime DEFAULT NULL,
  `last_update_user` varchar(50) DEFAULT NULL,
  `is_system_code` varchar(2)  NOT NULL DEFAULT 'N',
  PRIMARY KEY (`code_id`),
  UNIQUE KEY `code_type` (`code_type`),
  KEY `code_no` (`code_no`),
  KEY `code_rule_no` (`code_rule_no`),
  KEY `org_no` (`org_no`)
);

Describe the feature you'd like

As MatrixOne doesn't work as other databases, the secondary index's implementation details are hidden from users. So for Key, we just need to support the SQL.

This feature is for users to avoid modifying their DDL SQLs.

Describe implementation you've considered

No response

Documentation, Adoption, Use Case, Migration Strategy

No response

Additional information

No response

Jul 01 '22 03:07 dengn

we are concerned unique key is still very important and maybe we can discuss about the priority. @JinHai-CN

Jul 06 '22 08:07 florashi181

@XuPeng-SH Consider a simple strategy like the following

Data in DN always indexed by primary key. No secondary key, or seconday key points back to primary key.
Data stored in S3, in each segment, sort by primary key. Build zonemap for ALL columns. Build bloomfilter for primary key columns.
If table has a secondary index, store another copy of data, sort by secondary key, build zonemap for ALL columns, no bloomfilter.

Background jobs compact/merge both copies of data.

Jul 09 '22 22:07 fengttt

will support later

Nov 09 '22 04:11 bbbearxyz

will solve it after unique index is complete

Dec 20 '22 14:12 bbbearxyz

will solve it after unique index pr is merge

Dec 23 '22 11:12 bbbearxyz

will solve it after unique index pr is merge

Dec 28 '22 11:12 bbbearxyz

the secondary key will first support grammer, the index table is need or not will be discuss.

Jan 03 '23 09:01 bbbearxyz

now secondary key is support in the ddl, don't create index table, I think when the pr is merge, the issue can be close.

Jan 09 '23 11:01 bbbearxyz

now secondary key is support in the ddl, don't create index table, I think when the pr is merge, the issue can be close.

Jan 12 '23 10:01 bbbearxyz

now secondary key is support in the ddl, issue is https://github.com/matrixorigin/matrixone/issues/7492

Jan 13 '23 07:01 bbbearxyz

duplicate with #7490

Feb 16 '23 06:02 XuPeng-SH

Update Oct 16

Done

Support duplicates in the Secondary Index (by appending PK)
PK aliasing to resolve "show create table" and "show index from" filtering issues https://github.com/matrixorigin/matrixone/pull/12093/commits/4688a443876a45a55e8f9448b4ed0989a5da80bd
Fixed Existing BVT failures https://github.com/matrixorigin/matrixone/pull/12093/commits/a8988491b7c3dd6842cccacc32ed9a70ca41b7ad
Adding new BVT for secondary index. [Almost done]

Pending

Secondary Index table is not deleted during column drop. https://github.com/matrixorigin/matrixone/issues/12139
Secondary Index table doesn't support inserting NULL [WIP]
Secondary Index table removes NULL'ed records after Updation. [WIP]

Oct 17 '23 04:10 arjunsk

The PR https://github.com/matrixorigin/matrixone/pull/12093 is merged into the master. Transfering it to @aressu1985

The document is here.

Oct 24 '23 04:10 arjunsk

add with the implementation of https://github.com/matrixorigin/matrixone/issues/12517

Nov 14 '23 07:11 florashi181

结论：

单并发单表场景下，若k值较大，有次级索引性能较无次级索引性能明显提升；若K值较小，次级索引效果不明显，甚至有下降；
多并发多表场景下，测试sysbench 1000w性能获取qps以及cost，有次级索引性能较无次级索引性能有明显提升。

测试详情： 测试commit：b49cc70872b55f58b368c7c4c91b33c2a6f5057b 测试环境：128 单机 It works in the current sql type：select c from sbtest%u where k=?

测试场景：

单并发单表：

1.1 800w行： k = 100000，无次级索引测试结果为0.04s，有次级索引测试结果为0.02s； k = 9999，无次级索引测试结果为0.21s，有次级索引测试结果为0.42s； 1.2 1500w行： k = 999982，无次级索引测试结果为0.44s，有次级索引测试结果为0.04s； 1.3 2000w行： k = 50，无次级索引测试结果为0.32s，有次级索引测试结果为0.31s.
多并发多表：

sysbench1000w下的点查场景，共涉及10张表（sbtest1-sbtest10），无次级索引表结构如下：

100并发：

sysbench --mysql-host=127.0.0.1 --mysql-port=6001 --mysql-user=dump --mysql-password=111 --db-ps-mode=disable oltp_point_select.lua --mysql-db=sysbench_db --tables=10 --table_size=10000000 --threads=100 --time=300 --report-interval=10 --create_secondary=off --auto_inc=off --range_selects=off --point_selects=1 run

测试结果：

500并发：

sysbench --mysql-host=127.0.0.1 --mysql-port=6001 --mysql-user=dump --mysql-password=111 --db-ps-mode=disable oltp_point_select.lua --mysql-db=sysbench_db --tables=10 --table_size=10000000 --threads=500 --time=300 --report-interval=10 --create_secondary=off --auto_inc=off --range_selects=off --point_selects=1 run

测试结果：

sysbench1000w下的点查场景，共涉及10张表（sbtest1-sbtest10），带有次级索引表结构如下（设置列k为次级索引列）：

100并发：

sysbench --mysql-host=127.0.0.1 --mysql-port=6001 --mysql-user=dump --mysql-password=111 --db-ps-mode=disable oltp_point_select.lua --mysql-db=sysbench_db --tables=10 --table_size=10000000 --threads=100 --time=300 --report-interval=10 --create_secondary=off --auto_inc=off --range_selects=off --point_selects=1 run

测试结果：

500并发：

sysbench --mysql-host=127.0.0.1 --mysql-port=6001 --mysql-user=dump --mysql-password=111 --db-ps-mode=disable oltp_point_select.lua --mysql-db=sysbench_db --tables=10 --table_size=10000000 --threads=500 --time=300 --report-interval=10 --create_secondary=off --auto_inc=off --range_selects=off --point_selects=1 run

测试结果：

性能结果汇总：

Nov 24 '23 11:11 Ariznawlll

track with new https://github.com/matrixorigin/matrixone/issues/12517

Jan 15 '24 12:01 florashi181

matrixone matrixone copied to clipboard

[Feature Request]: Secondary index Key support

Is there an existing issue for the same feature request?

Is your feature request related to a problem?

Describe the feature you'd like

Describe implementation you've considered

Documentation, Adoption, Use Case, Migration Strategy

Additional information

Update Oct 16

Done

Pending

matrixone
matrixone copied to clipboard