matrixone
matrixone copied to clipboard
[Feature Request]: Secondary index Key support
Is there an existing issue for the same feature request?
- [X] I have checked the existing issues.
Is your feature request related to a problem?
we have showcase SQL using a lot of secondary index Key and Unique Key:
CREATE TABLE IF NOT EXISTS `t_code_rule` (
`code_id` bigint(20) NOT NULL AUTO_INCREMENT,
`code_no` varchar(50) NOT NULL,
`org_no` varchar(50) NOT NULL,
`org_name` varchar(50) NOT NULL,
`ancestors` varchar(255) NOT NULL,
`code_rule_no` varchar(50) NOT NULL,
`code_rule_name` varchar(50) NOT NULL,
`code_name` varchar(50) NOT NULL,
`code_type` int(11) NOT NULL DEFAULT '0',
`split` varchar(50) DEFAULT NULL,
`remark` varchar(255),
`create_time` datetime NOT NULL,
`create_user` varchar(50) DEFAULT NULL,
`last_update_time` datetime DEFAULT NULL,
`last_update_user` varchar(50) DEFAULT NULL,
`is_system_code` varchar(2) NOT NULL DEFAULT 'N',
PRIMARY KEY (`code_id`),
UNIQUE KEY `code_type` (`code_type`),
KEY `code_no` (`code_no`),
KEY `code_rule_no` (`code_rule_no`),
KEY `org_no` (`org_no`)
);
Describe the feature you'd like
As MatrixOne doesn't work as other databases, the secondary index's implementation details are hidden from users. So for Key, we just need to support the SQL.
This feature is for users to avoid modifying their DDL SQLs.
Describe implementation you've considered
No response
Documentation, Adoption, Use Case, Migration Strategy
No response
Additional information
No response
we are concerned unique key is still very important and maybe we can discuss about the priority. @JinHai-CN
@XuPeng-SH Consider a simple strategy like the following
- Data in DN always indexed by primary key. No secondary key, or seconday key points back to primary key.
- Data stored in S3, in each segment, sort by primary key. Build zonemap for ALL columns. Build bloomfilter for primary key columns.
- If table has a secondary index, store another copy of data, sort by secondary key, build zonemap for ALL columns, no bloomfilter.
Background jobs compact/merge both copies of data.
will support later
will solve it after unique index is complete
will solve it after unique index pr is merge
will solve it after unique index pr is merge
the secondary key will first support grammer, the index table is need or not will be discuss.
now secondary key is support in the ddl, don't create index table, I think when the pr is merge, the issue can be close.
now secondary key is support in the ddl, don't create index table, I think when the pr is merge, the issue can be close.
now secondary key is support in the ddl, issue is https://github.com/matrixorigin/matrixone/issues/7492
duplicate with #7490
Update Oct 16
Done
- Support duplicates in the Secondary Index (by appending PK)
- PK aliasing to resolve "show create table" and "show index from" filtering issues https://github.com/matrixorigin/matrixone/pull/12093/commits/4688a443876a45a55e8f9448b4ed0989a5da80bd
- Fixed Existing BVT failures https://github.com/matrixorigin/matrixone/pull/12093/commits/a8988491b7c3dd6842cccacc32ed9a70ca41b7ad
- Adding new BVT for secondary index. [Almost done]
Pending
- Secondary Index table is not deleted during column drop. https://github.com/matrixorigin/matrixone/issues/12139
- Secondary Index table doesn't support inserting NULL [WIP]
- Secondary Index table removes NULL'ed records after Updation. [WIP]
The PR https://github.com/matrixorigin/matrixone/pull/12093 is merged into the master. Transfering it to @aressu1985
The document is here.
add with the implementation of https://github.com/matrixorigin/matrixone/issues/12517
结论:
- 单并发单表场景下,若k值较大,有次级索引性能较无次级索引性能明显提升;若K值较小,次级索引效果不明显,甚至有下降;
- 多并发多表场景下,测试sysbench 1000w性能获取qps以及cost,有次级索引性能较无次级索引性能有明显提升。
测试详情: 测试commit:b49cc70872b55f58b368c7c4c91b33c2a6f5057b 测试环境:128 单机 It works in the current sql type:select c from sbtest%u where k=?
测试场景:
-
单并发单表:
1.1 800w行: k = 100000,无次级索引测试结果为0.04s,有次级索引测试结果为0.02s; k = 9999, 无次级索引测试结果为0.21s,有次级索引测试结果为0.42s; 1.2 1500w行: k = 999982, 无次级索引测试结果为0.44s,有次级索引测试结果为0.04s; 1.3 2000w行: k = 50,无次级索引测试结果为0.32s,有次级索引测试结果为0.31s.
-
多并发多表:
sysbench1000w下的点查场景,共涉及10张表(sbtest1-sbtest10),无次级索引表结构如下:
100并发:
sysbench --mysql-host=127.0.0.1 --mysql-port=6001 --mysql-user=dump --mysql-password=111 --db-ps-mode=disable oltp_point_select.lua --mysql-db=sysbench_db --tables=10 --table_size=10000000 --threads=100 --time=300 --report-interval=10 --create_secondary=off --auto_inc=off --range_selects=off --point_selects=1 run
测试结果:
500并发:
sysbench --mysql-host=127.0.0.1 --mysql-port=6001 --mysql-user=dump --mysql-password=111 --db-ps-mode=disable oltp_point_select.lua --mysql-db=sysbench_db --tables=10 --table_size=10000000 --threads=500 --time=300 --report-interval=10 --create_secondary=off --auto_inc=off --range_selects=off --point_selects=1 run
测试结果:
sysbench1000w下的点查场景,共涉及10张表(sbtest1-sbtest10),带有次级索引表结构如下(设置列k为次级索引列):
100并发:
sysbench --mysql-host=127.0.0.1 --mysql-port=6001 --mysql-user=dump --mysql-password=111 --db-ps-mode=disable oltp_point_select.lua --mysql-db=sysbench_db --tables=10 --table_size=10000000 --threads=100 --time=300 --report-interval=10 --create_secondary=off --auto_inc=off --range_selects=off --point_selects=1 run
测试结果:
500并发:
sysbench --mysql-host=127.0.0.1 --mysql-port=6001 --mysql-user=dump --mysql-password=111 --db-ps-mode=disable oltp_point_select.lua --mysql-db=sysbench_db --tables=10 --table_size=10000000 --threads=500 --time=300 --report-interval=10 --create_secondary=off --auto_inc=off --range_selects=off --point_selects=1 run
测试结果:
性能结果汇总:
track with new https://github.com/matrixorigin/matrixone/issues/12517