stonedb
stonedb copied to clipboard
bug: 执行外连接的查询过于耗时, tianmu需要执行十四分钟, innodb执行外连接仅需要执行一分钟
###sql_text
select a.*
FROM c_acct a
LEFT JOIN (SELECT b.account_id, b.fiscal_date, b.balance
FROM (SELECT account_id, max(fiscal_date) fiscal_date
FROM c_day
WHERE deleted_flag = '0'
GROUP BY account_id) a
JOIN c_day b
ON a.account_id = b.account_id
AND a.fiscal_date = b.fiscal_date
WHERE b.deleted_flag = '0') c
ON a.row_id = c.account_id
mysql> select count(*) from c1md_bank_acct;
+----------+
| count(*) |
+----------+
| 1016 |
+----------+
1 row in set (0.00 sec)
mysql> select count(*) from c1am_acct_day;
+----------+
| count(*) |
+----------+
| 3251200 |
+----------+
1 row in set (0.00 sec)
###InnoDB explain plan
+----+-------------+---------------+------------+-------+-------------------------------------------------+-----------------+---------+----------------------------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+-------+-------------------------------------------------+-----------------+---------+----------------------------+---------+----------+-------------+
| 1 | PRIMARY | a | NULL | ALL | NULL | NULL | NULL | NULL | 1022 | 100.00 | NULL |
| 1 | PRIMARY | <derived3> | NULL | ref | <auto_key1> | <auto_key1> | 8 | mbs.a.ROW_ID | 409 | 100.00 | NULL |
| 1 | PRIMARY | b | NULL | ref | IDX_AMACCTDAY02,IDX_AMACCTDAY03,IDX_AMACCTDAY04 | IDX_AMACCTDAY02 | 12 | mbs.a.ROW_ID,a.fiscal_date | 1 | 100.00 | Using where |
| 3 | DERIVED | c_day | NULL | index | IDX_AMACCTDAY02,IDX_AMACCTDAY03 | IDX_AMACCTDAY03 | 8 | NULL | 3136482 | 10.00 | Using where |
+----+-------------+---------------+------------+-------+-------------------------------------------------+-----------------+---------+----------------------------+---------+----------+-------------+
4 rows in set, 1 warning (0.00 sec)
The optimizer of InnoDB prioritizes selecting small tables for association, with an execution time of 6 seconds.
###Tianmu explain plan
+----+-------------+---------------+------------+------+-------------------------------------------------+-------------+---------+------------------------------------+---------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+------+-------------------------------------------------+-------------+---------+------------------------------------+---------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | PRIMARY | a | NULL | ALL | NULL | NULL | NULL | NULL | 1016 | 100.00 | NULL |
| 1 | PRIMARY | b | NULL | ALL | IDX_AMACCTDAY02,IDX_AMACCTDAY03,IDX_AMACCTDAY04 | NULL | NULL | NULL | 3251200 | 0.14 | Using where |
| 1 | PRIMARY | <derived3> | NULL | ref | <auto_key0> | <auto_key0> | 12 | mbs.b.ACCOUNT_ID,mbs.b.FISCAL_DATE | 320 | 100.00 | Using where; Using index |
| 3 | DERIVED | c_day | NULL | ALL | NULL | NULL | NULL | NULL | 3251200 | 10.00 | Using where with pushed condition (`mbs`.`c_day`.`DELETED_FLAG` = '0')(t0) Pckrows: 50, susp. 50 (0 empty 0 full). Conditions: 1; Using temporary; Using filesort |
+----+-------------+---------------+------------+------+-------------------------------------------------+-------------+---------+------------------------------------+---------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
4 rows in set, 1 warning (0.03 sec)
The optimizer of Tianmu correlates according to the order of table connections, with an execution time of 10 minutes.
ACK
T:-2.LEFT_JOIN_ON({T:-1},{T:-4,T:-5},C:1)
The problem is that tianmu executes LEFT_JOIN_ON. Changing the table order of the execution plan has little impact on the query time
+----+-------------+---------------+------------+------+---------------+------+---------+------+---------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+------+---------------+------+---------+------+---------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1 | PRIMARY | a | NULL | ALL | NULL | NULL | NULL | NULL | 1016 | 100.00 | NULL |
| 1 | PRIMARY | <derived3> | NULL | ALL | NULL | NULL | NULL | NULL | 325120 | 100.00 | Using where; Using join buffer (Block Nested Loop) |
| 1 | PRIMARY | xx | NULL | ALL | NULL | NULL | NULL | NULL | 3251200 | 100.00 | Using where; Using join buffer (Block Nested Loop) |
| 3 | DERIVED | c1am_acct_day | NULL | ALL | NULL | NULL | NULL | NULL | 3251200 | 10.00 | Using where with pushed condition (`mbs`.`c1am_acct_day`.`DELETED_FLAG` = '0')(t0) Pckrows: 50, susp. 50 (0 empty 0 full). Conditions: 1; Using temporary; Using filesort |
+----+-------------+---------------+------------+------+---------------+------+---------+------+---------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Add the WHERE c.account_id IS NOT NULL condition so that null does not exist in the result set, thus converting the outer join to the inner join. In this way, the join condition can be pushed up and down according to the situation, so that the query takes less than 1 second
But for the slow left join, we need to find a calm point. If the problem is nested loop, it needs to be redesigned
同样是客户POC的问题, 影响客户是否使用天幕的决策,必须要尽快解决,本问题单继续对该问题做分析, 尽快处理该问题以满足客户的需求
对比该SQL的外连接,进行内连接则具有显著的查询性能提升,此因素需要被操作符的物理计划上的逐个分析
可以对比下内连接的执行计划,几乎所有的条件都被上提,而外连接却是在执行nested loop, 由于物化表没有聚集处理,导致时间复杂度暴增
我想想,外连接和内连接差别如此巨大,有点意思,代码写的有点屌
由于为客户POC拿下合同的紧迫性,在客户的现场,建议客户在自己的业务SQL中添加空值拒绝的语义以转为内连接,以在客户现场演示环节向客户展现连接查询的性能
在进行POC以拿下客户的同时,对外连接进行彻底的分析
构建的查询计划倒是符合预期,重点在于物理计划的执行
void CompiledQuery::LeftJoinOn(const TabID &temp_table, std::vector<TabID> &left_tables,
std::vector<TabID> &right_tables, const CondID &cond_id) {
CompiledQuery::CQStep s;
s.type = StepType::LEFT_JOIN_ON;
s.t1 = temp_table;
s.c1 = cond_id;
s.tables1 = left_tables;
s.tables2 = right_tables;
steps.push_back(s);
}
此处尤其要对比内连接的表结构见的关系, 与外连接不同,内连接对于表做了不同的处理
void CompiledQuery::InnerJoinOn(const TabID &temp_table, std::vector<TabID> &left_tables,
std::vector<TabID> &right_tables, const CondID &cond_id) {
CompiledQuery::CQStep s;
s.type = StepType::INNER_JOIN_ON;
s.t1 = temp_table;
s.c1 = cond_id;
s.tables1 = left_tables;
s.tables1.insert(s.tables1.end(), right_tables.begin(), right_tables.end());
steps.push_back(s);
}
这段代码写的有点水平,有点意思,虽说关系型数据库的关系不是标准的集合论,是采用的包模型,不过这块代码确实出彩,干开源的弟兄们看代码时候,可以多看看这块,有两下子
此连接操作在物理执行的时候在执行多重循环,更为麻烦的地方在于内连接不是一个连接操作,而是多个列组成的虚拟表的扫描过滤
void TempTable::AddInnerConds(Condition *cond, std::vector<TabID> &dim_aliases) {
for (uint i = 0; i < cond->Size(); i++)
for (uint j = 0; j < dim_aliases.size(); j++) (*cond)[i].left_dims[GetDimension(dim_aliases[j])] = true;
filter.AddConditions(cond, CondType::ON_INNER_FILTER);
}
void TempTable::AddLeftConds(Condition *cond, std::vector<TabID> &left_aliases, std::vector<TabID> &right_aliases) {
for (uint i = 0; i < cond->Size(); i++) {
for (int d = 0; d < (*cond)[i].left_dims.Size(); d++) (*cond)[i].left_dims[d] = (*cond)[i].right_dims[d] = false;
for (uint j = 0; j < left_aliases.size(); j++) (*cond)[i].left_dims[GetDimension(left_aliases[j])] = true;
for (uint j = 0; j < right_aliases.size(); j++) (*cond)[i].right_dims[GetDimension(right_aliases[j])] = true;
}
filter.AddConditions(cond, CondType::ON_LEFT_FILTER);
}
if (false_desc_found && outer_dims.IsEmpty()) {
all_dims.Plus(cond[0].left_dims); // for FALSE join condition
// DimensionUsed() does not mark anything
for (int i = 0; i < mind->NumOfDimensions(); i++)
if (all_dims[i])
mind->Empty(i);
return; // all done
}
可以看出当将T4表和T5表都与T2表做了内连接后,随后的条件过滤都只针对T2表进行
之所以能这么做,涉及到集合论中的语义,可以参考韦恩图中的示意
通过将join操作符转换为限制的做法,成功的避免了对join的耗时处理
但是这玩意说的简单,具体是怎么做的,以及对于外连接的语义的处理
由此成功的将两趟算法,改变为了一趟算法
join操作符的物理计划至少是个两趟的算法,此处对比innodb对于外连接的处理,很明显是由于物化表的数据非聚集
T:-2.LEFT_JOIN_ON({T:-1},{T:-4,T:-5},C:1)
#0 Tianmu::core::ParameterizedFilter::UpdateMultiIndex (this=0x7f25983959e0, count_only=false, limit=-1) at /root/work/stonedb-dev-202305026/storage/tianmu/core/parameterized_filter.cpp:1117
#1 0x0000000002cff56e in Tianmu::core::Query::Preexecute (this=0x7f28c1d507d0, qu=..., sender=0x7f25fd1212f0, display_now=true)
at /root/work/stonedb-dev-202305026/storage/tianmu/core/query.cpp:797
#2 0x0000000002ceba5e in Tianmu::core::Engine::Execute (this=0x53dbeb0, thd=0x7f25fc000e10, lex=0x7f25fc003138, result_output=0x7f25fc006ab8, unit_for_union=0x0)
at /root/work/stonedb-dev-202305026/storage/tianmu/core/engine_execute.cpp:513
#3 0x0000000002cea74b in Tianmu::core::Engine::HandleSelect (this=0x53dbeb0, thd=0x7f25fc000e10, lex=0x7f25fc003138, result=@0x7f28c1d50dc8: 0x7f25fc006ab8, setup_tables_done_option=0,
res=@0x7f28c1d50dc4: 0, is_optimize_after_tianmu=@0x7f28c1d50dbc: 1, tianmu_free_join=@0x7f28c1d50dc0: 1, with_insert=0)
at /root/work/stonedb-dev-202305026/storage/tianmu/core/engine_execute.cpp:243
#4 0x0000000003084438 in Tianmu::DBHandler::ha_my_tianmu_query (thd=0x7f25fc000e10, lex=0x7f25fc003138, result_output=@0x7f28c1d50dc8: 0x7f25fc006ab8, setup_tables_done_option=0,
res=@0x7f28c1d50dc4: 0, is_optimize_after_tianmu=@0x7f28c1d50dbc: 1, tianmu_free_join=@0x7f28c1d50dc0: 1, with_insert=0)
at /root/work/stonedb-dev-202305026/storage/tianmu/sql/ha_my_tianmu.cpp:95
#5 0x0000000002427a28 in execute_sqlcom_select (thd=0x7f25fc000e10, all_tables=0x7f25fc005ee0) at /root/work/stonedb-dev-202305026/sql/sql_parse.cc:5204
#6 0x0000000002420d9e in mysql_execute_command (thd=0x7f25fc000e10, first_level=true) at /root/work/stonedb-dev-202305026/sql/sql_parse.cc:2847
#7 0x0000000002428a8d in mysql_parse (thd=0x7f25fc000e10, parser_state=0x7f28c1d51f90) at /root/work/stonedb-dev-202305026/sql/sql_parse.cc:5642
#8 0x000000000241da84 in dispatch_command (thd=0x7f25fc000e10, com_data=0x7f28c1d52730, command=COM_QUERY) at /root/work/stonedb-dev-202305026/sql/sql_parse.cc:1495
#9 0x000000000241c8c5 in do_command (thd=0x7f25fc000e10) at /root/work/stonedb-dev-202305026/sql/sql_parse.cc:1034
#10 0x000000000254ee35 in handle_connection (arg=0x9467150) at /root/work/stonedb-dev-202305026/sql/conn_handler/connection_handler_per_thread.cc:313
#11 0x0000000002c1e674 in pfs_spawn_thread (arg=0x929f420) at /root/work/stonedb-dev-202305026/storage/perfschema/pfs.cc:2197
#12 0x00007f2911a3d1ca in start_thread () from /lib64/libpthread.so.0
#13 0x00007f290e8b3e73 in clone () from /lib64/libc.so.6
尤其要小心对于join的操作的过滤处理
void ParameterizedFilter::PrepareJoiningStep(Condition &join_desc, Condition &desc, int desc_no, MultiIndex &mind_) {
// join parameters based on the first joining condition
DimensionVector dims1(mind_.NumOfDimensions());
desc[desc_no].DimensionUsed(dims1);
mind_.MarkInvolvedDimGroups(dims1);
DimensionVector cur_outer_dim(desc[desc_no].right_dims);
bool outer_present = !cur_outer_dim.IsEmpty();
// add join (two-table_) conditions first
for (uint i = desc_no; i < desc.Size(); i++) {
if (desc[i].done || desc[i].IsDelayed() || !desc[i].IsType_JoinSimple())
continue;
DimensionVector dims2(mind_.NumOfDimensions());
desc[i].DimensionUsed(dims2);
if (desc[i].right_dims == cur_outer_dim && (outer_present || dims1.Includes(dims2))) {
// can be executed together if all dimensions of the other condition are
// present in the base one or in case of outer join
join_desc.AddDescriptor(desc[i]);
desc[i].done = true; // setting in advance, as we already copied the
// descriptor to be processed
}
}
// add the rest of conditions (e.g. one-dimensional outer conditions), which
// are not "done" yet
for (uint i = desc_no; i < desc.Size(); i++) {
if (desc[i].done || desc[i].IsDelayed())
continue;
DimensionVector dims2(mind_.NumOfDimensions());
desc[i].DimensionUsed(dims2);
if (desc[i].right_dims == cur_outer_dim && (outer_present || dims1.Includes(dims2))) {
// can be executed together if all dimensions of the other condition are
// present in the base one or in case of outer join
join_desc.AddDescriptor(desc[i]);
desc[i].done = true; // setting in advance, as we already copied the descriptor to be processed
}
}
}
内连接也是使用了hash join
#0 Tianmu::core::ParallelHashJoiner::ExecuteJoinConditions (this=0x7f3084905fe0, cond=...) at /root/work/stonedb-dev-202305026/storage/tianmu/core/parallel_hash_join.cpp:236
#1 0x00000000030e76a3 in Tianmu::core::ParameterizedFilter::UpdateJoinCondition (this=0x7f30848f91a0, cond=..., tips=...)
at /root/work/stonedb-dev-202305026/storage/tianmu/core/parameterized_filter.cpp:718
#2 0x00000000030eb532 in Tianmu::core::ParameterizedFilter::UpdateMultiIndex (this=0x7f30848f91a0, count_only=false, limit=-1)
at /root/work/stonedb-dev-202305026/storage/tianmu/core/parameterized_filter.cpp:1444
#3 0x0000000002cff56e in Tianmu::core::Query::Preexecute (this=0x7f33962517d0, qu=..., sender=0x7f3084009ed0, display_now=true)
at /root/work/stonedb-dev-202305026/storage/tianmu/core/query.cpp:797
#4 0x0000000002ceba5e in Tianmu::core::Engine::Execute (this=0x644aeb0, thd=0x7f3084000e10, lex=0x7f3084003138, result_output=0x7f3084006ab8, unit_for_union=0x0)
at /root/work/stonedb-dev-202305026/storage/tianmu/core/engine_execute.cpp:513
#5 0x0000000002cea74b in Tianmu::core::Engine::HandleSelect (this=0x644aeb0, thd=0x7f3084000e10, lex=0x7f3084003138, result=@0x7f3396251dc8: 0x7f3084006ab8, setup_tables_done_option=0,
res=@0x7f3396251dc4: 0, is_optimize_after_tianmu=@0x7f3396251dbc: 1, tianmu_free_join=@0x7f3396251dc0: 1, with_insert=0)
at /root/work/stonedb-dev-202305026/storage/tianmu/core/engine_execute.cpp:243
#6 0x0000000003084438 in Tianmu::DBHandler::ha_my_tianmu_query (thd=0x7f3084000e10, lex=0x7f3084003138, result_output=@0x7f3396251dc8: 0x7f3084006ab8, setup_tables_done_option=0,
res=@0x7f3396251dc4: 0, is_optimize_after_tianmu=@0x7f3396251dbc: 1, tianmu_free_join=@0x7f3396251dc0: 1, with_insert=0)
at /root/work/stonedb-dev-202305026/storage/tianmu/sql/ha_my_tianmu.cpp:95
#7 0x0000000002427a28 in execute_sqlcom_select (thd=0x7f3084000e10, all_tables=0x7f3084005ee0) at /root/work/stonedb-dev-202305026/sql/sql_parse.cc:5204
#8 0x0000000002420d9e in mysql_execute_command (thd=0x7f3084000e10, first_level=true) at /root/work/stonedb-dev-202305026/sql/sql_parse.cc:2847
#9 0x0000000002428a8d in mysql_parse (thd=0x7f3084000e10, parser_state=0x7f3396252f90) at /root/work/stonedb-dev-202305026/sql/sql_parse.cc:5642
#10 0x000000000241da84 in dispatch_command (thd=0x7f3084000e10, com_data=0x7f3396253730, command=COM_QUERY) at /root/work/stonedb-dev-202305026/sql/sql_parse.cc:1495
#11 0x000000000241c8c5 in do_command (thd=0x7f3084000e10) at /root/work/stonedb-dev-202305026/sql/sql_parse.cc:1034
#12 0x000000000254ee35 in handle_connection (arg=0xa4d3440) at /root/work/stonedb-dev-202305026/sql/conn_handler/connection_handler_per_thread.cc:313
#13 0x0000000002c1e674 in pfs_spawn_thread (arg=0xa30b5c0) at /root/work/stonedb-dev-202305026/storage/perfschema/pfs.cc:2197
#14 0x00007f33a05751ca in start_thread () from /lib64/libpthread.so.0
#15 0x00007f339d3ebe73 in clone () from /lib64/libc.so.6
在hash join对内连接和外连接做了完全不同的处理,外连接简直就是nested loop的另一种实现
void ParallelHashJoiner::InitOuter(Condition &cond) {
DimensionVector outer_dims(cond[0].right_dims); // outer_dims will be filled with nulls for
// non-matching tuples
if (!outer_dims.IsEmpty()) {
if (traversed_dims_.Includes(outer_dims)) {
// Watch the non-outer dim for unmatched tuples and add them with nulls on
// outer dim.
watch_matched_ = true;
uint64_t origin_size = mind->NumOfTuples(matched_dims_);
for (int index = 0; index < mind->NumOfDimensions(); index++) {
if (matched_dims_[index]) {
origin_size = std::max(origin_size, mind->OrigSize(index));
}
}
outer_matched_filter_.reset(new MutexFilter(origin_size, pack_power_, true));
} else if (matched_dims_.Includes(outer_dims)) {
watch_traversed_ = true;
}
outer_nulls_only_ = true;
for (int j = 0; j < outer_dims.Size(); j++)
if (outer_dims[j] && tips.null_only[j] == false)
outer_nulls_only_ = false;
}
}
(gdb) p traversed_hash_tables_[0].outer_filter_.get()[0].filter_.get()[0]
$10 = (Tianmu::core::Filter) {
<Tianmu::mm::TraceableObject> = {
_vptr.TraceableObject = 0x4411118 <vtable for Tianmu::core::Filter+16>,
next = 0x0,
prev = 0x0,
tracker = 0x0,
static m_MemHandling = 0x8137820,
m_preUnused = false,
static globalFreeable = {
<std::__atomic_base<unsigned long>> = {
static _S_alignment = 8,
_M_i = 8
},
members of std::atomic<unsigned long>:
static is_always_lock_free = true
},
static globalUnFreeable = {
<std::__atomic_base<unsigned long>> = {
static _S_alignment = 8,
_M_i = 535880
},
members of std::atomic<unsigned long>:
static is_always_lock_free = true
},
m_sizeAllocated = 0,
owner = 0x0,
m_locking_mutex = @0x81378c0,
m_coord = {
ID = Tianmu::core::COORD_TYPE::COORD_UNKNOWN,
co = {
pack = {
<Tianmu::core::object_id_helper::empty> = {<No data fields>},
members of Tianmu::core::ObjectId<(Tianmu::core::COORD_TYPE)0, 3, Tianmu::core::object_id_helper::empty>:
_coord = {0, 0, 0},
static ID = <optimized out>
},
rcattr = {
<Tianmu::core::object_id_helper::empty> = {<No data fields>},
members of Tianmu::core::ObjectId<(Tianmu::core::COORD_TYPE)4, 2, Tianmu::core::object_id_helper::empty>:
_coord = {0, 0},
static ID = <optimized out>
}
}
},
m_lock_count = 1
},
members of Tianmu::core::Filter:
static FB_FULL = 0 '\000',
--Type <RET> for more, q to quit, c to continue without paging--
static FB_EMPTY = 1 '\001',
static FB_MIXED = 2 '\002',
static bitBlockSize = 8192,
no_blocks = 1,
block_status = 0x7f747c000fd0 "\002",
block_changed = std::vector<bool> of length 1, capacity 64 = {1},
blocks = 0x7f747c000fb0,
block_allocator = 0x7f747c001090,
no_of_bits_in_last_block = 277,
shallow = false,
block_last_one = 0x7f747c000ff0,
block_filter = 0x7f747c000e90,
delayed_stats = -1,
delayed_block = -1,
delayed_stats_set = -1,
delayed_block_set = -1,
no_power = 16,
pack_def = 65536,
bit_block_pool = 0x7f747c001010,
bit_mut = std::shared_ptr<std::mutex> (use count 1, weak count 0) = {
get() = 0x7f747c001060
}
}