tidb
tidb copied to clipboard
planner: remove useless selection when the column in where clause is the same as partition clause.
Enhancement
table github_events
partition by column type
, so the return rows of table scan are all WatchEvent
type. There is no need to do selection again in this case.
mysql> explain analyze SELECT action, count(*) FROM github_events WHERE type = 'WatchEvent' GROUP BY action;
+----------------------------------------+---------------+-----------+--------------+-----------------------+-----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+----------------------------------------+---------------+-----------+--------------+-----------------------+-----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+------+
| Projection_5 | 1.00 | 1 | root | | time:1.89s, loops:2, Concurrency:OFF | gharchive_dev.github_events.action, Column#33 | 1016 Bytes | N/A |
| └─TableReader_47 | 1.00 | 1 | root | partition:watch_event | time:1.89s, loops:2, cop_task: {num: 5, max: 0s, min: 0s, avg: 0s, p95: 0s, copr_cache_hit_ratio: 0.00} | data:ExchangeSender_46 | N/A | N/A |
| └─ExchangeSender_46 | 1.00 | 1 | mpp[tiflash] | | tiflash_task:{proc max:1.89s, min:1.89s, avg: 1.89s, p80:1.89s, p95:1.89s, iters:1, tasks:4, threads:88} | ExchangeType: PassThrough | N/A | N/A |
| └─Projection_42 | 1.00 | 1 | mpp[tiflash] | | tiflash_task:{proc max:1.89s, min:1.89s, avg: 1.89s, p80:1.89s, p95:1.89s, iters:1, tasks:4, threads:88} | Column#33, gharchive_dev.github_events.action | N/A | N/A |
| └─HashAgg_43 | 1.00 | 1 | mpp[tiflash] | | tiflash_task:{proc max:1.89s, min:1.89s, avg: 1.89s, p80:1.89s, p95:1.89s, iters:1, tasks:4, threads:4} | group by:gharchive_dev.github_events.action, funcs:sum(Column#40)->Column#33, funcs:firstrow(gharchive_dev.github_events.action)->gharchive_dev.github_events.action | N/A | N/A |
| └─ExchangeReceiver_45 | 1.00 | 4 | mpp[tiflash] | | tiflash_task:{proc max:1.89s, min:1.89s, avg: 1.89s, p80:1.89s, p95:1.89s, iters:4, tasks:4, threads:88} | | N/A | N/A |
| └─ExchangeSender_44 | 1.00 | 4 | mpp[tiflash] | | tiflash_task:{proc max:1.89s, min:0s, avg: 471.9ms, p80:1.89s, p95:1.89s, iters:1024, tasks:4, threads:88} | ExchangeType: HashPartition, Hash Cols: [name: gharchive_dev.github_events.action, collate: utf8mb4_unicode_ci] | N/A | N/A |
| └─HashAgg_10 | 1.00 | 4 | mpp[tiflash] | | tiflash_task:{proc max:1.89s, min:0s, avg: 471.4ms, p80:1.89s, p95:1.89s, iters:1024, tasks:4, threads:4} | group by:gharchive_dev.github_events.action, funcs:count(1)->Column#40 | N/A | N/A |
| └─Selection_41 | 305500065.00 | 323645427 | mpp[tiflash] | | tiflash_task:{proc max:1.18s, min:0s, avg: 296.2ms, p80:1.18s, p95:1.18s, iters:5720, tasks:4, threads:88} | eq(gharchive_dev.github_events.type, "WatchEvent") | N/A | N/A |
| └─TableFullScan_40 | 4875369803.00 | 323645427 | mpp[tiflash] | table:github_events | tiflash_task:{proc max:218.8ms, min:0s, avg: 54.7ms, p80:218.8ms, p95:218.8ms, iters:5720, tasks:4, threads:88} | keep order:false, PartitionTableScan:true | N/A | N/A |
+----------------------------------------+---------------+-----------+--------------+-----------------------+-----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+------+
10 rows in set (1.90 sec)
In general, when the column in where clause is the same as partition clause, there is no need to do the selection.
/cc @fixdb
Currently we don't do the optimization to avoid the regression that we might miss some indexes. But we can possibly limit the case to filter where there is only 1 condition and that the filter condition is on partition column.
More info in this PR: https://github.com/pingcap/tidb/pull/35298. If the optimisation is limited to non-indexed single value LIST partition (possibly single value RANGE partition) it would work without any side effects. Would such limited optimisation be valuable?
but it has issues with index-out-of-range and having other side-effects
Could you please give some examples about that? IMO, when the query like
select ... from github_events WHERE type = 'WatchEvent' [AND...] ...
we can just skip the filter condition type = 'WatchEvent'
, and the Selection is executing in the column store engine tiflash.
The pruning happens fairly early in the optimizer process and if it removes that specific filter condition, it may affect the cost/possibility of using tikv and a possible index in finding the best plan (lowest cost) to pick.
Say that there would be an index (type, ...)
then removing the type = 'WatchEvent
condition would mean that the index would not be considered, or at least have a much higher cost than if the condition on the first column of the index was still there.
Or if the partition would be defined as partition watch_event values in ('WatchEvent','OtherEvent')
then the filter condition would still be needed to not return also matching 'OtherEvent'.
To see the benefit, could you try to specify the partition as well, like
select ... from github_events partition(watch_event) WHERE type = 'WatchEvent' [AND...] ...
Which would only use the watch_event partition.
Also notice that the optimization that was reverted only was for RANGE partitioned tables (which normal use is for multi-value ranges, but it should really have been checking that the partition's range was a single value). So it was never implemented for LIST partitioning.
Say that there would be an
index (type, ...)
then removing thetype = 'WatchEvent
condition would mean that the index would not be considered
It only happen when there are indexes like index([...], type, ...)
, and the other condition should in the index. We can spot it before pruning?
if the partition would be defined as
partition watch_event values in ('WatchEvent','OtherEvent')
then the filter condition would still be needed to not return also matching 'OtherEvent'.
We can limit that the condition in where clause of type
is the same as partition condition.
Would such limited optimisation be valuable?
I think it is valuable. When I create partition, it usually means I will query the rows inside the partition table, which means a lot of my query have where clause like WHERE type = 'WatchEvent' [AND ...]
. In the case of this issue, the execution of selection is nearly half of the total execution time.
Meanwhile, the big query of tidb usually execute in tiflash, those side effect do not make sense in tiflash.
@mjonss @Lloyd-Pottiger How about this? When the partition is list partition, filter is on partition key with eq condition, when there is only 1 condition, or there are multiple conjunctive conditions but no indexes on partition key, we can safely remove that filter condition.
For filters with multiple conditions, we can provide a system variable to let user to control whether to remove the eq filter on list partition.
@mjonss @Lloyd-Pottiger How about this? When the partition is list partition, filter is on partition key with eq condition, when there is only 1 condition, or there are multiple conjunctive conditions but no indexes on partition key, we can safely remove that filter condition.
For filters with multiple conditions, we can provide a system variable to let user to control whether to remove the eq filter on list partition.
LGTM.
when there is only 1 condition
I prefer it is described as there are conditions the same as partition condition in order to solve partition watch_event values in ('WatchEvent','OtherEvent', ...)
I prefer it is described as there are conditions the same as partition condition in order to solve partition watch_event values in ('WatchEvent','OtherEvent', ...)
Looks good.
@mjonss What do you think? Can you help resolve this issue?
Yes, we can implement this filter/condition removal, when the following conditions match:
- the condition is completely covered by the partition definition (like LIST partition with a single value) and the condition is part of a full set of conditions that are conjunctive (only filter can fully manage disjunctive normal form).
- no indexes starting with the column in the condition.
- condition is EQ or IN (which can later be extended to check all that all values for a set of partitions are covered, so that the condition/filter is completely matched just by the selected set of partitions).
I would prefer to not have yet another system variable, but am OK with having it temporarily in one non-LTS version during stabilising the optimisation.
@mjonss Sounds good, let's do the filter/condition removal when the conditions match.