Predicate Columns Feature Evolution
Enhancement
- Temporarily change the default value to "all" and cherry-pick it back to 8.5. And we ensure no analyze performance regression during upgrades, if users were previously using "predicate," they will continue using "predicate" after the upgrade.
- Make more efficient and lower-cost collection of all NDVs the evolution goal for the next phase. Once this feature is delivered, collect NDV for all non-predicate columns, but don't collect TopN and Buckets.
- For SaaS scenarios, explore how to enable new tenants to inherit predicate columns.
- Once steps 1, 2, and 3 above are all completed, change the default value back to "predicate."
More background:
Investigating TiDB performance in analytics_benchmark V2
Background
We observed that TiDB's query performance was extremely poor in https://github.com/toddstoffel/analytics_benchmark, so we conducted an analysis and produced this analysis document. This analysis document mentions two important points. There are two main reasons for the slow queries:
- Not all tables have TiFlash replicas, which led to extremely poor query performance. After adding replicas, the query performance improved from 312 seconds to 52 seconds (note that this is the cumulative value of all queries).
- After performing ANALYZE TABLE ALL COLUMNS on all tables, the query performance improved from 52 seconds to 6.5 seconds. Therefore, we started a discussion on tidb_analyze_column_options, debating whether it should be set to "all" by default. So in this document, I will conduct an in - depth analysis of this issue again to explore it and provide you with some deeper insights.
Q14
An interesting discovery
When we talk about how "analyze" can significantly reduce execution latency, of course, this is true. However, what everyone needs to notice is that there is a very special query. The total execution time is 52 seconds, but its execution latency is 41.32 seconds. So, it's not as terrible as we described without all the statistics under AP. We just have a very poor specific query plan. The comparison method of total time consumption gives us this illusion that the performance of all queries is poor. Moreover, even after we perform "analyze", the performance of two queries regression.
Retest
As described in the document, the core reason is the issue of join order. In Q14, the join result between airlines and flights was used as the build side, followed by a join with airports. However, flights should instead be the probe side. After running analyze, the join order became optimal. Based on this problem, I conducted the following tests and continued to use the analysis suite:
CPU: Apple M4 Max (14) @ 4.51 GHz Memory: 29.03 GiB / 36.00 GiB (81%) Swap: 1.91 GiB / 3.00 GiB (64%) Disk (/): 476.55 GiB / 926.35 GiB (51%) - apfs [Read-only]
| Query | V8.5.3 (s) | Master (s) | Master + Analyze All (s) |
|---|---|---|---|
| 1.sql | 0.08 | 0.07 | 0.08 |
| 2.sql | 0.24 | 0.27 | 0.22 |
| 3.sql | 0.05 | 0.05 | 0.05 |
| 4.sql | 0.03 | 0.03 | 0.03 |
| 5.sql | 0.10 | 0.09 | 0.06 |
| 6.sql | 0.37 | 0.31 | 0.24 |
| 7.sql | 0.34 | 0.30 | 0.22 |
| 8.sql | 0.07 | 0.07 | 0.07 |
| 9.sql | 0.35 | 0.29 | 0.22 |
| 10.sql | 0.09 | 0.08 | 0.08 |
| 11.sql | 0.17 | 0.14 | 0.13 |
| 12.sql | 0.45 | 0.43 | 0.31 |
| 13.sql | 0.74 | 0.62 | 0.15 |
| 14.sql | 23.14 | 24.90 | 0.62 |
| 15.sql | 2.19 | 1.79 | 0.21 |
| 16.sql | 0.21 | 0.16 | 0.23 |
| 17.sql | 0.11 | 0.09 | 0.10 |
| 18.sql | 0.24 | 0.20 | 0.31 |
| 19.sql | 1.00 | 0.83 | 0.18 |
| 20.sql | 0.29 | 0.23 | 0.16 |
| Avg | 1.523 | 1.5975 | 0.184 |
| Sum | 30.46 | 31.95 | 3.68 |
Analyze the table:
- First, we used the standard v8.5.3 for testing. It was observed that SQL 14 was still very slow.
- Second, we used the latest master for testing. SQL 14 was still very slow.
- Third, we used the all option to analyze all tables. The execution time of SQL 14 immediately improved significantly and dropped to 0.62 seconds.
- Fourth, we used the all options, but this time we specified 0 topn and 0 histogram buckets, which means that each column only has the most basic NDV and null count statistics. I could still correctly select the execution plan.
In conclusion, the only statistics we really need are the most basic NDV and null count. This is the core factor that enables our joins to select the correct order. Note: Our current master code does not support specifying 0 topn and 0 histograms. Therefore, the following patch was used in the fourth test, and the table was analyzed using the following statement: analyze table airlines, airports, flights all columns with 0 topn, 0 BUCKETS;
diff --git a/pkg/planner/core/planbuilder.go b/pkg/planner/core/planbuilder.go
index d9266c5bf1..a9187ea973 100644
--- a/pkg/planner/core/planbuilder.go
+++ b/pkg/planner/core/planbuilder.go
@@ -3138,7 +3138,8 @@ func handleAnalyzeOptionsV2(opts []ast.AnalyzeOpt) (map[ast.AnalyzeOptionType]ui
if opt.Type == ast.AnalyzeOptNumSamples {
sampleNum = v
}
- if v == 0 || v > analyzeOptionLimit[opt.Type] {
+ // Allow 0 for NumBuckets to support minimal statistics (no histogram buckets)
+ if (v == 0 && opt.Type != ast.AnalyzeOptNumBuckets) || v > analyzeOptionLimit[opt.Type] {
return nil, errors.Errorf("Value of analyze option %s should be positive and not larger than %d", ast.AnalyzeOptionString[opt.Type], analyzeOptionLimit[opt.Type])
}
optMap[opt.Type] = v
@@ -3200,7 +3201,8 @@ func handleAnalyzeOptions(opts []ast.AnalyzeOpt, statsVer int) (map[ast.AnalyzeO
if opt.Type == ast.AnalyzeOptNumSamples {
sampleNum = v
}
- if v == 0 || v > analyzeOptionLimit[opt.Type] {
+ // Allow 0 for NumBuckets to support minimal statistics (no histogram buckets)
+ if (v == 0 && opt.Type != ast.AnalyzeOptNumBuckets) || v > analyzeOptionLimit[opt.Type] {
return nil, errors.Errorf("Value of analyze option %s should be positive and not larger than %d", ast.AnalyzeOptionString[opt.Type], analyzeOptionLimit[opt.Type])
}
optMap[opt.Type] = v
Detailed analysis
The core issue is still the join order. This is an execution plan that only collects statistics on predicate columns:
+---------------------------------------------------------------------------------+--------+-------+------------+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+-------+
|id |estRows |actRows|task |access object |execution info |operator info |memory |disk |
+---------------------------------------------------------------------------------+--------+-------+------------+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+-------+
|Sort_215 |29.97 |1080 |root | |time:50.1s, open:2.07ms, close:47.1ms, loops:3, RU:8626.49 |bts.airports.airport, Column#462 |538.8 KB |0 Bytes|
|└─Projection_217 |29.97 |1080 |root | |time:50.1s, open:2.06ms, close:47.1ms, loops:3, Concurrency:OFF |bts.airports.airport, bts.airports.city, bts.airports.state, bts.airlines.airline, Column#428, Column#455, Column#458, Column#463, Column#435, Column#436, Column#439, Column#440, Column#464, Column#462, Column#461 |412.2 KB |N/A |
| └─Projection_219 |29.97 |1080 |root | |time:50.1s, open:2.06ms, close:47.1ms, loops:3, Concurrency:OFF |bts.airlines.airline, bts.airports.airport, bts.airports.city, bts.airports.state, Column#428, Column#435, Column#436, Column#439, Column#440, Column#455, Column#458, Column#462, Column#461, case(eq(Column#462, 1), Hub Leader, le(Column#462, 3), Major Player, ge(Column#458, 10), Significant Presence, Minor Player)->Column#463, case(eq(Column#461, 1), Most Efficient, le(Column#461, 3), High Efficiency, Standard Efficiency)->Column#464|364.1 KB |N/A |
| └─Selection_221 |29.97 |1080 |root | |time:50.1s, open:2.06ms, close:47.1ms, loops:3 |ge(Column#458, 5.0) |191.2 KB |N/A |
| └─Shuffle_418 |37.46 |2058 |root | |time:50.1s, open:2.05ms, close:47.1ms, loops:7, ShuffleConcurrency:5 |execution info: concurrency:5, data sources:[Window_234] |N/A |N/A |
| └─Window_224 |37.46 |2058 |root | |total_time:4m10s, total_open:7.42µs, total_close:2.29µs, loops:10 |rank()->Column#462 over(partition by bts.airports.airport order by Column#428 desc) |N/A |N/A |
| └─Sort_417 |37.46 |2058 |root | |total_time:4m10s, total_open:6.96µs, total_close:1.75µs, loops:10 |bts.airports.airport, Column#428:desc |173.6 KB |0 Bytes|
| └─ShuffleReceiver_431 |37.46 |2058 |root | |total_time:4m10s, total_open:125ns, total_close:166ns, loops:10 | |N/A |N/A |
| └─Window_234 |37.46 |2058 |root | |time:50.1s, open:2.03ms, close:47.1ms, loops:4 |rank()->Column#461 over(partition by bts.airports.airport order by Column#435) |N/A |N/A |
| └─Sort_416 |37.46 |2058 |root | |time:50.1s, open:2.03ms, close:47.1ms, loops:4 |bts.airports.airport, Column#435 |715.7 KB |0 Bytes|
| └─Projection_237 |37.46 |2058 |root | |time:50.1s, open:2.02ms, close:47.1ms, loops:4, Concurrency:OFF |bts.airlines.airline, bts.airports.airport, bts.airports.city, bts.airports.state, Column#428, Column#435, Column#436, Column#439, Column#440, Column#455, round(div(mul(cast(Column#428, decimal(20,0) BINARY), 100.0), cast(Column#455, decimal(20,0) BINARY)), 2)->Column#458 |545.4 KB |N/A |
| └─Projection_250 |37.46 |2058 |root | |time:50.1s, open:2.02ms, close:47.1ms, loops:4, Concurrency:OFF |bts.airlines.airline, bts.airports.airport, bts.airports.city, bts.airports.state, Column#428, Column#435, Column#436, Column#439, Column#440, Column#428, Column#455 |593.2 KB |N/A |
| └─HashJoin_265 |37.46 |2058 |root | |time:50.1s, open:2.02ms, close:47.1ms, loops:4, build_hash_table:{total:50s, fetch:50s, build:125µs}, probe:{concurrency:5, total:4m10s, max:50s, probe:545.1µs, fetch and wait:4m10s} |inner join, equal:[eq(bts.airports.airport, bts.airports.airport)] |935.1 KB |0 Bytes|
| ├─Projection_283(Build) |37.46 |2058 |root | |time:50.1s, open:1.06ms, close:47.1ms, loops:4, Concurrency:OFF |bts.airlines.airline, bts.airports.airport, bts.airports.city, bts.airports.state, Column#428, round(Column#429, 2)->Column#435, round(Column#430, 2)->Column#436, round(div(mul(Column#433, 100.0), cast(Column#428, decimal(20,0) BINARY)), 2)->Column#439, round(div(mul(Column#434, 100.0), cast(Column#428, decimal(20,0) BINARY)), 2)->Column#440, Column#428 |341.9 KB |N/A |
| │ └─Selection_285 |37.46 |2058 |root | |time:50.1s, open:1.06ms, close:47.1ms, loops:4 |ge(Column#428, 50) |143.8 KB |N/A |
| │ └─HashAgg_289 |46.83 |2312 |root | |time:50.1s, open:1.06ms, close:47.1ms, loops:7, partial_worker:{wall_time:50.003250709s, concurrency:5, task_num:4834, tot_wait:4m9.201971091s, tot_exec:793.268968ms, tot_time:4m10.015531792s, max:50.003134917s, p95:50.003134917s}, final_worker:{wall_time:50.004043833s, concurrency:5, task_num:25, tot_wait:4.497µs, tot_exec:2.451749ms, tot_time:4m10.018843s, max:50.003925875s, p95:50.003925875s} |group by:Column#503, Column#504, Column#505, Column#506, funcs:count(1)->Column#428, funcs:avg(Column#499)->Column#429, funcs:avg(Column#500)->Column#430, funcs:sum(Column#501)->Column#433, funcs:sum(Column#502)->Column#434, funcs:firstrow(Column#503)->bts.airlines.airline, funcs:firstrow(Column#504)->bts.airports.airport, funcs:firstrow(Column#505)->bts.airports.city, funcs:firstrow(Column#506)->bts.airports.state |9.51 MB |0 Bytes|
| │ └─Projection_430 |46.83 |4947517|root | |time:50s, open:1.04ms, close:47.1ms, loops:4835, Concurrency:OFF |bts.flights.dep_delay->Column#499, bts.flights.taxi_out->Column#500, cast(case(gt(bts.flights.cancelled, 0), 1, 0), decimal(20,0) BINARY)->Column#501, cast(case(gt(bts.flights.dep_delay, 15), 1, 0), decimal(20,0) BINARY)->Column#502, bts.airlines.airline->Column#503, bts.airports.airport->Column#504, bts.airports.city->Column#505, bts.airports.state->Column#506 |6.82 MB |N/A |
| │ └─Projection_293 |46.83 |4947517|root | |time:49.8s, open:1.04ms, close:47.1ms, loops:4835, Concurrency:OFF |bts.flights.dep_delay, bts.flights.taxi_out, bts.flights.cancelled, bts.airlines.airline, bts.airports.airport, bts.airports.city, bts.airports.state |7.10 MB |N/A |
| │ └─HashJoin_309 |46.83 |4947517|root | |time:49.8s, open:1.04ms, close:47.1ms, loops:4835, build_hash_table:{total:5.38s, fetch:1.12s, build:4.26s}, probe:{concurrency:5, total:4m1.6s, max:50s, probe:3m34.5s, fetch and wait:27s} |inner join, equal:[eq(bts.airports.airport, bts.airports.airport)] |1023.6 MB|1.16 GB|
| │ ├─TableReader_351(Build) |46.83 |5022781|root | |time:4.81s, open:1.02ms, close:18.7µs, loops:4941, cop_task: {num: 76, max: 0s, min: 0s, avg: 0s, p95: 0s, copr_cache_hit_ratio: 0.00}, fetch_resp_duration: 4.8s |MppVersion: 3, data:ExchangeSender_350 |54.4 MB |N/A |
| │ │ └─ExchangeSender_350 |46.83 |5022781|mpp[tiflash]| |tiflash_task:{time:1.39s, loops:77, threads:14}, tiflash_network: {inner_zone_send_bytes: 1081985635} |ExchangeType: PassThrough |N/A |N/A |
| │ │ └─Projection_349 |46.83 |5022781|mpp[tiflash]| |tiflash_task:{time:486.5ms, loops:77, threads:14} |bts.airlines.airline, bts.flights.dep_delay, bts.flights.taxi_out, bts.flights.cancelled, bts.airports.airport, bts.airports.city, bts.airports.state |N/A |N/A |
| │ │ └─HashJoin_334 |46.83 |5022781|mpp[tiflash]| |tiflash_task:{time:486.3ms, loops:77, threads:14} |inner join, equal:[eq(bts.flights.origin, bts.airports.iata_code)] |N/A |N/A |
| │ │ ├─ExchangeReceiver_320(Build) |37.46 |5022781|mpp[tiflash]| |tiflash_task:{time:117ms, loops:80, threads:14}, tiflash_wait: {pipeline_queue_wait: 68ms} | |N/A |N/A |
| │ │ │ └─ExchangeSender_319 |37.46 |5022781|mpp[tiflash]| |tiflash_task:{time:121.3ms, loops:80, threads:14} |ExchangeType: Broadcast, Compression: FAST |N/A |N/A |
| │ │ │ └─Projection_318 |37.46 |5022781|mpp[tiflash]| |tiflash_task:{time:119.9ms, loops:80, threads:14} |bts.airlines.airline, bts.flights.origin, bts.flights.dep_delay, bts.flights.taxi_out, bts.flights.cancelled |N/A |N/A |
| │ │ │ └─HashJoin_311 |37.46 |5022781|mpp[tiflash]| |tiflash_task:{time:119.8ms, loops:80, threads:14} |inner join, equal:[eq(bts.airlines.iata_code, bts.flights.carrier)] |N/A |N/A |
| │ │ │ ├─ExchangeReceiver_315(Build)|29.97 |30 |mpp[tiflash]| |tiflash_task:{time:14.6ms, loops:1, threads:14} | |N/A |N/A |
| │ │ │ │ └─ExchangeSender_314 |29.97 |30 |mpp[tiflash]| |tiflash_task:{time:4.25ms, loops:1, threads:14} |ExchangeType: Broadcast, Compression: FAST |N/A |N/A |
| │ │ │ │ └─Selection_313 |29.97 |30 |mpp[tiflash]| |tiflash_task:{time:4.08ms, loops:1, threads:14} |not(isnull(bts.airlines.iata_code)) |N/A |N/A |
| │ │ │ │ └─TableFullScan_312 |30.00 |30 |mpp[tiflash]|table:a |tiflash_task:{time:4.08ms, loops:1, threads:14}, tiflash_scan:{mvcc_input_rows:0, mvcc_input_bytes:0, mvcc_output_rows:0, local_regions:3, remote_regions:0, tot_learner_read:0ms, region_balance:{instance_num: 1, max/min: 3/3=1.000000}, delta_rows:0, delta_bytes:0, segments:1, stale_read_regions:0, tot_build_snapshot:0ms, tot_build_bitmap:0ms, tot_build_inputstream:1ms, min_local_stream:2ms, max_local_stream:2ms, dtfile:{data_scanned_rows:30, data_skipped_rows:0, mvcc_scanned_rows:0, mvcc_skipped_rows:0, lm_filter_scanned_rows:0, lm_filter_skipped_rows:0, tot_rs_index_check:0ms, tot_read:0ms}} |keep order:false, stats:pseudo |N/A |N/A |
| │ │ │ └─Selection_317(Probe) |37931.63|5022781|mpp[tiflash]| |tiflash_task:{time:113ms, loops:80, threads:14} |not(isnull(bts.flights.carrier)), not(isnull(bts.flights.crs_elapsed_time)), not(isnull(bts.flights.origin)), not(isnull(bts.flights.taxi_out)) |N/A |N/A |
| │ │ │ └─TableFullScan_316 |38083.74|5022781|mpp[tiflash]|table:f |tiflash_task:{time:112.8ms, loops:80, threads:14}, tiflash_wait: {pipeline_queue_wait: 73ms}, tiflash_scan:{mvcc_input_rows:0, mvcc_input_bytes:0, mvcc_output_rows:0, local_regions:46, remote_regions:0, tot_learner_read:12ms, region_balance:{instance_num: 1, max/min: 46/46=1.000000}, delta_rows:0, delta_bytes:0, segments:44, stale_read_regions:0, tot_build_snapshot:0ms, tot_build_bitmap:0ms, tot_build_inputstream:124ms, min_local_stream:29ms, max_local_stream:98ms, dtfile:{data_scanned_rows:5025569, data_skipped_rows:450560, mvcc_scanned_rows:0, mvcc_skipped_rows:0, lm_filter_scanned_rows:5025569, lm_filter_skipped_rows:33058166, tot_rs_index_check:22ms, tot_read:110ms}} |pushed down filter:eq(bts.flights.year, 2020), keep order:false, stats:pseudo |N/A |N/A |
| │ │ └─Selection_322(Probe) |398.20 |399 |mpp[tiflash]| |tiflash_task:{time:122.6ms, loops:1, threads:14} |not(isnull(bts.airports.airport)), not(isnull(bts.airports.iata_code)) |N/A |N/A |
| │ │ └─TableFullScan_321 |399.00 |399 |mpp[tiflash]|table:ap |tiflash_task:{time:122.6ms, loops:1, threads:14}, tiflash_wait: {pipeline_breaker_wait: 109ms}, tiflash_scan:{mvcc_input_rows:0, mvcc_input_bytes:0, mvcc_output_rows:0, local_regions:3, remote_regions:0, tot_learner_read:10ms, region_balance:{instance_num: 1, max/min: 3/3=1.000000}, delta_rows:0, delta_bytes:0, segments:1, stale_read_regions:0, tot_build_snapshot:0ms, tot_build_bitmap:0ms, tot_build_inputstream:0ms, min_local_stream:110ms, max_local_stream:110ms, dtfile:{data_scanned_rows:399, data_skipped_rows:0, mvcc_scanned_rows:0, mvcc_skipped_rows:0, lm_filter_scanned_rows:0, lm_filter_skipped_rows:0, tot_rs_index_check:0ms, tot_read:0ms}} |keep order:false, stats:pseudo |N/A |N/A |
| │ └─HashAgg_401(Probe) |203.88 |244 |root | |time:127.7ms, open:18.1µs, close:3.29µs, loops:6, partial_worker:{wall_time:127.684792ms, concurrency:5, task_num:1, tot_wait:127.597708ms, tot_exec:65.541µs, tot_time:638.360667ms, max:127.676125ms, p95:127.676125ms}, final_worker:{wall_time:127.722375ms, concurrency:5, task_num:5, tot_wait:14.21µs, tot_exec:167ns, tot_time:638.471915ms, max:127.711291ms, p95:127.711291ms} |group by:bts.airports.airport, funcs:firstrow(bts.airports.airport)->bts.airports.airport |126.2 KB |0 Bytes|
| │ └─Selection_405 |203.88 |244 |root | |time:127.6ms, open:3.96µs, close:1.5µs, loops:2 |not(isnull(bts.airports.airport)) |32.9 KB |N/A |
| │ └─CTEFullScan_407 |254.85 |244 |root |CTE:airport_volume_analysis |time:127.5ms, open:583ns, close:750ns, loops:3 |data:CTE_0 |N/A |N/A |
| └─Selection_267(Probe) |203.88 |244 |root | |time:128.7ms, open:954.9µs, close:6.25µs, loops:2 |not(isnull(bts.airports.airport)) |32.9 KB |N/A |
| └─CTEFullScan_269 |254.85 |244 |root |CTE:airport_volume_analysis AS ava|time:128.6ms, open:949.4µs, close:5.92µs, loops:3 |data:CTE_0 |124.5 KB |0 Bytes|
|CTE_0 |254.85 |244 |root | |time:127.5ms, open:583ns, close:750ns, loops:3 |Non-Recursive CTE |N/A |N/A |
|└─TableReader_208(Seed Part) |254.85 |244 |root | |time:128.5ms, open:947.4µs, close:5.21µs, loops:2, cop_task: {num: 15, max: 0s, min: 0s, avg: 0s, p95: 0s, copr_cache_hit_ratio: 0.00}, fetch_resp_duration: 127.5ms |MppVersion: 3, data:ExchangeSender_207 |4.78 KB |N/A |
| └─ExchangeSender_207 |254.85 |244 |mpp[tiflash]| |tiflash_task:{time:127ms, loops:14, threads:14}, tiflash_network: {inner_zone_send_bytes: 31673} |ExchangeType: PassThrough |N/A |N/A |
| └─Projection_121 |254.85 |244 |mpp[tiflash]| |tiflash_task:{time:126.7ms, loops:14, threads:14} |bts.airports.airport, bts.airports.city, bts.airports.state, Column#42->Column#49, Column#43->Column#50, Column#45->Column#51 |N/A |N/A |
| └─Projection_206 |254.85 |244 |mpp[tiflash]| |tiflash_task:{time:126.7ms, loops:14, threads:14} |bts.airports.airport, bts.airports.city, bts.airports.state, Column#42, Column#43, round(Column#44, 2)->Column#45, Column#42 |N/A |N/A |
| └─Selection_203 |254.85 |244 |mpp[tiflash]| |tiflash_task:{time:126.7ms, loops:14, threads:14} |ge(Column#42, 1000) |N/A |N/A |
| └─Projection_197 |318.56 |379 |mpp[tiflash]| |tiflash_task:{time:126.7ms, loops:14, threads:14} |Column#42, Column#43, div(Column#44, cast(case(eq(Column#492, 0), 1, Column#492), decimal(20,0) BINARY))->Column#44, bts.airports.airport, bts.airports.city, bts.airports.state |N/A |N/A |
| └─HashAgg_198 |318.56 |379 |mpp[tiflash]| |tiflash_task:{time:126.7ms, loops:14, threads:14} |group by:bts.airports.airport, bts.airports.city, bts.airports.state, funcs:sum(Column#493)->Column#42, funcs:count(distinct bts.flights.carrier)->Column#43, funcs:sum(Column#494)->Column#492, funcs:sum(Column#495)->Column#44, funcs:firstrow(bts.airports.airport)->bts.airports.airport, funcs:firstrow(bts.airports.city)->bts.airports.city, funcs:firstrow(bts.airports.state)->bts.airports.state, stream_count: 14 |N/A |N/A |
| └─ExchangeReceiver_200 |318.56 |2487 |mpp[tiflash]| |tiflash_task:{time:126.6ms, loops:14, threads:14} |stream_count: 14 |N/A |N/A |
| └─ExchangeSender_199 |318.56 |2487 |mpp[tiflash]| |tiflash_task:{time:125ms, loops:1, threads:1} |ExchangeType: HashPartition, Compression: FAST, Hash Cols: [name: bts.airports.airport, collate: utf8mb4_bin], [name: bts.airports.city, collate: utf8mb4_bin], [name: bts.airports.state, collate: utf8mb4_bin], stream_count: 14 |N/A |N/A |
| └─HashAgg_195 |318.56 |2487 |mpp[tiflash]| |tiflash_task:{time:124.5ms, loops:1, threads:1} |group by:bts.airports.airport, bts.airports.city, bts.airports.state, bts.flights.carrier, funcs:count(1)->Column#493, funcs:count(bts.flights.dep_delay)->Column#494, funcs:sum(bts.flights.dep_delay)->Column#495 |N/A |N/A |
| └─Projection_184 |497.75 |5022781|mpp[tiflash]| |tiflash_task:{time:122.6ms, loops:80, threads:14} |bts.flights.carrier, bts.flights.dep_delay, bts.airports.airport, bts.airports.city, bts.airports.state |N/A |N/A |
| └─Projection_174 |497.75 |5022781|mpp[tiflash]| |tiflash_task:{time:122.6ms, loops:80, threads:14} |bts.airports.airport, bts.airports.city, bts.airports.state, bts.flights.carrier, bts.flights.dep_delay |N/A |N/A |
| └─HashJoin_173 |497.75 |5022781|mpp[tiflash]| |tiflash_task:{time:122.6ms, loops:80, threads:14} |inner join, equal:[eq(bts.airports.iata_code, bts.flights.origin)] |N/A |N/A |
| ├─ExchangeReceiver_153(Build) |398.20 |399 |mpp[tiflash]| |tiflash_task:{time:15.5ms, loops:1, threads:14} | |N/A |N/A |
| │ └─ExchangeSender_152 |398.20 |399 |mpp[tiflash]| |tiflash_task:{time:4.84ms, loops:1, threads:14} |ExchangeType: Broadcast, Compression: FAST |N/A |N/A |
| │ └─Selection_151 |398.20 |399 |mpp[tiflash]| |tiflash_task:{time:4.62ms, loops:1, threads:14} |not(isnull(bts.airports.airport)), not(isnull(bts.airports.iata_code)) |N/A |N/A |
| │ └─TableFullScan_150 |399.00 |399 |mpp[tiflash]|table:ap |tiflash_task:{time:4.62ms, loops:1, threads:14}, tiflash_scan:{mvcc_input_rows:0, mvcc_input_bytes:0, mvcc_output_rows:0, local_regions:3, remote_regions:0, tot_learner_read:0ms, region_balance:{instance_num: 1, max/min: 3/3=1.000000}, delta_rows:0, delta_bytes:0, segments:1, stale_read_regions:0, tot_build_snapshot:0ms, tot_build_bitmap:0ms, tot_build_inputstream:0ms, min_local_stream:1ms, max_local_stream:1ms, dtfile:{data_scanned_rows:399, data_skipped_rows:0, mvcc_scanned_rows:0, mvcc_skipped_rows:0, lm_filter_scanned_rows:0, lm_filter_skipped_rows:0, tot_rs_index_check:0ms, tot_read:0ms}} |keep order:false, stats:pseudo |N/A |N/A |
| └─Selection_155(Probe) |38045.65|5022781|mpp[tiflash]| |tiflash_task:{time:122.4ms, loops:80, threads:14} |not(isnull(bts.flights.origin)) |N/A |N/A |
| └─TableFullScan_154 |38083.74|5022781|mpp[tiflash]|table:f |tiflash_task:{time:122.4ms, loops:80, threads:14}, tiflash_wait: {pipeline_queue_wait: 85ms}, tiflash_scan:{mvcc_input_rows:0, mvcc_input_bytes:0, mvcc_output_rows:0, local_regions:46, remote_regions:0, tot_learner_read:12ms, region_balance:{instance_num: 1, max/min: 46/46=1.000000}, delta_rows:0, delta_bytes:0, segments:44, stale_read_regions:0, tot_build_snapshot:0ms, tot_build_bitmap:0ms, tot_build_inputstream:103ms, min_local_stream:12ms, max_local_stream:107ms, dtfile:{data_scanned_rows:5025569, data_skipped_rows:450560, mvcc_scanned_rows:0, mvcc_skipped_rows:0, lm_filter_scanned_rows:5025569, lm_filter_skipped_rows:33058166, tot_rs_index_check:16ms, tot_read:174ms}}|pushed down filter:eq(bts.flights.year, 2020), keep order:false, stats:pseudo |N/A |N/A |
+---------------------------------------------------------------------------------+--------+-------+------------+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+-------+
Here is the execution plan for collecting the NDV and null count of all columns:
+-----------------------------------------------------------------------------+--------+-------+------------+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+-------+
|id |estRows |actRows|task |access object |execution info |operator info |memory |disk |
+-----------------------------------------------------------------------------+--------+-------+------------+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+-------+
|Sort_215 |203.88 |1080 |root | |time:656.4ms, open:1.74ms, close:49.8µs, loops:3, RU:8629.49 |bts.airports.airport, Column#462 |448.5 KB|0 Bytes|
|└─Projection_217 |203.88 |1080 |root | |time:655.8ms, open:1.74ms, close:49.2µs, loops:3, Concurrency:OFF |bts.airports.airport, bts.airports.city, bts.airports.state, bts.airlines.airline, Column#428, Column#455, Column#458, Column#463, Column#435, Column#436, Column#439, Column#440, Column#464, Column#462, Column#461 |419.4 KB|N/A |
| └─Projection_219 |203.88 |1080 |root | |time:655.8ms, open:1.74ms, close:48.3µs, loops:3, Concurrency:OFF |bts.airlines.airline, bts.airports.airport, bts.airports.city, bts.airports.state, Column#428, Column#435, Column#436, Column#439, Column#440, Column#455, Column#458, Column#462, Column#461, case(eq(Column#462, 1), Hub Leader, le(Column#462, 3), Major Player, ge(Column#458, 10), Significant Presence, Minor Player)->Column#463, case(eq(Column#461, 1), Most Efficient, le(Column#461, 3), High Efficiency, Standard Efficiency)->Column#464|369.2 KB|N/A |
| └─Selection_221 |203.88 |1080 |root | |time:655.7ms, open:1.73ms, close:48µs, loops:3 |ge(Column#458, 5.0) |180.8 KB|N/A |
| └─Shuffle_418 |254.85 |2058 |root | |time:655.4ms, open:1.73ms, close:47.8µs, loops:7, ShuffleConcurrency:5 |execution info: concurrency:5, data sources:[Window_236] |N/A |N/A |
| └─Window_224 |254.85 |2058 |root | |total_time:3.27s, total_open:11.2µs, total_close:2.83µs, loops:10 |rank()->Column#462 over(partition by bts.airports.airport order by Column#428 desc) |N/A |N/A |
| └─Sort_417 |254.85 |2058 |root | |total_time:3.27s, total_open:10.8µs, total_close:2.33µs, loops:10 |bts.airports.airport, Column#428:desc |173.6 KB|0 Bytes|
| └─ShuffleReceiver_431 |254.85 |2058 |root | |total_time:3.27s, total_open:250ns, total_close:167ns, loops:10 | |N/A |N/A |
| └─Window_236 |254.85 |2058 |root | |time:654.7ms, open:1.7ms, close:43.5µs, loops:4 |rank()->Column#461 over(partition by bts.airports.airport order by Column#435) |N/A |N/A |
| └─Sort_416 |254.85 |2058 |root | |time:654.5ms, open:1.7ms, close:43.4µs, loops:4 |bts.airports.airport, Column#435 |715.7 KB|0 Bytes|
| └─Projection_239 |254.85 |2058 |root | |time:653.4ms, open:1.69ms, close:43µs, loops:4, Concurrency:OFF |bts.airlines.airline, bts.airports.airport, bts.airports.city, bts.airports.state, Column#428, Column#435, Column#436, Column#439, Column#440, Column#455, round(div(mul(cast(Column#428, decimal(20,0) BINARY), 100.0), cast(Column#455, decimal(20,0) BINARY)), 2)->Column#458 |602.6 KB|N/A |
| └─Projection_252 |254.85 |2058 |root | |time:653.1ms, open:1.69ms, close:42µs, loops:4, Concurrency:OFF |bts.airlines.airline, bts.airports.airport, bts.airports.city, bts.airports.state, Column#428, Column#435, Column#436, Column#439, Column#440, Column#428, Column#455 |593.4 KB|N/A |
| └─HashJoin_268 |254.85 |2058 |root | |time:653.1ms, open:1.69ms, close:41.7µs, loops:4, build_hash_table:{total:141.5ms, fetch:141.5ms, build:32.4µs}, probe:{concurrency:5, total:3.26s, max:651.5ms, probe:935.5µs, fetch and wait:3.25s} |inner join, equal:[eq(bts.airports.airport, bts.airports.airport)] |115.3 KB|0 Bytes|
| ├─Selection_269(Build) |254.85 |244 |root | |time:142.3ms, open:790.8µs, close:7.88µs, loops:2 |not(isnull(bts.airports.airport)) |32.9 KB |N/A |
| │ └─CTEFullScan_271 |318.56 |244 |root |CTE:airport_volume_analysis AS ava|time:142.2ms, open:784µs, close:7.38µs, loops:3 |data:CTE_0 |72.8 KB |0 Bytes|
| └─Projection_285(Probe) |318.56 |2058 |root | |time:651.9ms, open:896µs, close:28.6µs, loops:4, Concurrency:OFF |bts.airlines.airline, bts.airports.airport, bts.airports.city, bts.airports.state, Column#428, round(Column#429, 2)->Column#435, round(Column#430, 2)->Column#436, round(div(mul(Column#433, 100.0), cast(Column#428, decimal(20,0) BINARY)), 2)->Column#439, round(div(mul(Column#434, 100.0), cast(Column#428, decimal(20,0) BINARY)), 2)->Column#440, Column#428 |409.8 KB|N/A |
| └─Selection_287 |318.56 |2058 |root | |time:651.5ms, open:892.9µs, close:28.2µs, loops:4 |ge(Column#428, 50) |143.8 KB|N/A |
| └─HashAgg_291 |398.20 |2312 |root | |time:651ms, open:890.8µs, close:28µs, loops:7, partial_worker:{wall_time:649.270209ms, concurrency:5, task_num:4834, tot_wait:2.20034562s, tot_exec:1.04107559s, tot_time:3.246275708s, max:649.261542ms, p95:649.261542ms}, final_worker:{wall_time:650.143792ms, concurrency:5, task_num:25, tot_wait:4.252µs, tot_exec:1.850711ms, tot_time:3.250411958s, max:650.136ms, p95:650.136ms} |group by:Column#503, Column#504, Column#505, Column#506, funcs:count(1)->Column#428, funcs:avg(Column#499)->Column#429, funcs:avg(Column#500)->Column#430, funcs:sum(Column#501)->Column#433, funcs:sum(Column#502)->Column#434, funcs:firstrow(Column#503)->bts.airlines.airline, funcs:firstrow(Column#504)->bts.airports.airport, funcs:firstrow(Column#505)->bts.airports.city, funcs:firstrow(Column#506)->bts.airports.state |10.8 MB |0 Bytes|
| └─Projection_430 |24276.24|4947517|root | |time:612.9ms, open:871.1µs, close:26.9µs, loops:4835, Concurrency:5 |bts.flights.dep_delay->Column#499, bts.flights.taxi_out->Column#500, cast(case(gt(bts.flights.cancelled, 0), 1, 0), decimal(20,0) BINARY)->Column#501, cast(case(gt(bts.flights.dep_delay, 15), 1, 0), decimal(20,0) BINARY)->Column#502, bts.airlines.airline->Column#503, bts.airports.airport->Column#504, bts.airports.city->Column#505, bts.airports.state->Column#506 |2.90 MB |N/A |
| └─Projection_295 |24276.24|4947517|root | |time:599.2ms, open:870.8µs, close:20.1µs, loops:4835, Concurrency:5 |bts.flights.dep_delay, bts.flights.taxi_out, bts.flights.cancelled, bts.airlines.airline, bts.airports.airport, bts.airports.city, bts.airports.state |2.70 MB |N/A |
| └─HashJoin_310 |24276.24|4947517|root | |time:624.3ms, open:870.3µs, close:7.83µs, loops:4835, build_hash_table:{total:141.5ms, fetch:141.5ms, build:31.8µs}, probe:{concurrency:5, total:3.24s, max:648.9ms, probe:2.16s, fetch and wait:1.09s} |inner join, equal:[eq(bts.airports.airport, bts.airports.airport)] |19.8 KB |0 Bytes|
| ├─HashAgg_401(Build) |254.85 |244 |root | |time:141.5ms, open:17.8µs, close:1.92µs, loops:6, partial_worker:{wall_time:141.459333ms, concurrency:5, task_num:1, tot_wait:141.375125ms, tot_exec:64.208µs, tot_time:707.235875ms, max:141.448584ms, p95:141.448584ms}, final_worker:{wall_time:141.496667ms, concurrency:5, task_num:5, tot_wait:50.625µs, tot_exec:42ns, tot_time:707.381998ms, max:141.491333ms, p95:141.491333ms} |group by:bts.airports.airport, funcs:firstrow(bts.airports.airport)->bts.airports.airport |111.8 KB|0 Bytes|
| │ └─Selection_405 |254.85 |244 |root | |time:141.4ms, open:4.21µs, close:625ns, loops:2 |not(isnull(bts.airports.airport)) |32.9 KB |N/A |
| │ └─CTEFullScan_407 |318.56 |244 |root |CTE:airport_volume_analysis |time:141.3ms, open:334ns, close:250ns, loops:3 |data:CTE_0 |N/A |N/A |
| └─TableReader_351(Probe) |37931.63|5022781|root | |time:139.6ms, open:850.8µs, close:2.71µs, loops:4911, cop_task: {num: 81, max: 0s, min: 0s, avg: 0s, p95: 0s, copr_cache_hit_ratio: 0.00}, fetch_resp_duration: 108.6ms |MppVersion: 3, data:ExchangeSender_350 |13.7 MB |N/A |
| └─ExchangeSender_350 |37931.63|5022781|mpp[tiflash]| |tiflash_task:{time:540.1ms, loops:80, threads:14}, tiflash_network: {inner_zone_send_bytes: 1081982802} |ExchangeType: PassThrough |N/A |N/A |
| └─Projection_349 |37931.63|5022781|mpp[tiflash]| |tiflash_task:{time:172.9ms, loops:80, threads:14} |bts.airlines.airline, bts.flights.dep_delay, bts.flights.taxi_out, bts.flights.cancelled, bts.airports.airport, bts.airports.city, bts.airports.state |N/A |N/A |
| └─HashJoin_334 |37931.63|5022781|mpp[tiflash]| |tiflash_task:{time:172.9ms, loops:80, threads:14} |inner join, equal:[eq(bts.flights.origin, bts.airports.iata_code)] |N/A |N/A |
| ├─ExchangeReceiver_324(Build) |398.20 |399 |mpp[tiflash]| |tiflash_task:{time:18.8ms, loops:1, threads:14} | |N/A |N/A |
| │ └─ExchangeSender_323 |398.20 |399 |mpp[tiflash]| |tiflash_task:{time:17.2ms, loops:1, threads:14} |ExchangeType: Broadcast, Compression: FAST |N/A |N/A |
| │ └─Selection_322 |398.20 |399 |mpp[tiflash]| |tiflash_task:{time:16.9ms, loops:1, threads:14} |not(isnull(bts.airports.airport)), not(isnull(bts.airports.iata_code)) |N/A |N/A |
| │ └─TableFullScan_321 |399.00 |399 |mpp[tiflash]|table:ap |tiflash_task:{time:16.9ms, loops:1, threads:14}, tiflash_scan:{mvcc_input_rows:0, mvcc_input_bytes:0, mvcc_output_rows:0, local_regions:3, remote_regions:0, tot_learner_read:13ms, region_balance:{instance_num: 1, max/min: 3/3=1.000000}, delta_rows:0, delta_bytes:0, segments:1, stale_read_regions:0, tot_build_snapshot:0ms, tot_build_bitmap:0ms, tot_build_inputstream:0ms, min_local_stream:2ms, max_local_stream:3ms, dtfile:{data_scanned_rows:399, data_skipped_rows:0, mvcc_scanned_rows:0, mvcc_skipped_rows:0, lm_filter_scanned_rows:0, lm_filter_skipped_rows:0, tot_rs_index_check:0ms, tot_read:0ms}} |keep order:false |N/A |N/A |
| └─Projection_320(Probe) |37931.63|5022781|mpp[tiflash]| |tiflash_task:{time:127.2ms, loops:80, threads:14} |bts.airlines.airline, bts.flights.origin, bts.flights.dep_delay, bts.flights.taxi_out, bts.flights.cancelled |N/A |N/A |
| └─HashJoin_313 |37931.63|5022781|mpp[tiflash]| |tiflash_task:{time:127.1ms, loops:80, threads:14} |inner join, equal:[eq(bts.airlines.iata_code, bts.flights.carrier)] |N/A |N/A |
| ├─ExchangeReceiver_317(Build)|29.97 |30 |mpp[tiflash]| |tiflash_task:{time:18.1ms, loops:1, threads:14} | |N/A |N/A |
| │ └─ExchangeSender_316 |29.97 |30 |mpp[tiflash]| |tiflash_task:{time:16.8ms, loops:1, threads:14} |ExchangeType: Broadcast, Compression: FAST |N/A |N/A |
| │ └─Selection_315 |29.97 |30 |mpp[tiflash]| |tiflash_task:{time:16.5ms, loops:1, threads:14} |not(isnull(bts.airlines.iata_code)) |N/A |N/A |
| │ └─TableFullScan_314 |30.00 |30 |mpp[tiflash]|table:a |tiflash_task:{time:16.5ms, loops:1, threads:14}, tiflash_scan:{mvcc_input_rows:0, mvcc_input_bytes:0, mvcc_output_rows:0, local_regions:3, remote_regions:0, tot_learner_read:13ms, region_balance:{instance_num: 1, max/min: 3/3=1.000000}, delta_rows:0, delta_bytes:0, segments:1, stale_read_regions:0, tot_build_snapshot:0ms, tot_build_bitmap:0ms, tot_build_inputstream:0ms, min_local_stream:2ms, max_local_stream:2ms, dtfile:{data_scanned_rows:30, data_skipped_rows:0, mvcc_scanned_rows:0, mvcc_skipped_rows:0, lm_filter_scanned_rows:0, lm_filter_skipped_rows:0, tot_rs_index_check:0ms, tot_read:0ms}} |keep order:false |N/A |N/A |
| └─Selection_319(Probe) |37931.63|5022781|mpp[tiflash]| |tiflash_task:{time:120.1ms, loops:80, threads:14} |not(isnull(bts.flights.carrier)), not(isnull(bts.flights.crs_elapsed_time)), not(isnull(bts.flights.origin)), not(isnull(bts.flights.taxi_out)) |N/A |N/A |
| └─TableFullScan_318 |38083.74|5022781|mpp[tiflash]|table:f |tiflash_task:{time:118.9ms, loops:80, threads:14}, tiflash_wait: {pipeline_breaker_wait: 4ms, pipeline_queue_wait: 38ms}, tiflash_scan:{mvcc_input_rows:0, mvcc_input_bytes:0, mvcc_output_rows:0, local_regions:46, remote_regions:0, tot_learner_read:13ms, region_balance:{instance_num: 1, max/min: 46/46=1.000000}, delta_rows:0, delta_bytes:0, segments:44, stale_read_regions:0, tot_build_snapshot:0ms, tot_build_bitmap:0ms, tot_build_inputstream:221ms, min_local_stream:57ms, max_local_stream:104ms, dtfile:{data_scanned_rows:5025569, data_skipped_rows:450560, mvcc_scanned_rows:0, mvcc_skipped_rows:0, lm_filter_scanned_rows:5025569, lm_filter_skipped_rows:33058166, tot_rs_index_check:57ms, tot_read:171ms}}|pushed down filter:eq(bts.flights.year, 2020), keep order:false |N/A |N/A |
|CTE_0 |318.56 |244 |root | |time:142.2ms, open:784µs, close:7.38µs, loops:3 |Non-Recursive CTE |72.8 KB |0 Bytes|
|└─TableReader_208(Seed Part) |318.56 |244 |root | |time:142.2ms, open:780µs, close:6.42µs, loops:2, cop_task: {num: 15, max: 0s, min: 0s, avg: 0s, p95: 0s, copr_cache_hit_ratio: 0.00}, fetch_resp_duration: 141.3ms |MppVersion: 3, data:ExchangeSender_207 |4.78 KB |N/A |
| └─ExchangeSender_207 |318.56 |244 |mpp[tiflash]| |tiflash_task:{time:138.7ms, loops:14, threads:14}, tiflash_network: {inner_zone_send_bytes: 31673} |ExchangeType: PassThrough |N/A |N/A |
| └─Projection_121 |318.56 |244 |mpp[tiflash]| |tiflash_task:{time:138.6ms, loops:14, threads:14} |bts.airports.airport, bts.airports.city, bts.airports.state, Column#42->Column#49, Column#43->Column#50, Column#45->Column#51 |N/A |N/A |
| └─Projection_206 |318.56 |244 |mpp[tiflash]| |tiflash_task:{time:138.6ms, loops:14, threads:14} |bts.airports.airport, bts.airports.city, bts.airports.state, Column#42, Column#43, round(Column#44, 2)->Column#45, Column#42 |N/A |N/A |
| └─Selection_203 |318.56 |244 |mpp[tiflash]| |tiflash_task:{time:138.6ms, loops:14, threads:14} |ge(Column#42, 1000) |N/A |N/A |
| └─Projection_197 |398.20 |379 |mpp[tiflash]| |tiflash_task:{time:138.6ms, loops:14, threads:14} |Column#42, Column#43, div(Column#44, cast(case(eq(Column#492, 0), 1, Column#492), decimal(20,0) BINARY))->Column#44, bts.airports.airport, bts.airports.city, bts.airports.state |N/A |N/A |
| └─HashAgg_198 |398.20 |379 |mpp[tiflash]| |tiflash_task:{time:138.6ms, loops:14, threads:14} |group by:bts.airports.airport, bts.airports.city, bts.airports.state, funcs:sum(Column#493)->Column#42, funcs:count(distinct bts.flights.carrier)->Column#43, funcs:sum(Column#494)->Column#492, funcs:sum(Column#495)->Column#44, funcs:firstrow(bts.airports.airport)->bts.airports.airport, funcs:firstrow(bts.airports.city)->bts.airports.city, funcs:firstrow(bts.airports.state)->bts.airports.state, stream_count: 14 |N/A |N/A |
| └─ExchangeReceiver_200 |398.20 |2487 |mpp[tiflash]| |tiflash_task:{time:138.5ms, loops:14, threads:14} |stream_count: 14 |N/A |N/A |
| └─ExchangeSender_199 |398.20 |2487 |mpp[tiflash]| |tiflash_task:{time:136.3ms, loops:1, threads:1} |ExchangeType: HashPartition, Compression: FAST, Hash Cols: [name: bts.airports.airport, collate: utf8mb4_bin], [name: bts.airports.city, collate: utf8mb4_bin], [name: bts.airports.state, collate: utf8mb4_bin], stream_count: 14 |N/A |N/A |
| └─HashAgg_195 |398.20 |2487 |mpp[tiflash]| |tiflash_task:{time:135.7ms, loops:1, threads:1} |group by:bts.airports.airport, bts.airports.city, bts.airports.state, bts.flights.carrier, funcs:count(1)->Column#493, funcs:count(bts.flights.dep_delay)->Column#494, funcs:sum(bts.flights.dep_delay)->Column#495 |N/A |N/A |
| └─Projection_184 |38045.65|5022781|mpp[tiflash]| |tiflash_task:{time:122.6ms, loops:80, threads:14} |bts.flights.carrier, bts.flights.dep_delay, bts.airports.airport, bts.airports.city, bts.airports.state |N/A |N/A |
| └─Projection_174 |38045.65|5022781|mpp[tiflash]| |tiflash_task:{time:122.6ms, loops:80, threads:14} |bts.airports.airport, bts.airports.city, bts.airports.state, bts.flights.carrier, bts.flights.dep_delay |N/A |N/A |
| └─HashJoin_173 |38045.65|5022781|mpp[tiflash]| |tiflash_task:{time:122.6ms, loops:80, threads:14} |inner join, equal:[eq(bts.airports.iata_code, bts.flights.origin)] |N/A |N/A |
| ├─ExchangeReceiver_153(Build) |398.20 |399 |mpp[tiflash]| |tiflash_task:{time:16ms, loops:1, threads:14} | |N/A |N/A |
| │ └─ExchangeSender_152 |398.20 |399 |mpp[tiflash]| |tiflash_task:{time:5.29ms, loops:1, threads:14} |ExchangeType: Broadcast, Compression: FAST |N/A |N/A |
| │ └─Selection_151 |398.20 |399 |mpp[tiflash]| |tiflash_task:{time:5.13ms, loops:1, threads:14} |not(isnull(bts.airports.airport)), not(isnull(bts.airports.iata_code)) |N/A |N/A |
| │ └─TableFullScan_150 |399.00 |399 |mpp[tiflash]|table:ap |tiflash_task:{time:5.13ms, loops:1, threads:14}, tiflash_scan:{mvcc_input_rows:0, mvcc_input_bytes:0, mvcc_output_rows:0, local_regions:3, remote_regions:0, tot_learner_read:1ms, region_balance:{instance_num: 1, max/min: 3/3=1.000000}, delta_rows:0, delta_bytes:0, segments:1, stale_read_regions:0, tot_build_snapshot:0ms, tot_build_bitmap:0ms, tot_build_inputstream:0ms, min_local_stream:2ms, max_local_stream:2ms, dtfile:{data_scanned_rows:399, data_skipped_rows:0, mvcc_scanned_rows:0, mvcc_skipped_rows:0, lm_filter_scanned_rows:0, lm_filter_skipped_rows:0, tot_rs_index_check:0ms, tot_read:0ms}} |keep order:false |N/A |N/A |
| └─Selection_155(Probe) |38045.65|5022781|mpp[tiflash]| |tiflash_task:{time:91.9ms, loops:80, threads:14} |not(isnull(bts.flights.origin)) |N/A |N/A |
| └─TableFullScan_154 |38083.74|5022781|mpp[tiflash]|table:f |tiflash_task:{time:90.6ms, loops:80, threads:14}, tiflash_wait: {pipeline_queue_wait: 24ms}, tiflash_scan:{mvcc_input_rows:0, mvcc_input_bytes:0, mvcc_output_rows:0, local_regions:46, remote_regions:0, tot_learner_read:13ms, region_balance:{instance_num: 1, max/min: 46/46=1.000000}, delta_rows:0, delta_bytes:0, segments:44, stale_read_regions:0, tot_build_snapshot:0ms, tot_build_bitmap:0ms, tot_build_inputstream:269ms, min_local_stream:27ms, max_local_stream:75ms, dtfile:{data_scanned_rows:5025569, data_skipped_rows:450560, mvcc_scanned_rows:0, mvcc_skipped_rows:0, lm_filter_scanned_rows:5025569, lm_filter_skipped_rows:33058166, tot_rs_index_check:22ms, tot_read:177ms}} |pushed down filter:eq(bts.flights.year, 2020), keep order:false |N/A |N/A |
+-----------------------------------------------------------------------------+--------+-------+------------+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------+-------+
It can be seen that the biggest difference between these two plans is the hash join order of CTE airport_volume_analysis and other tables. We should use CTE airport_volume_analysis as the build side, but in the absence of statistics, we used the join result of other tables as the build side. This results in the 5,022,781 data scanned by the entire tilfash being read and used to build the hash build side. It not only consumes memory but also takes time. Just building the hash table took 5 seconds, and there was also spilling to the disk. The worker with the longest time consumption in the probe phase took 50 seconds. According to the business, we also know that a large amount of data will be matched here, so the entire hash join is executed too slowly. There are several inaccurate estimations in the entire plan. First, it's the filtering condition f.year = 2020, but this isn't the key factor affecting the entire query because you can see that a better query plan uses the same estimation. The more crucial point is JOIN airlines a ON f.carrier = a.iata_code. In the bad plan, it estimates the number of rows to be 30, but in fact, there are 5,022,781 rows. If you look at the code, you'll find that we used the following formula when estimating the join row count:
// EstimateFullJoinRowCount estimates the row count of a full join.
func EstimateFullJoinRowCount(sctx planctx.PlanContext,
isCartesian bool,
leftProfile, rightProfile *property.StatsInfo,
leftJoinKeys, rightJoinKeys []*expression.Column,
leftSchema, rightSchema *expression.Schema,
leftNAJoinKeys, rightNAJoinKeys []*expression.Column) float64 {
if isCartesian {
return leftProfile.RowCount * rightProfile.RowCount
}
var leftKeyNDV, rightKeyNDV float64
var leftColCnt, rightColCnt int
if len(leftJoinKeys) > 0 || len(rightJoinKeys) > 0 {
leftKeyNDV, leftColCnt = EstimateColsNDVWithMatchedLen(sctx, leftJoinKeys, leftSchema, leftProfile)
rightKeyNDV, rightColCnt = EstimateColsNDVWithMatchedLen(sctx, rightJoinKeys, rightSchema, rightProfile)
} else {
leftKeyNDV, leftColCnt = EstimateColsNDVWithMatchedLen(sctx, leftNAJoinKeys, leftSchema, leftProfile)
rightKeyNDV, rightColCnt = EstimateColsNDVWithMatchedLen(sctx, rightNAJoinKeys, rightSchema, rightProfile)
}
count := leftProfile.RowCount * rightProfile.RowCount / max(leftKeyNDV, rightKeyNDV)
if sctx.GetSessionVars().TiDBOptJoinReorderThreshold <= 0 {
return count
}
When there is no statistics, the calculated values are as follows:
leftProfile.RowCount float64 = 29.97
rightProfile.RowCount float64 = 37931.628410113146
leftKeyNDV float64 = 23.976
rightKeyNDV float64 = 30345.302728090515
count float64 = 37.4625
When there is a basic NDV, the calculated values are as follows:
leftProfile.RowCount float64 = 29.97
rightProfile.RowCount float64 = 37931.628410113146
leftKeyNDV float64 = 29.969999999999995
rightKeyNDV float64 = 1
count float64 = 37931.628410113146
The core of the problem is the NDV estimation for the bts.flights.carrier column. When there is no statistics, we use the method of RealtimeCount × 0.8 to estimate it as 30345.302728090515. However, when there is statistics, we can obtain the real NDV and thus get an estimation that the NDV is 1. For the specific reason why it is 1, you can refer to the ScaleNDV and EstimateColsNDVWithMatchedLen functions, and the specific details will not be elaborated here. But the core reason is that the lack of NDV statistics leads us to make a relatively conservative estimate, which in turn causes an incorrect selection of the join order. Eventually, it makes the entire query slower. You may be curious about how it would perform if we collect all the statistics, including all the topn and buckets. It would perform better because it can correct the error of the filtering condition f.year = 2020. We can correctly estimate that there are 5053460 data records that match. Since this value will fall within the topn, we can make a relatively accurate estimate.
Conclusion
- The worst query in the entire test was the 14th query. Other queries did not perform that poorly when only the statistics of predicate columns were available.
- Collecting only the NDV and null count of all columns can solve the problem of sub-optimal query plans caused by incomplete statistics.
- Collecting all statistics still has benefits for equal conditions.
- The time consumed for collecting all statistics of all columns is similar to that for collecting only the NDV and null count of all columns. (⚠️: There are very few columns in this test set. In theory, this conclusion will change in scenarios with more columns because building and storing topN and buckets will consume a large amount of resources on TiDB.)
TiDB [email protected]:bts> analyze table airlines, airports, flights all columns with 100 topn, 256 BUCKETS;
Query OK, 0 rows affected
Time: 16.394s
TiDB [email protected]:bts> analyze table airlines, airports, flights all columns with 0 topn, 0 BUCKETS;
Query OK, 0 rows affected
Time: 16.144s