tidb icon indicating copy to clipboard operation
tidb copied to clipboard

planner: SPM & AI Plan Evolution

Open qw4990 opened this issue 9 months ago • 3 comments

Enhancement

Feature Overview

  1. A new sys-table to help identify regressed queries;
  2. A new SQL statement EXPLAIN EXPLORE <SQL> to show all historical plans of a specified query;

Below is an example of using these 2 new capabilities to fix regressions, which has 3 steps:

  1. identify regressed queries through the new sys-table;
  2. show all historical plans of this regressed query;
  3. create a binding to fix the regression according to our recommendation.
mysql> select * from regressed_query\G
*************************** 1. row ***************************
                   digest: 3f87b465219c7d0a8e13564b927d0a3f652c198e7d0a652f9c138e4b7d652c3f
                    query: SELECT * FROM t WHERE b in (...) AND c = ?
     current_plan_latency: 3131.2727
  prior_good_plan_latency: 791.8161
             current_plan: TableReader_7      root     0.03   data:Selection_6                       0      time:1.15ms, open:16.7µs, close:1.79µs, loops:1, cop_task: {num: 1, max: 1.11ms, proc_keys: 0, tot_proc: 56.7µs, tot_wait: 31.8µs, copr_cache_hit_ratio: 0.00, build_task_duration: 3.54µs, max_distsql_concurrency: 1}, rpc_info:{Cop:{num_rpc:1, total_time:1.1ms}}234 BytesN/A
                           └─Selection_6      cop[tikv]0.03   eq(test.t.c, 2), in(test.t.b, 1, 2, 3) 0      tikv_task:{time:0s, loops:1}, scan_detail: {total_keys: 1, get_snapshot_time: 19.8µs, rocksdb: {block: {cache_hit_count: 1}}}, time_detail: {total_process_time: 56.7µs, total_wait_time: 31.8µs, tikv_wall_time: 938.3µs}                                            N/A      N/A
                             └─TableFullScan_5cop[tikv]10000  table:t, keep order:false, stats:pseudo0      tikv_task:{time:0s, loops:1}
          prior_good_plan: IndexLookUp_12           root     0.03                                                                   0      time:460µs, open:6.96µs, close:1.63µs, loops:1                                                                                                                                                                                                                                                                                                                                                                                                                                              243 BytesN/A
                           ├─IndexRangeScan_9(Build)cop[tikv]10     table:t, index:c(c), range:[2,2], keep order:false, stats:pseudo0      time:420µs, open:0s, close:0s, loops:1, cop_task: {num: 1, max: 392µs, proc_keys: 0, tot_proc: 46µs, tot_wait: 102.8µs, copr_cache_hit_ratio: 0.00, build_task_duration: 5.92µs, max_distsql_concurrency: 1}, rpc_info:{Cop:{num_rpc:1, total_time:382.5µs}}, tikv_task:{time:0s, loops:1}, scan_detail: {total_keys: 1, get_snapshot_time: 82µs, rocksdb: {block: {cache_hit_count: 1}}}, time_detail: {total_process_time: 46µs, total_wait_time: 102.8µs, tikv_wall_time: 230.5µs}N/A      N/A
                           └─Selection_11(Probe)    cop[tikv]0.03   in(test.t.b, 1, 2, 3)                                           0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     N/A      N/A
                             └─TableRowIDScan_10    cop[tikv]10     table:t, keep order:false, stats:pseudo                         0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     N/A      N/A
prior_good_plan_last_seen: 2025-02-26 15:51:23


mysql> EXPLAIN EXPLORE '25e8feaa9169eee1dc9cf763aecacc618ead27deebee8ad6c41046e5b4b3b998'\G
*************************** 1. row ***************************
...
*************************** 2. row ***************************
...
*************************** 3. row ***************************
                 statement: select * from `test` . `t` where `a` = ? and `b` = ? and `c` = ?
              binding_hint: use_index(@`sel_1` `test`.`t` `primary`)
                      plan: id                   task        estRows        operator info                           actRows        execution info                                                                                                                                                                                                                                                                                                                                                        memory           disk
                            Selection_6          root        0.00           eq(test.t.b, 1), eq(test.t.c, 1)        1              time:574.6µs, open:7.25µs, close:4.71µs, loops:2                                                                                                                                                                                                                                                                                                                   516 Bytes        N/A
                            └─Point_Get_5        root        1              table:t, index:a(a)                     1              time:550.1µs, open:625ns, close:4.04µs, loops:3, Get:{num_rpc:2, total_time:509.2µs}, time_detail: {total_process_time: 69.2µs, total_wait_time: 84.8µs, total_kv_read_wall_time: 158.3µs, tikv_wall_time: 189.8µs}, scan_detail: {total_process_keys: 2, total_process_keys_size: 93, total_keys: 2, get_snapshot_time: 34.3µs, rocksdb: {block: {}}}        N/A              N/A
               plan_digest: 23adc8e6f6251ecfa48b9261f7805173af733a1a17ff92e2e05f2df57d9e6651
               avg_latency: 955407.2222222222
                exec_times: 9
             avg_scan_rows: 0
         avg_returned_rows: 1
  latency_per_returned_row: 955407.2222222222
scan_rows_per_returned_row: 0
                 recommend: YES
                    reason: Simple PointGet/BatchPointGet is the optimal plan.

mysql> set binding enabled for plan digest '23adc8e6f6251ecfa48b9261f7805173af733a1a17ff92e2e05f2df57d9e6651';
Query OK, 1 rows affected(0.01 sec)

Development Tasks

  • [ ] record historical plans as disabled bindings
    • [x] new unique index on mysql.bind_info.plan_digest to avoid duplicated data
    • [ ] new process in each TiDB node to record historical plans (?)
  • [ ] EXPLAIN EXPLORE to show optimal plans for a certain SQL
    • [x] new syntax EXPLAIN EXPLORE [<SQL> | <SQL_digest>] https://github.com/pingcap/tidb/pull/61099
    • [x] support a set of cost-factor variables https://github.com/pingcap/tidb/pull/60333
    • [x] function to get relevant optimizer vars/fixes for a certain SQL https://github.com/pingcap/tidb/pull/61119
    • [x] explore/generate new plans via tunning variables like cost-factor and etc https://github.com/pingcap/tidb/pull/61190
    • [x] a set of rules to recommend the optimal plan
    • [x] support EXPLAIN EXPLORE ANALYZE
    • [ ] use LLM to recommend the optimal plan
  • [ ] a new sys-table to identify regressed queries
  • [ ] cross-database binding support
    • [ ] a new sys view to merge stmt_stats info of queries with the same pattern but different database names
    • [ ] TODO

Developers: @qw4990 @henrybw

qw4990 avatar Mar 18 '25 09:03 qw4990

/assign @henrybw @qw4990

qw4990 avatar Mar 18 '25 09:03 qw4990

@qw4990: GitHub didn't allow me to assign the following users: henrybw.

Note that only pingcap members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. For more information please see the contributor guide

In response to this:

/assign @henrybw @qw4990

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

ti-chi-bot[bot] avatar Mar 18 '25 09:03 ti-chi-bot[bot]

/assign

henrybw avatar Mar 19 '25 01:03 henrybw