databend icon indicating copy to clipboard operation
databend copied to clipboard

Feature: support left semi join and right semi join

Open xudong963 opened this issue 3 years ago • 0 comments

Summary

TPCH q4 will be converted to semi join

MySQL [tpch]> explain
    -> select
    ->     o_orderpriority,
    ->     count(*) as order_count
    -> from
    ->     orders
    -> where
    ->         o_orderdate >= to_date('1993-07-01')
    ->   and o_orderdate < add_months(to_date('1993-07-01'), 3)
    ->   and exists (
    ->         select
    ->             *
    ->         from
    ->             lineitem
    ->         where
    ->                 l_orderkey = o_orderkey
    ->           and l_commitdate < l_receiptdate
    ->     )
    -> group by
    ->     o_orderpriority
    -> order by
    ->     o_orderpriority;
+---------------------------------------------------------------------------------------------------------------------+
| explain                                                                                                             |
+---------------------------------------------------------------------------------------------------------------------+
| Project                                                                                                             |
| ├── columns: [o_orderpriority (#5), order_count (#25)]                                                              |
| └── EvalScalar                                                                                                      |
|     ├── expressions: [COUNT(*) (#27)]                                                                               |
|     └── Sort                                                                                                        |
|         ├── sort keys: [o_orderpriority ASC NULLS LAST]                                                             |
|         └── AggregateFinal                                                                                          |
|             ├── group by: [o_orderpriority]                                                                         |
|             ├── aggregate functions: [count()]                                                                      |
|             └── AggregatePartial                                                                                    |
|                 ├── group by: [o_orderpriority]                                                                     |
|                 ├── aggregate functions: [count()]                                                                  |
|                 └── HashJoin                                                                                        |
|                     ├── join type: SEMI                                                                             |
|                     ├── build keys: [lineitem.l_orderkey (#9)]                                                      |
|                     ├── probe keys: [orders.o_orderkey (#0)]                                                        |
|                     ├── filters: []                                                                                 |
|                     ├── Project(Build)                                                                              |
|                     │   ├── columns: [l_orderkey (#9)]                                                              |
|                     │   └── TableScan                                                                               |
|                     │       ├── table: default.tpch.lineitem                                                        |
|                     │       ├── read rows: 600037902                                                                |
|                     │       ├── read bytes: 4219861478                                                              |
|                     │       ├── partitions total: 933                                                               |
|                     │       ├── partitions scanned: 933                                                             |
|                     │       └── push downs: [filters: [(l_commitdate < l_receiptdate)], limit: NONE]                |
|                     └── TableScan(Probe)                                                                            |
|                         ├── table: default.tpch.orders                                                              |
|                         ├── read rows: 150000000                                                                    |
|                         ├── read bytes: 1438183449                                                                  |
|                         ├── partitions total: 208                                                                   |
|                         ├── partitions scanned: 208                                                                 |
|                         └── push downs: [filters: [(o_orderdate >= 8582), (o_orderdate < 8674)], limit: NONE]       |
+---------------------------------------------------------------------------------------------------------------------+
33 rows in set (0.056 sec)

After supporting left/right semi join, we can adjust the order of semi join to make the small table be the build side.

xudong963 avatar Sep 13 '22 10:09 xudong963