dolt icon indicating copy to clipboard operation
dolt copied to clipboard

Expecting Indexed JOINs. Got Inner JOINs.

Open timsehn opened this issue 3 years ago • 2 comments

Simple Repro:

https://www.dolthub.com/repositories/learn-sql/lesson-1/query/master?q=explain+SELECT+first_name%2C+last_name%2C+dept_name%0AFROM+%60employees%60+NATURAL+JOIN+employees_departments+NATURAL+JOIN+departments%3B%0A%0A&active=Tables

timsehn avatar Aug 10 '21 20:08 timsehn

Issue seems to be that the natural join puts the query plan into a shape that it can't be optimized by the join optimizer. Same query rewritten to not use the NATURAL JOIN syntax uses indexes.

zachmu avatar Aug 12 '21 23:08 zachmu

Natural joins are just broken generally, and we need to do more work before we would get indexing to work.

The main problem is how we handle NATURAL_JOIN projections. Currently, we inline deduplicate every "shared" column that is chosen as a join filter. For example:

tmp5> explain select * from mytable t1 natural join mytable t2 join othertable t3 on t2.i = t3.i2;
+----------------------------------------------------+
| plan                                               |
+----------------------------------------------------+
| InnerJoin(t1.i = t3.i2)                            |
|  ├─ Project                                        |
|  │   ├─ columns: [t1.i, t1.s]                      |
|  │   └─ InnerJoin((t1.i = t2.i) AND (t1.s = t2.s)) |
|  │       ├─ Exchange                               |
|  │       │   └─ TableAlias(t1)                     |
|  │       │       └─ Table(mytable)                 |
|  │       │           └─ columns: [i s]             |
|  │       └─ Exchange                               |
|  │           └─ TableAlias(t2)                     |
|  │               └─ Table(mytable)                 |
|  │                   └─ columns: [i s]             |
|  └─ Exchange                                       |
|      └─ TableAlias(t3)                             |
|          └─ Table(othertable)                      |
|              └─ columns: [i2 s2]                   |
+----------------------------------------------------+

t1 and t2 are the same underlying table, so we try to join on every column (even though on primary key would do). We "deduplicate" by adding a PROJECT node directly above the resulting inner join. This mirrors the basic MySQL behavior for a specific plan type, and is correct in this context:

mysql> select * from mytable t1 natural join mytable t2 join othertable t3 on t2.i = t3.i2;
+------+------------+--------+----+
| i    | s          | s2     | i2 |
+------+------------+--------+----+
|    1 | first row  | third  |  1 |
|    2 | second row | second |  2 |
|    3 | third row  | first  |  3 |
+------+------------+--------+----+

In most other contexts, this is too blunt and eliminates table visibility:

tmp5> explain select t2.* from mytable t1 natural join mytable t2 join othertable t3 on t2.i = t3.i2;
table not found: t2

vs MySQL

mysql> select t2.* from mytable t1 natural join mytable t2 join othertable t3 on t2.i = t3.i2;
+------+------------+
| i    | s          |
+------+------------+
|    1 | first row  |
|    2 | second row |
|    3 | third row  |
+------+------------+
3 rows in set (0.00 sec)

I am having the same problem with indexing natural joins. Sticking a projection node in the middle of a join tree prevents reordering the tree. Deduplication is a top-level scope optimization applied only for an unqualified star select.

TODO summary:

  1. The natural join columns are going to be aliases, applied at the top of a join tree to avoid interfering with reordering. Ex:
mysql> select t1.*, t2.*, i from mytable t1 natural join mytable t2 join othertable t3 on t2.i = t3.i2;
+------+------------+------+------------+------+
| i    | s          | i    | s          | i    |
+------+------------+------+------------+------+
|    1 | first row  |    1 | first row  |    1 |
|    2 | second row |    2 | second row |    2 |
|    3 | third row  |    3 | third row  |    3 |
+------+------------+------+------------+------+
  1. Only deduplicate natural join aliased columns when the scope root select operator is an unqualified star.

max-hoffman avatar Oct 10 '22 23:10 max-hoffman

Looks like https://github.com/dolthub/go-mysql-server/pull/1907 fixes this. NATURAL and USING joins funnel to the same operator, which is simplified into the equivalent projections+inner/left join.

max-hoffman avatar Apr 15 '24 22:04 max-hoffman