dolt
dolt copied to clipboard
Expecting Indexed JOINs. Got Inner JOINs.
Simple Repro:
https://www.dolthub.com/repositories/learn-sql/lesson-1/query/master?q=explain+SELECT+first_name%2C+last_name%2C+dept_name%0AFROM+%60employees%60+NATURAL+JOIN+employees_departments+NATURAL+JOIN+departments%3B%0A%0A&active=Tables
Issue seems to be that the natural join puts the query plan into a shape that it can't be optimized by the join optimizer. Same query rewritten to not use the NATURAL JOIN
syntax uses indexes.
Natural joins are just broken generally, and we need to do more work before we would get indexing to work.
The main problem is how we handle NATURAL_JOIN projections. Currently, we inline deduplicate every "shared" column that is chosen as a join filter. For example:
tmp5> explain select * from mytable t1 natural join mytable t2 join othertable t3 on t2.i = t3.i2;
+----------------------------------------------------+
| plan |
+----------------------------------------------------+
| InnerJoin(t1.i = t3.i2) |
| ├─ Project |
| │ ├─ columns: [t1.i, t1.s] |
| │ └─ InnerJoin((t1.i = t2.i) AND (t1.s = t2.s)) |
| │ ├─ Exchange |
| │ │ └─ TableAlias(t1) |
| │ │ └─ Table(mytable) |
| │ │ └─ columns: [i s] |
| │ └─ Exchange |
| │ └─ TableAlias(t2) |
| │ └─ Table(mytable) |
| │ └─ columns: [i s] |
| └─ Exchange |
| └─ TableAlias(t3) |
| └─ Table(othertable) |
| └─ columns: [i2 s2] |
+----------------------------------------------------+
t1
and t2
are the same underlying table, so we try to join on every column (even though on primary key would do). We "deduplicate" by adding a PROJECT node directly above the resulting inner join. This mirrors the basic MySQL behavior for a specific plan type, and is correct in this context:
mysql> select * from mytable t1 natural join mytable t2 join othertable t3 on t2.i = t3.i2;
+------+------------+--------+----+
| i | s | s2 | i2 |
+------+------------+--------+----+
| 1 | first row | third | 1 |
| 2 | second row | second | 2 |
| 3 | third row | first | 3 |
+------+------------+--------+----+
In most other contexts, this is too blunt and eliminates table visibility:
tmp5> explain select t2.* from mytable t1 natural join mytable t2 join othertable t3 on t2.i = t3.i2;
table not found: t2
vs MySQL
mysql> select t2.* from mytable t1 natural join mytable t2 join othertable t3 on t2.i = t3.i2;
+------+------------+
| i | s |
+------+------------+
| 1 | first row |
| 2 | second row |
| 3 | third row |
+------+------------+
3 rows in set (0.00 sec)
I am having the same problem with indexing natural joins. Sticking a projection node in the middle of a join tree prevents reordering the tree. Deduplication is a top-level scope optimization applied only for an unqualified star select.
TODO summary:
- The natural join columns are going to be aliases, applied at the top of a join tree to avoid interfering with reordering. Ex:
mysql> select t1.*, t2.*, i from mytable t1 natural join mytable t2 join othertable t3 on t2.i = t3.i2;
+------+------------+------+------------+------+
| i | s | i | s | i |
+------+------------+------+------------+------+
| 1 | first row | 1 | first row | 1 |
| 2 | second row | 2 | second row | 2 |
| 3 | third row | 3 | third row | 3 |
+------+------------+------+------------+------+
- Only deduplicate natural join aliased columns when the scope root select operator is an unqualified star.
Looks like https://github.com/dolthub/go-mysql-server/pull/1907 fixes this. NATURAL and USING joins funnel to the same operator, which is simplified into the equivalent projections+inner/left join.