fix: Dialect requires derived table alias
Which issue does this PR close?
Closes #12993.
Rationale for this change
Fixes a bug preventing derived logical plans from running in MySQL due to a requirement that every derived table must have an alias:
The [AS] tbl_name clause is mandatory because every table in a FROM clause must have a name.
What changes are included in this PR?
Includes a dialect update to specify whether derived tables require aliases. For dialects that do, set a static alias. I've simply selected the name of the operation prefixed with derived_, like derived_sort, etc.
Are these changes tested?
Yes
Are there any user-facing changes?
No
By the way, I tried to pull your branch and run the tests on my laptop but the tests always fail. After merging with the latest main branch, the tests are passed.
Maybe @phillipleblanc and @sgrebnov want to take a look at this PR.
Looks good to me, thanks @goldmedal and @peasee
@goldmedal perhaps you would like to merge this PR as a way to verify your permissions are configured correctly?
@goldmedal perhaps you would like to merge this PR as a way to verify your permissions are configured correctly?
Sure, I plan to merge this PR if no more comments tomorrow.
@goldmedal perhaps you would like to merge this PR as a way to verify your permissions are configured correctly?
Sure, I plan to merge this PR if no more comments tomorrow.
Let's hold off on merging this for now. I think I've introduced a regression that I'd like to test more first before merging.
@goldmedal perhaps you would like to merge this PR as a way to verify your permissions are configured correctly?
Sure, I plan to merge this PR if no more comments tomorrow.
Let's hold off on merging this for now. I think I've introduced a regression that I'd like to test more first before merging.
Okay, looks like my PR didn't introduce the regression but there's a somewhat related bug currently on main. Take the following SQL:
SELECT j1_id FROM (SELECT ta.j1_id AS j1_id FROM j1 ta);
This gets rewritten in the GenericDialect to:
SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 AS ta)
Which is invalid, because the subquery is un-aliased. I tested it in DuckDB to confirm:
D SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 ta);
Binder Error: Referenced table "ta" not found!
Candidate tables: "unnamed_subquery"
LINE 1: SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 ...
Let's merge my PR, and I can raise a new issue for this other bug @goldmedal ?
This gets rewritten in the
GenericDialectto:SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 AS ta)
It's valid for DataFusion
> create table j1(j1_id int);
0 row(s) fetched.
Elapsed 0.085 seconds.
> SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 AS ta);
+-------+
| j1_id |
+-------+
+-------+
0 row(s) fetched.
Elapsed 0.042 seconds.
Generally, GenericDialect fits the DataFusion syntax. This behavior makes sense to me.
Which is invalid, because the subquery is un-aliased. I tested it in DuckDB to confirm:
D SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 ta); Binder Error: Referenced table "ta" not found! Candidate tables: "unnamed_subquery" LINE 1: SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 ...Let's merge my PR, and I can raise a new issue for this other bug @goldmedal ?
How about MySQLDialect? It enables requires_derived_table_alias. Which would be the result?
SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 ta) derived_projection
or
SELECT j1_id FROM (SELECT ta.j1_id FROM j1 ta) `derived_projection`
If it's the first one, I think it's an issue because it isn't valid. We can file an issue for it.
If it's the second one, I think the user can customize the dialect::requires_derived_table_alias for the DuckDB syntax purpose.
If it's the first one, I think it's an issue because it isn't valid. We can file an issue for it.
My bad, I should've clarified better. It is the first one with MySqlDialect. I had just used GenericDialect as the example, but it affects both:
SELECT `ta`.`j1_id` FROM (SELECT `ta`.`j1_id` FROM `j1` AS `ta`) AS `derived_projection`
If it's the first one, I think it's an issue because it isn't valid. We can file an issue for it.
My bad, I should've clarified better. It is the first one with
MySqlDialect. I had just usedGenericDialectas the example, but it affects both:SELECT `ta`.`j1_id` FROM (SELECT `ta`.`j1_id` FROM `j1` AS `ta`) AS `derived_projection`
I see. I think it's an issue but we can fix it in the follow-up PR. Let's merge this PR first. Could you help to file an issue to trace it?
I see. I think it's an issue but we can fix it in the follow-up PR. Let's merge this PR first. Could you help to file an issue to trace it?
https://github.com/apache/datafusion/issues/13027
Thanks @peasee and @phillipleblanc @alamb for reviewing. 👍