datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

fix: Dialect requires derived table alias

Open peasee opened this issue 1 year ago • 5 comments

Which issue does this PR close?

Closes #12993.

Rationale for this change

Fixes a bug preventing derived logical plans from running in MySQL due to a requirement that every derived table must have an alias:

The [AS] tbl_name clause is mandatory because every table in a FROM clause must have a name.

What changes are included in this PR?

Includes a dialect update to specify whether derived tables require aliases. For dialects that do, set a static alias. I've simply selected the name of the operation prefixed with derived_, like derived_sort, etc.

Are these changes tested?

Yes

Are there any user-facing changes?

No

peasee avatar Oct 18 '24 06:10 peasee

By the way, I tried to pull your branch and run the tests on my laptop but the tests always fail. After merging with the latest main branch, the tests are passed.

goldmedal avatar Oct 19 '24 09:10 goldmedal

Maybe @phillipleblanc and @sgrebnov want to take a look at this PR.

goldmedal avatar Oct 19 '24 13:10 goldmedal

Looks good to me, thanks @goldmedal and @peasee

phillipleblanc avatar Oct 19 '24 14:10 phillipleblanc

@goldmedal perhaps you would like to merge this PR as a way to verify your permissions are configured correctly?

alamb avatar Oct 20 '24 12:10 alamb

@goldmedal perhaps you would like to merge this PR as a way to verify your permissions are configured correctly?

Sure, I plan to merge this PR if no more comments tomorrow.

goldmedal avatar Oct 20 '24 12:10 goldmedal

@goldmedal perhaps you would like to merge this PR as a way to verify your permissions are configured correctly?

Sure, I plan to merge this PR if no more comments tomorrow.

Let's hold off on merging this for now. I think I've introduced a regression that I'd like to test more first before merging.

peasee avatar Oct 21 '24 03:10 peasee

@goldmedal perhaps you would like to merge this PR as a way to verify your permissions are configured correctly?

Sure, I plan to merge this PR if no more comments tomorrow.

Let's hold off on merging this for now. I think I've introduced a regression that I'd like to test more first before merging.

Okay, looks like my PR didn't introduce the regression but there's a somewhat related bug currently on main. Take the following SQL:

SELECT j1_id FROM (SELECT ta.j1_id AS j1_id FROM j1 ta);

This gets rewritten in the GenericDialect to:

SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 AS ta)

Which is invalid, because the subquery is un-aliased. I tested it in DuckDB to confirm:

D SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 ta);
Binder Error: Referenced table "ta" not found!
Candidate tables: "unnamed_subquery"
LINE 1: SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 ...

Let's merge my PR, and I can raise a new issue for this other bug @goldmedal ?

peasee avatar Oct 21 '24 03:10 peasee

This gets rewritten in the GenericDialect to:

SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 AS ta)

It's valid for DataFusion

> create table j1(j1_id int);
0 row(s) fetched. 
Elapsed 0.085 seconds.

> SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 AS ta);
+-------+
| j1_id |
+-------+
+-------+
0 row(s) fetched. 
Elapsed 0.042 seconds.

Generally, GenericDialect fits the DataFusion syntax. This behavior makes sense to me.

Which is invalid, because the subquery is un-aliased. I tested it in DuckDB to confirm:

D SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 ta);
Binder Error: Referenced table "ta" not found!
Candidate tables: "unnamed_subquery"
LINE 1: SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 ...

Let's merge my PR, and I can raise a new issue for this other bug @goldmedal ?

How about MySQLDialect? It enables requires_derived_table_alias. Which would be the result?

SELECT ta.j1_id FROM (SELECT ta.j1_id FROM j1 ta) derived_projection

or

SELECT j1_id FROM (SELECT ta.j1_id FROM j1 ta) `derived_projection`

If it's the first one, I think it's an issue because it isn't valid. We can file an issue for it. If it's the second one, I think the user can customize the dialect::requires_derived_table_alias for the DuckDB syntax purpose.

goldmedal avatar Oct 21 '24 06:10 goldmedal

If it's the first one, I think it's an issue because it isn't valid. We can file an issue for it.

My bad, I should've clarified better. It is the first one with MySqlDialect. I had just used GenericDialect as the example, but it affects both:

SELECT `ta`.`j1_id` FROM (SELECT `ta`.`j1_id` FROM `j1` AS `ta`) AS `derived_projection`

peasee avatar Oct 21 '24 06:10 peasee

If it's the first one, I think it's an issue because it isn't valid. We can file an issue for it.

My bad, I should've clarified better. It is the first one with MySqlDialect. I had just used GenericDialect as the example, but it affects both:

SELECT `ta`.`j1_id` FROM (SELECT `ta`.`j1_id` FROM `j1` AS `ta`) AS `derived_projection`

I see. I think it's an issue but we can fix it in the follow-up PR. Let's merge this PR first. Could you help to file an issue to trace it?

goldmedal avatar Oct 21 '24 06:10 goldmedal

I see. I think it's an issue but we can fix it in the follow-up PR. Let's merge this PR first. Could you help to file an issue to trace it?

https://github.com/apache/datafusion/issues/13027

peasee avatar Oct 21 '24 06:10 peasee

Thanks @peasee and @phillipleblanc @alamb for reviewing. 👍

goldmedal avatar Oct 21 '24 06:10 goldmedal