datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

fix: schema error when parsing order-by expressions

Open jonahgao opened this issue 1 year ago • 0 comments

Which issue does this PR close?

Closes #10013.

Rationale for this change

In the following query syntax, SELECT select_list FROM table_expression ORDER BY expressions, the order-by expressions can not only reference the column names in the select list, but also reference the column names in the FROM clause.

Currently, when building order-by expressions, only the input plan's schema (derived from the select list) is used. For the following query, this will cause the column reference a in ORDER BY SUM(a) to fail to normalize.

DataFusion CLI v37.0.0
> create table t(a int);
0 row(s) fetched.
Elapsed 0.005 seconds.

> SELECT
   SUM(a)
FROM t
having SUM(a) > 0
order by SUM(a);
Schema error: No field named a. Valid fields are "SUM(t.a)", "SUM(t.a)".

The solution is, when constructing order-by expressions, we can use both the schema of the select list and the schema of the FROM clause.

To achieve this, we need to handle the ORDER BY clause within the planning of select, that is, select_to_plan. This approach may also have other benefits:

  • Refine the current add_missing_columns logic by directly adding the missing columns to the select list, which should be more efficient and less susceptible to errors.
  • Easier to coordinate DISTINCT and ORDER BY, because distinct is defined within the select list.
  • Easier to expand the functionality of ORDER BY, such as supporting order by unprojected aggregate expressions and unprojected window functions. The term 'unprojected' means that they do not appear in the select list.

These can be implemented in subsequent PRs.

What changes are included in this PR?

Use both the schema of the select list and the FROM clause to construct order-by expressions.

Are these changes tested?

Yes. By existing tests and new tests.

Are there any user-facing changes?

No

jonahgao avatar Apr 25 '24 16:04 jonahgao