
Unify SQL planning for `ORDER BY`, `HAVING`, `DISTINCT`, etc

Open · alamb opened this issue · 0 comments

Is your feature request related to a problem or challenge?

As @jonahgao points out in https://github.com/apache/datafusion/pull/10234:

> `select x from foo order by y` is covered by `add_missing_columns`, which blindly adds columns to the descendant projection node. Another issue is that we should not run `add_missing_columns` for any `SetExpr` other than `SELECT`.
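The following is our own illustration, not something from the thread. It shows why widening a projection is not harmless once an operator above it deduplicates rows, as `SELECT DISTINCT` and `UNION` do: a hidden `y` column changes which rows count as duplicates. The `Row` type and both functions are hypothetical stand-ins, not DataFusion code.

```rust
use std::collections::BTreeSet;

/// A toy row with columns `x` and `y` (hypothetical data, not DataFusion types).
type Row = (i64, i64);

/// `SELECT DISTINCT x`: deduplicate on `x` alone.
fn distinct_x(rows: &[Row]) -> Vec<i64> {
    let unique: BTreeSet<i64> = rows.iter().map(|&(x, _)| x).collect();
    unique.into_iter().collect()
}

/// The same query after `y` has been pushed into the projection beneath the
/// DISTINCT (e.g. so a later `ORDER BY y` can see it): deduplication now runs on (x, y).
fn distinct_x_with_hidden_y(rows: &[Row]) -> Vec<Row> {
    let unique: BTreeSet<Row> = rows.iter().copied().collect();
    unique.into_iter().collect()
}

fn main() {
    let rows: Vec<Row> = vec![(1, 10), (1, 20), (2, 30)];

    let correct = distinct_x(&rows);
    println!("{correct:?}"); // [1, 2]

    // Dropping the hidden column afterwards does not undo the damage:
    // x = 1 survives twice because (1, 10) and (1, 20) were distinct pairs.
    let widened: Vec<i64> = distinct_x_with_hidden_y(&rows).iter().map(|&(x, _)| x).collect();
    println!("{widened:?}"); // [1, 1, 2]
}
```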

In https://github.com/apache/datafusion/pull/10234, @jonahgao added a more general solution that uses the merged schema of the select list and the `FROM` clause to resolve column references in `HAVING` and in set operations.

However, both code paths now coexist, which makes the planning process fairly complicated.

Describe the solution you'd like

> I think that we should handle `ORDER BY` the same way as `HAVING`: use the merged schema and add the missing columns directly to the select list, instead of traversing the plan looking for a projection node. Their processing logic may be reusable. I agree it might be good to have a broader discussion about this.

@jonahgao in https://github.com/apache/datafusion/pull/10234#issuecomment-2087760241
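The proposal can be sketched with a toy planner. Everything below (the `Plan` enum, `plan_select_order_by`, and the `owned` helper) is hypothetical and models only plain column references without aliases or expressions; it is not DataFusion's planner or API. Sort keys are looked up in a merged schema made of the select-list outputs and the `FROM` columns, any key missing from the select list is appended to it directly, and a final projection restores the user-visible columns.

```rust
/// Hypothetical, simplified logical plan nodes; these are NOT DataFusion's `LogicalPlan`.
#[derive(Debug, PartialEq)]
enum Plan {
    Scan { table: String, columns: Vec<String> },
    Projection { exprs: Vec<String>, input: Box<Plan> },
    Sort { keys: Vec<String>, input: Box<Plan> },
}

/// Hypothetical helper: copy borrowed names into owned strings.
fn owned(names: &[&str]) -> Vec<String> {
    names.iter().map(|s| s.to_string()).collect()
}

/// Plans `SELECT <select> FROM <table> ORDER BY <order_by>` where every item is a
/// plain column reference (an assumption of this sketch).
fn plan_select_order_by(
    table: &str,
    from_columns: &[&str],
    select: &[&str],
    order_by: &[&str],
) -> Result<Plan, String> {
    // Sort keys are resolved against the merged schema: select-list outputs
    // first, then the columns produced by the FROM clause.
    let mut extended = owned(select);
    for &key in order_by {
        if extended.iter().any(|e| e == key) {
            continue; // already visible in the (possibly extended) select list
        }
        if !from_columns.contains(&key) {
            return Err(format!("column `{key}` not found in select list or FROM clause"));
        }
        // Missing from the select list: append it directly there instead of
        // searching the plan for a descendant projection node.
        extended.push(key.to_string());
    }

    let needs_final_projection = extended.len() > select.len();
    let scan = Plan::Scan { table: table.to_string(), columns: owned(from_columns) };
    let projection = Plan::Projection { exprs: extended, input: Box::new(scan) };
    let sort = Plan::Sort { keys: owned(order_by), input: Box::new(projection) };

    if needs_final_projection {
        // Drop the hidden sort columns so the output matches the user's select list.
        Ok(Plan::Projection { exprs: owned(select), input: Box::new(sort) })
    } else {
        Ok(sort)
    }
}

fn main() {
    // The query quoted in the issue: SELECT x FROM foo ORDER BY y
    let plan = plan_select_order_by("foo", &["x", "y"], &["x"], &["y"]);
    println!("{plan:#?}");
}
```

For `SELECT x FROM foo ORDER BY y`, the sketch yields a projection of `x` over a sort on `y` over a projection of `x, y` over the scan of `foo`. It is only sound for a plain `SELECT`: under `SELECT DISTINCT` or a set operation, appending a hidden column to the select list changes which rows are considered duplicates.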

Describe alternatives you've considered

One alternative might be to keep `add_missing_columns` and use the new `order_by_to_sort_expr` options added in https://github.com/apache/datafusion/pull/10234.

Additional context

No response

alamb · May 01 '24 11:05