datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Incorrect results with expression resolution

Open alamb opened this issue 1 year ago • 1 comments

Describe the bug

DataFusion will sometimes resolve expressions incorrectly when the alias shadows an expression

To Reproduce

DataFusion CLI v37.1.0
> select a + b from (select 1 as a, 2 as b, 1 as "a + b");
+-------+
| a + b |
+-------+
| 1     | <- Should be 3
+-------+
1 row(s) fetched.
Elapsed 0.009 seconds.

Expected behavior

Result should be 3

Additional context

This was found by @peter-toth in https://github.com/apache/datafusion/pull/10396

alamb avatar May 07 '24 17:05 alamb

As @peter-toth mentions in https://github.com/apache/datafusion/pull/10396#issuecomment-2099072200 this appears to be related to name resolution (not CSE). I checked with EXPLAIN VERBOSE and indeed the initial plan appears to be incorret

DataFusion CLI v37.1.0
> explain verbose select a + b from (select 1 as a, 2 as b, 1 as "a + b");
+------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| plan_type                                                  | plan                                                                                       |
+------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| initial_logical_plan                                       | Projection: a + b                                                                          |
|                                                            |   Projection: Int64(1) AS a, Int64(2) AS b, Int64(1) AS a + b                              |
|                                                            |     EmptyRelation                                                                          |
| logical_plan after apply_function_rewrites                 | SAME TEXT AS ABOVE                                                                         |
| logical_plan after inline_table_scan                       | SAME TEXT AS ABOVE                                                                         |
| logical_plan after type_coercion                           | SAME TEXT AS ABOVE                                                                         |
| logical_plan after count_wildcard_rule                     | SAME TEXT AS ABOVE                                                                         |
| analyzed_logical_plan                                      | SAME TEXT AS ABOVE                                                                         |
| logical_plan after eliminate_nested_union                  | SAME TEXT AS ABOVE                                                                         |
| logical_plan after simplify_expressions                    | SAME TEXT AS ABOVE                                                                         |
| logical_plan after unwrap_cast_in_comparison               | SAME TEXT AS ABOVE                                                                         |
| logical_plan after replace_distinct_aggregate              | SAME TEXT AS ABOVE                                                                         |
| logical_plan after eliminate_join                          | SAME TEXT AS ABOVE                                                                         |
| logical_plan after decorrelate_predicate_subquery          | SAME TEXT AS ABOVE                                                                         |
| logical_plan after scalar_subquery_to_join                 | SAME TEXT AS ABOVE                                                                         |
| logical_plan after extract_equijoin_predicate              | SAME TEXT AS ABOVE                                                                         |
| logical_plan after simplify_expressions                    | SAME TEXT AS ABOVE                                                                         |
| logical_plan after rewrite_disjunctive_predicate           | SAME TEXT AS ABOVE                                                                         |
| logical_plan after eliminate_duplicated_expr               | SAME TEXT AS ABOVE                                                                         |
| logical_plan after eliminate_filter                        | SAME TEXT AS ABOVE                                                                         |
| logical_plan after eliminate_cross_join                    | SAME TEXT AS ABOVE                                                                         |
| logical_plan after common_sub_expression_eliminate         | SAME TEXT AS ABOVE                                                                         |
| logical_plan after eliminate_limit                         | SAME TEXT AS ABOVE                                                                         |
| logical_plan after propagate_empty_relation                | SAME TEXT AS ABOVE                                                                         |
| logical_plan after eliminate_one_union                     | SAME TEXT AS ABOVE                                                                         |
| logical_plan after filter_null_join_keys                   | SAME TEXT AS ABOVE                                                                         |
| logical_plan after eliminate_outer_join                    | SAME TEXT AS ABOVE                                                                         |
| logical_plan after push_down_limit                         | SAME TEXT AS ABOVE                                                                         |
| logical_plan after push_down_filter                        | SAME TEXT AS ABOVE                                                                         |
| logical_plan after single_distinct_aggregation_to_group_by | SAME TEXT AS ABOVE                                                                         |
| logical_plan after simplify_expressions                    | SAME TEXT AS ABOVE                                                                         |
| logical_plan after unwrap_cast_in_comparison               | SAME TEXT AS ABOVE                                                                         |
| logical_plan after common_sub_expression_eliminate         | SAME TEXT AS ABOVE                                                                         |
| logical_plan after optimize_projections                    | Projection: Int64(1) AS a + b                                                              |
|                                                            |   EmptyRelation                                                                            |
| logical_plan after eliminate_nested_union                  | SAME TEXT AS ABOVE                                                                         |
| logical_plan after simplify_expressions                    | SAME TEXT AS ABOVE                                                                         |
| logical_plan after unwrap_cast_in_comparison               | SAME TEXT AS ABOVE                                                                         |
| logical_plan after replace_distinct_aggregate              | SAME TEXT AS ABOVE                                                                         |
| logical_plan after eliminate_join                          | SAME TEXT AS ABOVE                                                                         |
| logical_plan after decorrelate_predicate_subquery          | SAME TEXT AS ABOVE                                                                         |
| logical_plan after scalar_subquery_to_join                 | SAME TEXT AS ABOVE                                                                         |
| logical_plan after extract_equijoin_predicate              | SAME TEXT AS ABOVE                                                                         |
| logical_plan after simplify_expressions                    | SAME TEXT AS ABOVE                                                                         |
| logical_plan after rewrite_disjunctive_predicate           | SAME TEXT AS ABOVE                                                                         |
| logical_plan after eliminate_duplicated_expr               | SAME TEXT AS ABOVE                                                                         |
| logical_plan after eliminate_filter                        | SAME TEXT AS ABOVE                                                                         |
| logical_plan after eliminate_cross_join                    | SAME TEXT AS ABOVE                                                                         |
| logical_plan after common_sub_expression_eliminate         | SAME TEXT AS ABOVE                                                                         |
| logical_plan after eliminate_limit                         | SAME TEXT AS ABOVE                                                                         |
| logical_plan after propagate_empty_relation                | SAME TEXT AS ABOVE                                                                         |
| logical_plan after eliminate_one_union                     | SAME TEXT AS ABOVE                                                                         |
| logical_plan after filter_null_join_keys                   | SAME TEXT AS ABOVE                                                                         |
| logical_plan after eliminate_outer_join                    | SAME TEXT AS ABOVE                                                                         |
| logical_plan after push_down_limit                         | SAME TEXT AS ABOVE                                                                         |
| logical_plan after push_down_filter                        | SAME TEXT AS ABOVE                                                                         |
| logical_plan after single_distinct_aggregation_to_group_by | SAME TEXT AS ABOVE                                                                         |
| logical_plan after simplify_expressions                    | SAME TEXT AS ABOVE                                                                         |
| logical_plan after unwrap_cast_in_comparison               | SAME TEXT AS ABOVE                                                                         |
| logical_plan after common_sub_expression_eliminate         | SAME TEXT AS ABOVE                                                                         |
| logical_plan after optimize_projections                    | SAME TEXT AS ABOVE                                                                         |
| logical_plan                                               | Projection: Int64(1) AS a + b                                                              |
|                                                            |   EmptyRelation                                                                            |
| initial_physical_plan                                      | ProjectionExec: expr=[1 as a + b]                                                          |
|                                                            |   PlaceholderRowExec                                                                       |
|                                                            |                                                                                            |
| initial_physical_plan_with_stats                           | ProjectionExec: expr=[1 as a + b], statistics=[Rows=Exact(1), Bytes=Exact(8), [(Col[0]:)]] |
|                                                            |   PlaceholderRowExec, statistics=[Rows=Exact(1), Bytes=Exact(0), []]                       |
|                                                            |                                                                                            |
| physical_plan after OutputRequirements                     | OutputRequirementExec                                                                      |
|                                                            |   ProjectionExec: expr=[1 as a + b]                                                        |
|                                                            |     PlaceholderRowExec                                                                     |
|                                                            |                                                                                            |
| physical_plan after aggregate_statistics                   | SAME TEXT AS ABOVE                                                                         |
| physical_plan after join_selection                         | SAME TEXT AS ABOVE                                                                         |
| physical_plan after LimitedDistinctAggregation             | SAME TEXT AS ABOVE                                                                         |
| physical_plan after EnforceDistribution                    | SAME TEXT AS ABOVE                                                                         |
| physical_plan after CombinePartialFinalAggregate           | SAME TEXT AS ABOVE                                                                         |
| physical_plan after EnforceSorting                         | SAME TEXT AS ABOVE                                                                         |
| physical_plan after OptimizeAggregateOrder                 | SAME TEXT AS ABOVE                                                                         |
| physical_plan after ProjectionPushdown                     | SAME TEXT AS ABOVE                                                                         |
| physical_plan after coalesce_batches                       | SAME TEXT AS ABOVE                                                                         |
| physical_plan after OutputRequirements                     | ProjectionExec: expr=[1 as a + b]                                                          |
|                                                            |   PlaceholderRowExec                                                                       |
|                                                            |                                                                                            |
| physical_plan after PipelineChecker                        | SAME TEXT AS ABOVE                                                                         |
| physical_plan after LimitAggregation                       | SAME TEXT AS ABOVE                                                                         |
| physical_plan after ProjectionPushdown                     | SAME TEXT AS ABOVE                                                                         |
| physical_plan                                              | ProjectionExec: expr=[1 as a + b]                                                          |
|                                                            |   PlaceholderRowExec                                                                       |
|                                                            |                                                                                            |
| physical_plan_with_stats                                   | ProjectionExec: expr=[1 as a + b], statistics=[Rows=Exact(1), Bytes=Exact(8), [(Col[0]:)]] |
|                                                            |   PlaceholderRowExec, statistics=[Rows=Exact(1), Bytes=Exact(0), []]                       |
|                                                            |                                                                                            |
+------------------------------------------------------------+--------------------------------------------------------------------------------------------+
77 row(s) fetched.
Elapsed 0.032 seconds.

alamb avatar May 07 '24 18:05 alamb

It is caused by the columnize_expr function. Based on the comments, I guess columnize_expr is for special needs related to aggregation, and it was introduced in PR #55.

Maybe we can reimplement columnize_expr by directly comparing the aggregated output expressions instead of their display names. Let me try to fix this issue.

jonahgao avatar May 10 '24 07:05 jonahgao

take

jonahgao avatar May 10 '24 07:05 jonahgao

Wow https://github.com/apache/datafusion/pull/55 is a very old one. Thank you @jonahgao -- this will be a great fix

alamb avatar May 10 '24 10:05 alamb