datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Need to type coercion for arithmetic op when the left and right type are same

Open liukun4515 opened this issue 3 years ago • 6 comments

Describe the bug

https://github.com/apache/arrow-datafusion/pull/3254#issuecomment-1239345990

bug code: https://github.com/apache/arrow-datafusion/blob/master/datafusion/expr/src/binary_rule.rs#L293

To Reproduce Steps to reproduce the behavior:

Expected behavior A clear and concise description of what you expected to happen.

Additional context Add any other context about the problem here.

liukun4515 avatar Sep 07 '22 12:09 liukun4515

current plan for datafusion

❯ explain select cast(1 as decimal(10,1)) + cast(2 as decimal(10,1));
+---------------+--------------------------------------------------------------+
| plan_type     | plan                                                         |
+---------------+--------------------------------------------------------------+
| logical_plan  | Projection: Decimal128(Some(30),10,1) AS Int64(1) + Int64(2) |
|               |   EmptyRelation                                              |
| physical_plan | ProjectionExec: expr=[Some(30),10,1 as Int64(1) + Int64(2)]  |
|               |   EmptyExec: produce_one_row=true                            |
|               |                                                              |
+---------------+--------------------------------------------------------------+

spark plan:

spark-sql> explain extended select cast(1 as decimal(10,1)) + cast(2 as decimal(10,1));
== Parsed Logical Plan ==
'Project [unresolvedalias((cast(1 as decimal(10,1)) + cast(2 as decimal(10,1))), None)]
+- OneRowRelation

== Analyzed Logical Plan ==
(CAST(1 AS DECIMAL(10,1)) + CAST(2 AS DECIMAL(10,1))): decimal(11,1)
Project [CheckOverflow((promote_precision(cast(cast(1 as decimal(10,1)) as decimal(11,1))) + promote_precision(cast(cast(2 as decimal(10,1)) as decimal(11,1)))), DecimalType(11,1), true) AS (CAST(1 AS DECIMAL(10,1)) + CAST(2 AS DECIMAL(10,1)))#166]
+- OneRowRelation

== Optimized Logical Plan ==
Project [3.0 AS (CAST(1 AS DECIMAL(10,1)) + CAST(2 AS DECIMAL(10,1)))#166]
+- OneRowRelation

== Physical Plan ==
*(1) Project [3.0 AS (CAST(1 AS DECIMAL(10,1)) + CAST(2 AS DECIMAL(10,1)))#166]
+- *(1) Scan OneRowRelation[]

Time taken: 0.017 seconds, Fetched 1 row(s)

liukun4515 avatar Sep 07 '22 12:09 liukun4515

We follow the spark rule in the decimal type coercion https://github.com/apache/arrow-datafusion/blob/c359018baa8bbb0a227e83df948c903cde4d701f/datafusion/expr/src/binary_rule.rs#L353

liukun4515 avatar Sep 07 '22 12:09 liukun4515

cc @alamb @andygrove

I will fix it later.

liukun4515 avatar Sep 07 '22 13:09 liukun4515

Can we fix this and https://github.com/apache/arrow-datafusion/issues/3388 together?

liukun4515 avatar Sep 08 '22 09:09 liukun4515

There is more context about this issue here: https://github.com/apache/arrow-datafusion/pull/3254#issuecomment-1239345990

alamb avatar Sep 08 '22 20:09 alamb

When I try to fix this bug with remove below code

    // same type => all good
    if lhs_type == rhs_type {
        return Some(lhs_type.clone());
    }

When move the type coercion to the logical phase, the result data type of the expr can be determined in the logical phase.

In the physical phase, we must not use the type coercion to coerce the data type.

liukun4515 avatar Sep 16 '22 02:09 liukun4515