datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

short-circuited expression should be evaluated one by one

Open haohuaijin opened this issue 1 year ago • 5 comments

Describe the bug

In our current implementation, the sub-expression within the short-circuited expression is always evaluated even when they don't need to be evaluated. We first evaluate all expressions recursively and then perform the corresponding operation or function.

The related code is below the behavior of COALESCE function is wrong https://github.com/apache/arrow-datafusion/blob/b7e13a0af711477ad41450566c14430089edd3f2/datafusion/physical-expr/src/scalar_function.rs#L152-L156 the behavior of And and Or operation is wrong https://github.com/apache/arrow-datafusion/blob/b7e13a0af711477ad41450566c14430089edd3f2/datafusion/physical-expr/src/expressions/binary.rs#L262-L263

the behavior of CASE ... WHEN ... is correct https://github.com/apache/arrow-datafusion/blob/b7e13a0af711477ad41450566c14430089edd3f2/datafusion/physical-expr/src/expressions/case.rs#L136-L140

To Reproduce

DataFusion CLI v34.0.0
❯ create table t(x int, y int) as values (1,1), (2,2), (3,3), (0,0), (4,0);
0 rows in set. Query took 0.004 seconds.
❯ select y > 0 and 1/y < 1 from t;
Arrow error: Divide by zero error
❯ select y = 0 or 1/y < 1 from t;
Arrow error: Divide by zero error
❯ select COALESCE(1, 1/y) from t;
Arrow error: Divide by zero error
❯ select case 1 when 2 then 1/y end from t;
+--------------------------------------------------------------------+
| CASE Int64(1) WHEN Int64(2) THEN Int64(1) / CAST(t.y AS Int64) END |
+--------------------------------------------------------------------+
|                                                                    |
|                                                                    |
|                                                                    |
|                                                                    |
|                                                                    |
+--------------------------------------------------------------------+
5 rows in set. Query took 0.003 seconds.

Expected behavior

The short-circuited expressions should have the correct results. I also check in postgres

postgres=# create table t(x int, y int); insert into t values (1,1), (2,2), (3,3), (0,0), (4,0);
CREATE TABLE
INSERT 0 5
postgres=# select y > 0 and 1/y < 1 from t;
 ?column? 
----------
 f
 t
 t
 f
 f
(5 rows)

postgres=# select y = 0 or 1/y < 1 from t;
 ?column? 
----------
 f
 t
 t
 t
 t
(5 rows)

postgres=# select COALESCE(1, 1/y) from t;
 coalesce 
----------
        1
        1
        1
        1
        1
(5 rows)

Additional context

No response

haohuaijin avatar Jan 21 '24 07:01 haohuaijin

I prepare to handle it, If there is any progress, I'll record it under this issue

jackwener avatar Jan 24 '24 04:01 jackwener

Hi @jackwener, I have some thoughts on #8959 about implementing short circuit functions(like coalesce), do you have a time to take a look?

haohuaijin avatar Jan 24 '24 04:01 haohuaijin

I also find some very special cases when I meet the error. This is example sql:

SELECT
  (
    case
      when SUM(col1) != 0 then SUM(col1) / SUM(col2)
      else 0
    end
  ) as metric
FROM
  table
having
  (
    metric > 0
  )

This a case when case, but the filter metric > 0 will changed to and filter in the simplify_expressions optimization rule

liukun4515 avatar May 11 '24 03:05 liukun4515

I also find some very special cases when I meet the error. This is example sql:

SELECT
  (
    case
      when SUM(col1) != 0 then SUM(col1) / SUM(col2)
      else 0
    end
  ) as metric
FROM
  table
having
  (
    metric > 0
  )

This a case when case, but the filter metric > 0 will changed to and filter in the simplify_expressions optimization rule

In the simplify_expressions rule, we convert the CASE WHEN to filter but ignore the corner case.

The above plan will get the filter plan like:

filter:  SUM(col1) != 0 and SUM(col1) / SUM(col2) > 0

Due to the bug: the behavior of And and Or operation is wrong https://github.com/apache/arrow-datafusion/blob/b7e13a0af711477ad41450566c14430089edd3f2/datafusion/physical-expr/src/expressions/binary.rs#L262-L263

We will get error in the runtime.

liukun4515 avatar May 11 '24 03:05 liukun4515

I also find error, that short-circuited logical is invalid

DataFusion CLI v39.0.0
> create table if not exists test_tb(c1 int, c2 int, c3 varchar);
0 row(s) fetched.
Elapsed 0.002 seconds.

> insert into test_tb(c1, c2, c3) values(1,0,'1'),(2,2,'2'),(3,2,'3');
+-------+
| count |
+-------+
| 3     |
+-------+
1 row(s) fetched.
Elapsed 0.003 seconds.

> select * from test_tb;
+----+----+----+
| c1 | c2 | c3 |
+----+----+----+
| 1  | 0  | 1  |
| 2  | 2  | 2  |
| 3  | 2  | 3  |
+----+----+----+
3 row(s) fetched.
Elapsed 0.002 seconds.

> select c1 != 1 and c1 / c2 = 1 from test_tb;
Arrow error: Divide by zero error

The c1 in the first row does not meet the condition c1 !=1 In theory, it is not necessary to execute the sub-expr c1 / c2 = 1

zhuliquan avatar Jun 27 '24 15:06 zhuliquan

I think this idea was largely implemented by @acking-you in

  • https://github.com/apache/datafusion/pull/15694

alamb avatar Aug 10 '25 10:08 alamb

We merged a fix for this in the 51 release, but we have found as subtle problem so I am planning to revert it for 51 (the fix should appear in 52):

  • https://github.com/apache/datafusion/pull/18567

alamb avatar Nov 09 '25 11:11 alamb