datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

Make modulos with negative float zero compat with other engines

Open comphead opened this issue 1 year ago • 8 comments

Describe the bug

DF returns NaN in modulo query below and is not compliant with major engines:

> select 1 % -0.0;
+------------------------+
| Int64(1) % Float64(-0) |
+------------------------+
| NaN                    |
+------------------------+

PG, Trino fails on such query, DuckDB, Spark returns NULL

@alamb @viirya @andygrove what would be expected behavior from DataFusion? We can also introduce a config to be PG compliant or PG/Trino compliant?

To Reproduce

No response

Expected behavior

No response

Additional context

No response

comphead avatar Jun 21 '24 17:06 comphead

Surprising that Postgres fails. According to the docs, Infinity, -Infinity, and NaN are legitimate floating point values. This is also the expected behavior per the ANSI SQL standard, afaik. NULL is not the right thing to return here.

parthchandra avatar Jun 21 '24 20:06 parthchandra

Postgres fails:

andrewlamb@Andrews-MacBook-Pro-2:~/Software/arrow-rs$ psql -h localhost -U postgres
psql (14.12 (Homebrew), server 16.1 (Debian 16.1-1.pgdg120+1))
WARNING: psql major version 14, server major version 16.
         Some psql features might not work.
Type "help" for help.

postgres=# select 1 % -0.0;
ERROR:  division by zero

alamb avatar Jun 21 '24 22:06 alamb

I don't have a strong preference one way or the other, personally

alamb avatar Jun 21 '24 22:06 alamb

I don't have a strong preference one way or the other, personally

I would go with Spark/DuckDB way :) if there is no other strong opinions, I'll make a fix soon

comphead avatar Jun 21 '24 23:06 comphead

ERROR:  division by zero

I wasn't doubting it. :) But I am surprised.

@comphead I'll take back my 'NULL is not the right thing to return here' comment. It's better than failing for most users. Just one thing to keep in mind - Spark sql has a isnan function which should be consistent with this.

parthchandra avatar Jun 22 '24 00:06 parthchandra

Thanks @parthchandra for the suggestion, I just checked we should be fine.

scala> spark.sql("select 1 % -0.0").show(false)
+---------+
|(1 % 0.0)|
+---------+
|NULL     |
+---------+

scala> spark.sql("select isnan(1 % -0.0)").show(false)
+----------------+
|isnan((1 % 0.0))|
+----------------+
|false           |
+----------------+

comphead avatar Jun 22 '24 22:06 comphead

arrow-rs follows the rules of IEEE 754. If we intend to be compatible with other engines, many cases will also need to be modified, such as division by zero.

DataFusion CLI v39.0.0
> select 1.0/0.0;
+-------------------------+
| Float64(1) / Float64(0) |
+-------------------------+
| inf                     |
+-------------------------+


> select 0.0/0.0;
+-------------------------+
| Float64(0) / Float64(0) |
+-------------------------+
| NaN                     |
+-------------------------+

In PostgreSQL:

postgres=# select 1.0/0.0;
ERROR:  division by zero
postgres=# select 0.0/0.0;
ERROR:  division by zero

jonahgao avatar Jun 23 '24 15:06 jonahgao

This is a good point @jonahgao so this gives me even more confidence to introduce a new DF param which controls should DF go with IEEE 754

comphead avatar Jun 23 '24 17:06 comphead