Comparison between negative zero and false produces incorrect result

Open andygrove opened this issue 1 year ago • 2 comments

Describe the bug

SQL

SELECT c30, c98, c30 = c98 FROM test0 ORDER BY c30, c98;

Spark Plan

AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(2) Sort [c30#30 ASC NULLS FIRST, c98#98 ASC NULLS FIRST], true, 0
   +- AQEShuffleRead coalesced
      +- ShuffleQueryStage 0
         +- Exchange rangepartitioning(c30#30 ASC NULLS FIRST, c98#98 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=16892]
            +- *(1) Project [c30#30, c98#98, (c30#30 = cast(c98#98 as float)) AS (c30 = c98)#15740]
               +- *(1) ColumnarToRow
                  +- FileScan parquet [c30#30,c98#98] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c30:float,c98:boolean>
+- == Initial Plan ==
   Sort [c30#30 ASC NULLS FIRST, c98#98 ASC NULLS FIRST], true, 0
   +- Exchange rangepartitioning(c30#30 ASC NULLS FIRST, c98#98 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=16878]
      +- Project [c30#30, c98#98, (c30#30 = cast(c98#98 as float)) AS (c30 = c98)#15740]
         +- FileScan parquet [c30#30,c98#98] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c30:float,c98:boolean>

Comet Plan

AdaptiveSparkPlan isFinalPlan=true
+- == Final Plan ==
   *(2) Sort [c30#30 ASC NULLS FIRST, c98#98 ASC NULLS FIRST], true, 0
   +- AQEShuffleRead coalesced
      +- ShuffleQueryStage 0
         +- Exchange rangepartitioning(c30#30 ASC NULLS FIRST, c98#98 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=16957]
            +- *(1) ColumnarToRow
               +- CometProject [c30#30, c98#98, (c30 = c98)#15748], [c30#30, c98#98, (c30#30 = cast(c98#98 as float)) AS (c30 = c98)#15748]
                  +- CometScan parquet [c30#30,c98#98] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c30:float,c98:boolean>
+- == Initial Plan ==
   Sort [c30#30 ASC NULLS FIRST, c98#98 ASC NULLS FIRST], true, 0
   +- Exchange rangepartitioning(c30#30 ASC NULLS FIRST, c98#98 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=16937]
      +- CometProject [c30#30, c98#98, (c30 = c98)#15748], [c30#30, c98#98, (c30#30 = cast(c98#98 as float)) AS (c30 = c98)#15748]
         +- CometScan parquet [c30#30,c98#98] Batched: true, DataFilters: [], Format: CometParquet, Location: InMemoryFileIndex(1 paths)[file:/home/andy/git/apache/datafusion-comet/fuzz-testing/test0.parquet], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<c30:float,c98:boolean>

First difference at row 23 (columns: c30, c98, c30 = c98):
Spark: -0.0,false,true
Comet: -0.0,false,false
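For context, here is a minimal sketch of why these two answers can diverge. Both plans cast the boolean to float, so false becomes 0.0 and the comparison is -0.0 = 0.0. Under IEEE 754 equality that is true, which matches the Spark result. A comparison that looks at the raw bit pattern returns false, which matches the Comet result. Whether the native code actually compares this way is an assumption, not something confirmed in this issue. The helper names `ieee_equal` and `bitwise_equal` are hypothetical and exist only for illustration.

```python
import struct


def ieee_equal(a: float, b: float) -> bool:
    """IEEE 754 equality: -0.0 and 0.0 compare equal."""
    return a == b


def bitwise_equal(a: float, b: float) -> bool:
    """Compare the 32-bit float encodings byte for byte.

    Hypothetical stand-in for a bit-pattern comparison; -0.0 and 0.0
    differ in the sign bit, so they are not equal under this rule.
    """
    return struct.pack("<f", a) == struct.pack("<f", b)


# cast(false as float) yields 0.0, as in the Project node of both plans.
false_as_float = float(False)

for c30 in (-0.0, 0.0):
    print(c30, ieee_equal(c30, false_as_float), bitwise_equal(c30, false_as_float))
```

The loop covers both negative and positive zero, so the same sketch can serve as a checklist of the cases a regression test should include.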

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

andygrove, Jul 15 '24 15:07

We should test with positive zero (0.0) as well.

andygrove, Jul 15 '24 15:07

We should fix this upstream: https://github.com/apache/datafusion/issues/11108

kazuyukitanimura, Jul 17 '24 22:07