DataFusion HashJoin LeftAnti doesn't support null aware anti join

Open viirya opened this issue 1 year ago • 0 comments

Describe the bug

While working on https://github.com/apache/datafusion-comet/pull/437, a few Spark join tests failed when delegating to DataFusion HashJoin.

This is because the DataFusion HashJoin LeftAnti join returns incorrect results when the join is a null-aware anti join.

To Reproduce

Add the following test to join.slt:

statement ok
CREATE TABLE IF NOT EXISTS test_table(c1 INT, c2 INT) AS VALUES
(1, 1),
(2, 2),
(3, 3),
(4, null),
(null, 0);

query II
SELECT * FROM test_table t1 WHERE (c1 NOT IN (SELECT c2 FROM test_table)) = true
----
4 NULL
NULL 0

Expected behavior

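The query above should return an empty relation. The subquery result contains a NULL (from the row `(4, null)`), so `c1 NOT IN (...)` evaluates to false for rows whose c1 matches a value and to NULL for every other row, and it is never true.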
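To make the expected semantics concrete, here is a minimal Python sketch of SQL's three-valued `NOT IN` applied to the same five rows. It is only a model for illustration, not DataFusion's or Spark's implementation: the function names `not_in`, `null_aware_anti_join`, and `naive_anti_join` are hypothetical, and `None` stands in for SQL NULL. The naive variant models an anti join that treats NULL keys as simply "no match", which is an assumption about the cause that happens to reproduce the two rows shown above.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[int], Optional[int]]

# Same contents as test_table(c1, c2) in the reproduction; None models NULL.
ROWS: List[Row] = [(1, 1), (2, 2), (3, 3), (4, None), (None, 0)]


def not_in(value: Optional[int], candidates: List[Optional[int]]) -> Optional[bool]:
    """SQL three-valued NOT IN: returns True, False, or None (UNKNOWN)."""
    if not candidates:
        return True  # NOT IN over an empty set is TRUE, even for a NULL value
    if value is None:
        return None  # NULL compared with anything is UNKNOWN
    if any(c is not None and c == value for c in candidates):
        return False  # a definite match exists
    if any(c is None for c in candidates):
        return None  # no match, but a NULL candidate makes the result UNKNOWN
    return True


def null_aware_anti_join(rows: List[Row]) -> List[Row]:
    """Keep only rows where `c1 NOT IN (SELECT c2 ...)` is strictly TRUE."""
    subquery = [c2 for _, c2 in rows]
    return [r for r in rows if not_in(r[0], subquery) is True]


def naive_anti_join(rows: List[Row]) -> List[Row]:
    """Anti join without null awareness: keep rows with no equal key."""
    keys = {c2 for _, c2 in rows if c2 is not None}
    return [r for r in rows if r[0] is None or r[0] not in keys]


print(null_aware_anti_join(ROWS))
print(naive_anti_join(ROWS))
```

Under this model the null-aware version keeps no rows, while the naive version keeps `(4, NULL)` and `(NULL, 0)`.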

Additional context

No response

viirya, May 20 '24 17:05