
`array_union` and `array_intersect` cannot handle NULL columnar data

Open Weijun-H opened this issue 1 year ago • 2 comments

Describe the bug

The functions array_union and array_intersect do not handle columnar input that contains NULL values. Rows in which either argument is NULL are silently dropped from the result: in the example below, the table has 6 rows but the query returns only 3.

To Reproduce

❯ CREATE TABLE array_intersect_table
AS VALUES
  ([1, 2, 2, 3], [2, 3, 4]),
  ([2, 3, 3], [3]),
  ([3], [3, 3, 4]),
  (null, [3, 4]),
  ([1, 2], null),
  (null, null)
;
0 rows in set. Query took 0.013 seconds.

❯ select array_intersect(column1, column2) from array_intersect_table;
+------------------------------------------------------------------------+
| array_intersect(array_intersect_table.column1,array_except_table.column2) |
+------------------------------------------------------------------------+
| [2, 3]                                                                 |
| [3]                                                                    |
| [3]                                                                    |
+------------------------------------------------------------------------+
3 rows in set. Query took 0.007 seconds.
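
For reference, a scalar function is expected to return one value per input row, so the query above should produce 6 rows rather than 3. The following is a minimal, stdlib-only Rust sketch of that row-preserving behaviour, not the DataFusion implementation. The names `ListCell`, `intersect_cell` and `intersect_columns` are hypothetical, and the choice that a NULL argument yields a NULL result is an assumption, since the issue does not state what the NULL rows should return.

```rust
use std::collections::HashSet;

/// One list-typed cell (hypothetical type): `None` models SQL NULL,
/// `Some(vec)` models a non-NULL array value.
type ListCell = Option<Vec<i64>>;

/// Intersect two cells, keeping first-occurrence order from `lhs`
/// and dropping duplicates.
/// ASSUMPTION: a NULL argument yields a NULL result; the issue does
/// not say what the expected value for those rows is.
fn intersect_cell(lhs: &ListCell, rhs: &ListCell) -> ListCell {
    let (l, r) = match (lhs, rhs) {
        (Some(l), Some(r)) => (l, r),
        // Either side is NULL: still emit a value for this row.
        _ => return None,
    };
    let rhs_set: HashSet<i64> = r.iter().copied().collect();
    let mut seen = HashSet::new();
    Some(
        l.iter()
            .copied()
            // Keep values present in `rhs` that have not been emitted yet.
            .filter(|v| rhs_set.contains(v) && seen.insert(*v))
            .collect(),
    )
}

/// Apply `intersect_cell` row by row. For equal-length inputs the
/// output has exactly as many rows as the input, NULL rows included.
fn intersect_columns(col1: &[ListCell], col2: &[ListCell]) -> Vec<ListCell> {
    col1.iter()
        .zip(col2)
        .map(|(a, b)| intersect_cell(a, b))
        .collect()
}
```

Fed the six rows of `array_intersect_table`, `intersect_columns` returns six cells: the three non-NULL results shown in the output above, followed by three NULL cells under the stated assumption.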

Expected behavior

No response

Additional context

No response

Weijun-H • Mar 20 '24 11:03

take

Weijun-H • Mar 20 '24 11:03

The query no longer drops rows; it now fails with an error:

DataFusion CLI v51.0.0
> CREATE TABLE array_intersect_table
AS VALUES
  ([1, 2, 2, 3], [2, 3, 4]),
  ([2, 3, 3], [3]),
  ([3], [3, 3, 4]),
  (null, [3, 4]),
  ([1, 2], null),
  (null, null)
;
0 row(s) fetched.
Elapsed 0.078 seconds.

> select array_intersect(column1, column2) from array_intersect_table;
Arrow error: Invalid argument error: Incorrect number of arrays provided to RowConverter, expected 1 got 0

This needs some debugging in the implementation here:

https://github.com/apache/datafusion/blob/2a08013af3ccf703bee202c959b40bb0d35bdea1/datafusion/functions-nested/src/set_ops.rs

SLT tests covering the NULL rows should be added here:

https://github.com/apache/datafusion/blob/2a08013af3ccf703bee202c959b40bb0d35bdea1/datafusion/sqllogictest/test_files/array.slt

Jefffrey • Dec 10 '25 13:12