datafusion
`array_union` and `array_intersect` cannot handle NULL columnar data
Describe the bug
`array_union` and `array_intersect` do not appear to handle columnar input that contains NULL values: rows where either argument is NULL are dropped from the result. In the reproduction below, the table is created from six rows, but the query returns only three.
To Reproduce
❯ CREATE TABLE array_intersect_table
AS VALUES
([1, 2, 2, 3], [2, 3, 4]),
([2, 3, 3], [3]),
([3], [3, 3, 4]),
(null, [3, 4]),
([1, 2], null),
(null, null)
;
0 rows in set. Query took 0.013 seconds.
❯ select array_intersect(column1, column2) from array_intersect_table;
+------------------------------------------------------------------------+
| array_intersect(array_intersect_table.column1,array_except_table.column2) |
+------------------------------------------------------------------------+
| [2, 3] |
| [3] |
| [3] |
+------------------------------------------------------------------------+
3 rows in set. Query took 0.007 seconds.
Expected behavior
No response
Additional context
No response
take
On DataFusion CLI v51.0.0 the query no longer drops rows silently; it now fails with an error:
DataFusion CLI v51.0.0
> CREATE TABLE array_intersect_table
AS VALUES
([1, 2, 2, 3], [2, 3, 4]),
([2, 3, 3], [3]),
([3], [3, 3, 4]),
(null, [3, 4]),
([1, 2], null),
(null, null)
;
0 row(s) fetched.
Elapsed 0.078 seconds.
> select array_intersect(column1, column2) from array_intersect_table;
Arrow error: Invalid argument error: Incorrect number of arrays provided to RowConverter, expected 1 got 0
The implementation here needs some debugging:
https://github.com/apache/datafusion/blob/2a08013af3ccf703bee202c959b40bb0d35bdea1/datafusion/functions-nested/src/set_ops.rs
SLT (sqllogictest) tests should be added here:
https://github.com/apache/datafusion/blob/2a08013af3ccf703bee202c959b40bb0d35bdea1/datafusion/sqllogictest/test_files/array.slt
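As a reference point while debugging, here is a minimal standalone sketch of one plausible row-preserving behavior. It is not the DataFusion implementation and uses no Arrow types: `ListCell`, `intersect_cell`, and `intersect_columns` are hypothetical names, `None` stands in for a NULL list, and the choice that a NULL on either side yields NULL is an assumption, since the issue leaves the expected behavior unspecified. The one property it is meant to illustrate is that the output has exactly one entry per input row.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for one list-typed cell; `None` models a NULL list.
type ListCell = Option<Vec<i64>>;

/// Intersect two cells, keeping first-occurrence order from `left`
/// and dropping duplicates.
/// Assumption (not confirmed by the issue): NULL on either side yields NULL.
fn intersect_cell(left: &ListCell, right: &ListCell) -> ListCell {
    match (left, right) {
        (Some(l), Some(r)) => {
            let rhs: HashSet<i64> = r.iter().copied().collect();
            let mut seen: HashSet<i64> = HashSet::new();
            Some(
                l.iter()
                    .copied()
                    // keep a value only if it is in `right` and not yet emitted
                    .filter(|v| rhs.contains(v) && seen.insert(*v))
                    .collect(),
            )
        }
        // Either side is NULL: emit a NULL cell instead of skipping the row.
        _ => None,
    }
}

/// Apply the intersection row by row. The output length always equals
/// the input length, so NULL rows are never dropped.
fn intersect_columns(left: &[ListCell], right: &[ListCell]) -> Vec<ListCell> {
    assert_eq!(left.len(), right.len(), "columns must have equal length");
    left.iter()
        .zip(right)
        .map(|(l, r)| intersect_cell(l, r))
        .collect()
}
```

Fed the six rows from the reproduction, this sketch returns six cells: the first three match the values shown in the original report, and the last three are NULL under the stated assumption. Whatever NULL semantics the real fix settles on, the SLT tests could check that same row count.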