datafusion
datafusion copied to clipboard
`array[]` / `make_array()` will error if the types can not be coerced to a common type
Describe the bug
make_array()
will error if the types can not be coerced to a common type, as in postgres and spark
To Reproduce
Run this query (see query_array_scalar_coerce
test added in https://github.com/apache/arrow-datafusion/pull/3122)
SELECT [1, 2, '3']
This results in an error:
"Error during planning: Coercion from [Int64, Int64, Utf8] to the signature VariadicEqual failed.",);
Expected behavior
I expect this to return like postgres where it has coerced all the arguments to int (tried to coerce all arguments to the same type as the first, which in this case is 1
and thus int
alamb=# select array[1, 2, '3'];
array
---------
{1,2,3}
(1 row)
Additional context Found in https://github.com/apache/arrow-datafusion/pull/3122
I actually think in general the coercion rules for uniform
should attempt to coerce all arguments to the same type as the first argument
@alamb this issue description confronts with #3123 Which one should be done?
@comphead I think https://github.com/apache/arrow-datafusion/issues/3123 and this ticket are different
This ticket is about the input types of the arguments to array[]
https://github.com/apache/arrow-datafusion/issues/3123 is about the output type produced by the array[]
function
@alamb I checked 3 options
DataFusion CLI v11.0.0
❯ select make_array(1,2,'3');
+----------------------------------------+
| makearray(Int64(1),Int64(2),Utf8("3")) |
+----------------------------------------+
| [1, 2, 3] |
+----------------------------------------+
1 row in set. Query took 0.006 seconds.
❯ select [1, 2, '3'];
NotImplemented("Arrays with different types are not supported: {Int64, Utf8}")
❯ select array[1,2,'3'];
NotImplemented("Arrays with different types are not supported: {Utf8, Int64}")
here I can see nonconsistent behaviuor. I suppose all 3 should try coerce to first datatype and fail if its not possible?
here I can see nonconsistent behaviuor. I suppose all 3 should try coerce to first datatype and fail if its not possible?
I think that would make sense -- thank you for looking into this @comphead
@alamb it appears we have 2 code bases for supporting arrays:
- built in scalar function that drives
make_array(x, y)
function - sql planner that drives
array[x, y]
or[x, y]
constructions.
All those constructions do the same work, but they can be out of sync like now. do we need to support all 2 branches?
All those constructions do the same work, but they can be out of sync like now. do we need to support all 2 branches?
I am sorry for the delay in response. The short answer is I don't really know what the desired / correct behavior is here -- I think array[]
syntax is from postgres and make_array()
(formerly array()
) is from Spark
Yes, we can support all 3 of them. My concern there are 2 different codebases, and no easy way to find a common denominator for them. I can make all 3 functions works the same way, but in future those codebases can be out of sync again, leading make_array
works differently comparing to array
or []
.
🤔
@alamb one more thing I found
make_array
works on coerce expressions, and coerce doesn' consider first element as driving cast element. Rather it finds common type among all array values. Are you ok to with current behaviuor or you would like to change it to make first datatype as target type?
Are you ok to with current behaviuor or you would like to change it to make first datatype as target type?
Ideally we would follow some well established model (like posgres) but if that is too challenging I think we can keep the existing behavior too