datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

`array[]` / `make_array()` will error if the types can not be coerced to a common type

Open alamb opened this issue 2 years ago • 9 comments

Describe the bug make_array() will error if the types can not be coerced to a common type, as in postgres and spark

To Reproduce

Run this query (see query_array_scalar_coerce test added in https://github.com/apache/arrow-datafusion/pull/3122)

SELECT [1, 2, '3']

This results in an error:

"Error during planning: Coercion from [Int64, Int64, Utf8] to the signature VariadicEqual failed.",);

Expected behavior I expect this to return like postgres where it has coerced all the arguments to int (tried to coerce all arguments to the same type as the first, which in this case is 1 and thus int

alamb=# select array[1, 2, '3'];
  array  
---------
 {1,2,3}
(1 row)

Additional context Found in https://github.com/apache/arrow-datafusion/pull/3122

alamb avatar Aug 15 '22 22:08 alamb

I actually think in general the coercion rules for uniform should attempt to coerce all arguments to the same type as the first argument

alamb avatar Aug 16 '22 09:08 alamb

@alamb this issue description confronts with #3123 Which one should be done?

comphead avatar Aug 25 '22 16:08 comphead

@comphead I think https://github.com/apache/arrow-datafusion/issues/3123 and this ticket are different

This ticket is about the input types of the arguments to array[]

https://github.com/apache/arrow-datafusion/issues/3123 is about the output type produced by the array[] function

alamb avatar Aug 31 '22 11:08 alamb

@alamb I checked 3 options

DataFusion CLI v11.0.0
❯ select make_array(1,2,'3');
+----------------------------------------+
| makearray(Int64(1),Int64(2),Utf8("3")) |
+----------------------------------------+
| [1, 2, 3]                              |
+----------------------------------------+
1 row in set. Query took 0.006 seconds.
❯ select [1, 2, '3'];
NotImplemented("Arrays with different types are not supported: {Int64, Utf8}")
❯ select array[1,2,'3'];
NotImplemented("Arrays with different types are not supported: {Utf8, Int64}") 

here I can see nonconsistent behaviuor. I suppose all 3 should try coerce to first datatype and fail if its not possible?

comphead avatar Sep 02 '22 01:09 comphead

here I can see nonconsistent behaviuor. I suppose all 3 should try coerce to first datatype and fail if its not possible?

I think that would make sense -- thank you for looking into this @comphead

alamb avatar Sep 04 '22 10:09 alamb

@alamb it appears we have 2 code bases for supporting arrays:

  • built in scalar function that drives make_array(x, y) function
  • sql planner that drives array[x, y] or [x, y] constructions.

All those constructions do the same work, but they can be out of sync like now. do we need to support all 2 branches?

comphead avatar Sep 18 '22 03:09 comphead

All those constructions do the same work, but they can be out of sync like now. do we need to support all 2 branches?

I am sorry for the delay in response. The short answer is I don't really know what the desired / correct behavior is here -- I think array[] syntax is from postgres and make_array() (formerly array()) is from Spark

alamb avatar Sep 21 '22 17:09 alamb

Yes, we can support all 3 of them. My concern there are 2 different codebases, and no easy way to find a common denominator for them. I can make all 3 functions works the same way, but in future those codebases can be out of sync again, leading make_array works differently comparing to array or [].

comphead avatar Sep 22 '22 16:09 comphead

🤔

alamb avatar Sep 22 '22 17:09 alamb

@alamb one more thing I found make_array works on coerce expressions, and coerce doesn' consider first element as driving cast element. Rather it finds common type among all array values. Are you ok to with current behaviuor or you would like to change it to make first datatype as target type?

comphead avatar Sep 29 '22 23:09 comphead

Are you ok to with current behaviuor or you would like to change it to make first datatype as target type?

Ideally we would follow some well established model (like posgres) but if that is too challenging I think we can keep the existing behavior too

alamb avatar Sep 30 '22 11:09 alamb