coercion rule about `eq` and InList between string type and numeric type
I have some concerns about the rule between string and number. I check some situation in the spark:
spark-sql> desc t3;
c1 int
spark-sql> explain extended select * from t3 where c1 = cast(123.123 as string);
== Parsed Logical Plan ==
'Project [*]
+- 'Filter ('c1 = cast(123.123 as string))
+- 'UnresolvedRelation [t3], [], false
== Analyzed Logical Plan ==
c1: int
Project [c1#186]
+- Filter (c1#186 = cast(cast(123.123 as string) as int))
+- SubqueryAlias spark_catalog.default.t3
+- HiveTableRelation [`default`.`t3`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [c1#186], Partition Cols: []]
== Optimized Logical Plan ==
Filter (isnotnull(c1#186) AND (c1#186 = 123))
+- HiveTableRelation [`default`.`t3`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [c1#186], Partition Cols: []]
== Physical Plan ==
*(1) Filter (isnotnull(c1#186) AND (c1#186 = 123))
+- Scan hive default.t3 [c1#186], HiveTableRelation [`default`.`t3`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, Data Cols: [c1#186], Partition Cols: []]
In the previous case, the result of coercion is Int.
I think we need to create an issue to track this.
@viirya @alamb
Originally posted by @liukun4515 in https://github.com/apache/arrow-datafusion/pull/2794#discussion_r907011527
cc @viirya
I'm going to implement casting between UTF8 and decimal, we can also consider these cases together.
I wonder if the question here is "should we automatically try and coerce numbers to strings (which is more general but slower) or coerce strings to numbers (which is less general but faster)"?
I wonder if the question here is "should we automatically try and coerce numbers to strings (which is more general but slower) or coerce strings to numbers (which is less general but faster)"?
yes, we should try some other database system and discuss the best behavior for these case
Here is what postgres does (appears to me to cast based on the type of the first element in the IN list):
alamb=# select 5 in (1, 2, 'fff');
ERROR: invalid input syntax for type integer: "fff"
LINE 1: select 5 in (1, 2, 'fff');
^
alamb=# select 'foo' in (1, 2, 'fff');
ERROR: invalid input syntax for type integer: "fff"
LINE 1: select 'foo' in (1, 2, 'fff');
^
alamb=# select 'foo' in ('1', '2', 'fff');
?column?
----------
f
(1 row)
Here is what postgres does (appears to me to cast based on the type of the first element in the IN list)
This sounds reasonable.
Here is what postgres does (appears to me to cast based on the type of the first element in the IN list):
alamb=# select 5 in (1, 2, 'fff'); ERROR: invalid input syntax for type integer: "fff" LINE 1: select 5 in (1, 2, 'fff'); ^ alamb=# select 'foo' in (1, 2, 'fff'); ERROR: invalid input syntax for type integer: "fff" LINE 1: select 'foo' in (1, 2, 'fff'); ^ alamb=# select 'foo' in ('1', '2', 'fff'); ?column? ---------- f (1 row)
Do you try this?
select 1 in (12,1,3,'123');
Do you try this?
psql (14.3)
Type "help" for help.
alamb=# select 1 in (12,1,3,'123');
?column?
----------
t
(1 row)