
DataFusion should support casting strings such as "4e7" to decimal

Status: open. andygrove opened this issue 1 year ago (2 comments).

Is your feature request related to a problem or challenge?

DataFusion can cast the string '4e7' (scientific notation) to a float, but not to a decimal. This is inconsistent with Postgres and Apache Spark, which both accept the same string in either cast, as the sessions below show.
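Supporting this mostly comes down to accepting an exponent when parsing the string. Here is a minimal Rust sketch of one way to do that. It assumes the target is stored the way Arrow's Decimal128(precision, scale) stores values, as an unscaled i128 equal to value × 10^scale. `parse_decimal_sci` is a hypothetical helper, not the arrow-cast implementation. For brevity it rejects inputs that would need rounding instead of choosing a rounding mode.

```rust
/// Hypothetical helper, not an arrow-rs or DataFusion API: parse a string that
/// may use scientific notation (e.g. "4e7", "-1.25E-1") into the unscaled i128
/// representation of a Decimal128(precision, scale) value.
///
/// Returns None for malformed input, for values that would need rounding to
/// fit `scale`, and for values with more than `precision` digits.
fn parse_decimal_sci(s: &str, precision: u8, scale: i8) -> Option<i128> {
    let s = s.trim();

    // Split off an optional exponent introduced by 'e' or 'E'.
    let (mantissa, exp) = match s.find(|c: char| c == 'e' || c == 'E') {
        Some(i) => (&s[..i], s[i + 1..].parse::<i32>().ok()?),
        None => (s, 0),
    };

    // Optional leading sign on the mantissa.
    let (negative, mantissa) = match mantissa.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, mantissa.strip_prefix('+').unwrap_or(mantissa)),
    };

    // Integer and fractional digit runs, e.g. "1.25" -> ("1", "25").
    let (int_part, frac_part) = mantissa.split_once('.').unwrap_or((mantissa, ""));
    if int_part.is_empty() && frac_part.is_empty() {
        return None;
    }

    // Accumulate all digits into one integer; value = digits * 10^(exp - frac_len).
    let mut digits: i128 = 0;
    for b in int_part.bytes().chain(frac_part.bytes()) {
        if !b.is_ascii_digit() {
            return None;
        }
        digits = digits.checked_mul(10)?.checked_add(i128::from(b - b'0'))?;
    }
    if digits == 0 {
        return Some(0);
    }

    // Rescale so the result equals value * 10^scale.
    let shift = i64::from(exp) - frac_part.len() as i64 + i64::from(scale);
    if shift.unsigned_abs() > 38 {
        // 10^39 exceeds i128::MAX, so nonzero digits would overflow or need rounding.
        return None;
    }
    let factor = 10i128.pow(shift.unsigned_abs() as u32);
    let unscaled = if shift >= 0 {
        digits.checked_mul(factor)?
    } else if digits % factor == 0 {
        digits / factor
    } else {
        return None; // would need rounding; left out of this sketch
    };

    // Enforce the declared precision: at most `precision` digits in total.
    if unscaled >= 10i128.checked_pow(u32::from(precision))? {
        return None;
    }
    Some(if negative { -unscaled } else { unscaled })
}
```

For example, `parse_decimal_sci("4e7", 10, 2)` yields `Some(4_000_000_000)`, which is 40000000.00 at scale 2. `parse_decimal_sci("4e7", 8, 2)` yields `None` because that value needs 10 digits.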

Postgres

postgres=# select cast('4e7' as float);
  float8  
----------
 40000000
(1 row)

postgres=# select cast('4e7' as decimal(10,2));
   numeric   
-------------
 40000000.00

Apache Spark

scala> spark.sql("select cast('4e7' as float)").show
+------------------+
|CAST(4e7 AS FLOAT)|
+------------------+
|             4.0E7|
+------------------+


scala> spark.sql("select cast('4e7' as decimal(10,2))").show
+--------------------------+
|CAST(4e7 AS DECIMAL(10,2))|
+--------------------------+
|               40000000.00|
+--------------------------+
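Postgres and Spark agree on the result. The arithmetic below shows why '4e7' fits in DECIMAL(10,2). It models DECIMAL(p, s) as an unscaled integer v with value v × 10^{-s} and |v| < 10^p, the way Arrow's Decimal128(p, s) stores it. This is a simplified model rather than either engine's internal representation.

$$
\texttt{'4e7'} = 4 \times 10^{7}, \qquad v = 4 \times 10^{7} \times 10^{2} = 4 \times 10^{9} = 4\,000\,000\,000
$$

$$
v \text{ has } 10 \text{ digits} \le p = 10 \;\Rightarrow\; \text{the value fits and displays as } 40000000.00
$$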

DataFusion

DataFusion CLI v37.0.0
❯ select cast('4e7' as float);
+-------------+
| Utf8("4e7") |
+-------------+
| 40000000.0  |
+-------------+
1 row in set. Query took 0.010 seconds.

❯ select cast('4e7' as decimal(10,2));
Arrow error: Cast error: Cannot cast string '4e7' to value of Decimal128(38, 10) type
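The error names Decimal128(38, 10) rather than the requested DECIMAL(10,2). Even at that type the value is well within range, if Decimal128(p, s) is read as an unscaled integer v with |v| < 10^p. This suggests the failure comes from parsing the exponent syntax rather than from overflow:

$$
v = 4 \times 10^{7} \times 10^{10} = 4 \times 10^{17}, \qquad 18 \text{ digits} \le p = 38
$$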

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

andygrove, Apr 30 '24 16:04

I suspect this issue is actually in arrow_cast: the error message appears to come from https://github.com/apache/arrow-rs/blob/ada986c7ec8f8fe4f94235c8aaeba4995392ee72/arrow-cast/src/cast.rs#L2753

Omega359, May 01 '24 16:05

I believe this will be supported in the next Arrow release.

tustvold, May 01 '24 16:05