datafusion-comet
Cast String to Date ANSI Mode - Spark 3.2 - Mismatch between Spark and Comet Errors
Describe the bug
When a String containing an invalid date is cast to DateType with ANSI mode enabled, the error raised differs between Spark versions.
In Spark 3.2 the error message is:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2) (192.168.1.10 executor driver): java.time.DateTimeException: Cannot cast 0 to DateType.
In Spark 3.3 and above the error message is:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2) (192.168.1.10 executor driver): org.apache.spark.SparkDateTimeException: [CAST_INVALID_INPUT] The value '0' of the type "STRING" cannot be cast to "DATE" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error.
Currently the error messages produced by Comet match Spark 3.3 and above.
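If the test were extended to cover Spark 3.2 as well, one option would be to branch the expected error on the Spark version. A minimal sketch, assuming a ScalaTest context, a hypothetical isSpark33Plus flag, and a df that performs the failing cast:
// Sketch only: `df` runs the invalid cast under ANSI mode; `isSpark33Plus` is an assumed version flag.
val expectedFragment =
  if (isSpark33Plus) {
    // Spark 3.3+: SparkDateTimeException with the CAST_INVALID_INPUT error class
    "[CAST_INVALID_INPUT] The value '0' of the type \"STRING\" cannot be cast to \"DATE\""
  } else {
    // Spark 3.2: a plain java.time.DateTimeException
    "Cannot cast 0 to DateType"
  }
val thrown = intercept[org.apache.spark.SparkException] { df.collect() }
// The task-failure message embeds the root cause, so a substring check is sufficient here
assert(thrown.getMessage.contains(expectedFragment))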
Steps to reproduce
In the CometTestSuite cast StringType to DateType test we have added an assumption so that the test only runs on Spark 3.3 and above (sketched below).
Removing that assumption triggers a test failure when the suite is run with JDK 1.8 and Spark 3.2.0.
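The guard is roughly of the following shape (a sketch only; the exact version-check helper in the Comet code base may be named differently):
// Illustrative sketch of the version guard on the cast test.
// `isSpark33Plus` is an assumed helper that checks the runtime Spark version.
test("cast StringType to DateType") {
  assume(isSpark33Plus, "Spark 3.2 reports a different error for invalid date strings")
  // ... cast the string column and compare Spark vs Comet behaviour ...
}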
Additionally, you can reproduce this error locally using spark-shell with JDK 1.8 and Spark 3.2.0:
$SPARK_HOME/bin/spark-shell --conf spark.sql.ansi.enabled=true
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import java.io.File
import java.nio.file.Files
// Round-trip the DataFrame through a temporary Parquet file so the query reads from a Parquet scan
def roundtripParquet(df: DataFrame): DataFrame = {
val tempDir = Files.createTempDirectory("spark").toString
val filename = new File(tempDir, s"castTest_${System.currentTimeMillis()}.parquet").toString
df.write.mode(SaveMode.Overwrite).parquet(filename)
spark.read.parquet(filename)
}
import spark.implicits._
// "0" is not a valid date, so the ANSI cast below must fail
val data = roundtripParquet(Seq("0").toDF("a"))
data.createOrReplaceTempView("t")
val df = spark.sql(s"select a, cast(a as ${DataTypes.DateType.sql}) from t order by a")
df.collect().foreach(println)
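With spark.sql.ansi.enabled=true the final collect() fails: on Spark 3.2 it throws the java.time.DateTimeException shown above, while on Spark 3.3 and above it throws SparkDateTimeException with the [CAST_INVALID_INPUT] error class.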
Expected behavior
The CometTestSuite cast StringType to DateType test should pass on Spark 3.2.0.
Additional context
https://github.com/apache/datafusion-comet/pull/383#issuecomment-2115341055
Is this an issue of just a mismatch between error messages? Or is the cast actually not doing the right thing with Spark 3.2?
It is an issue with a mismatch between error messages. @andygrove suggested we skip fixing that for now, since it is not a high priority, and create a ticket instead: https://github.com/apache/datafusion-comet/pull/383#issuecomment-2115341055
We can close this now that we no longer support Spark 3.2. Thanks @vidyasankarv