datafusion-comet
bug: CAST timestamp to string ignores timezone prior to Spark 3.4
Describe the bug
In CometExpressionSuite we have two tests that are ignored for Spark 3.2 and 3.3:

```scala
test("cast timestamp and timestamp_ntz to string") {
  // TODO: make the test pass for Spark 3.2 & 3.3
  assume(isSpark34Plus)
```

```scala
test("cast timestamp and timestamp_ntz to long, date") {
  // TODO: make the test pass for Spark 3.2 & 3.3
  assume(isSpark34Plus)
```
Enabling these tests for Spark 3.2 shows incorrect output:
```
== Results ==
!== Correct Answer - 2001 ==   == Spark Answer - 2001 ==
 struct<tz_millis:string,ntz_millis:string,tz_micros:string,ntz_micros:string>   struct<tz_millis:string,ntz_millis:string,tz_micros:string,ntz_micros:string>
![1970-01-01 05:29:59.991,1970-01-01 05:29:59.991,1970-01-01 05:29:59.991,1970-01-01 05:29:59.991]   [1970-01-01 05:29:59.991,1969-12-31 23:59:59.991,1970-01-01 05:29:59.991,1969-12-31 23:59:59.991]
```

```
== Results ==
!== Correct Answer - 10000 ==   == Spark Answer - 10000 ==
 struct<tz_millis:bigint,tz_micros:bigint,tz_millis_to_date:date,ntz_millis_to_date:date,tz_micros_to_date:date,ntz_micros_to_date:date>   struct<tz_millis:bigint,tz_micros:bigint,tz_millis_to_date:date,ntz_millis_to_date:date,tz_micros_to_date:date,ntz_micros_to_date:date>
![-1,-1,1970-01-01,1970-01-01,1970-01-01,1970-01-01]   [-1,-1,1970-01-01,1969-12-31,1970-01-01,1969-12-31]
```
We should fall back to Spark rather than produce wrong results.
Steps to reproduce
No response
Expected behavior
No response
Additional context
No response
IIRC there were differences in output between Spark 3.2 and Spark 3.4 for the timestamp_ntz type.
Taking a closer look, the definition of timestamp_ntz (in Spark) essentially means that the value should be left untouched.
So a value of 0 means 1970-01-01 00:00:00 in the session timezone. In the example above, the value is -1, so the correct output for timestamp_ntz (millis) should be 1969-12-31 23:59:59 (ignoring the millis). Spark 3.2's answer of 1970-01-01 05:29:59 seems incorrect to me.
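The distinction can be reproduced with plain `java.time`. This is a sketch, not Spark code: it assumes the suite runs with an `Asia/Kolkata` (+05:30) session timezone and that the raw value behind the `.991` rows above is -9 milliseconds since the epoch (both inferred from the output, not stated in the issue).

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class NtzDemo {
    public static void main(String[] args) {
        long millis = -9L; // assumed raw value behind the ".991" rows above
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

        // TIMESTAMP (with local time zone): render the instant in the session timezone
        String tz = Instant.ofEpochMilli(millis)
                .atZone(ZoneId.of("Asia/Kolkata")) // assumed session timezone
                .format(fmt);

        // TIMESTAMP_NTZ: the stored value is a wall-clock time, no zone adjustment
        String ntz = LocalDateTime.ofEpochSecond(0, 0, ZoneOffset.UTC)
                .plusNanos(millis * 1_000_000L)
                .format(fmt);

        System.out.println(tz);  // 1970-01-01 05:29:59.991
        System.out.println(ntz); // 1969-12-31 23:59:59.991
    }
}
```

Under this interpretation the ntz column should render the raw wall-clock value (1969-12-31 ...), which is what the "Spark Answer" column shows, so it is the expected ("Correct Answer") side that applies the timezone shift it should not.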
I've recently been learning about the project and would like to be assigned to this issue if it hasn't already been resolved, thanks!