
bug: CAST timestamp to string ignores timezone prior to Spark 3.4

Open · andygrove opened this issue on May 24, 2024 · 2 comments

Describe the bug

In CometExpressionSuite we have two tests that are ignored for Spark 3.2 and 3.3.

  test("cast timestamp and timestamp_ntz to string") {
    // TODO: make the test pass for Spark 3.2 & 3.3
    assume(isSpark34Plus)
  test("cast timestamp and timestamp_ntz to long, date") {
    // TODO: make the test pass for Spark 3.2 & 3.3
    assume(isSpark34Plus)

Enabling these tests on Spark 3.2 shows incorrect output:

== Results ==
  !== Correct Answer - 2001 ==                                                                         == Spark Answer - 2001 ==
   struct<tz_millis:string,ntz_millis:string,tz_micros:string,ntz_micros:string>                       struct<tz_millis:string,ntz_millis:string,tz_micros:string,ntz_micros:string>
  ![1970-01-01 05:29:59.991,1970-01-01 05:29:59.991,1970-01-01 05:29:59.991,1970-01-01 05:29:59.991]   [1970-01-01 05:29:59.991,1969-12-31 23:59:59.991,1970-01-01 05:29:59.991,1969-12-31 23:59:59.991]
  == Results ==
  !== Correct Answer - 10000 ==                                                                                                              == Spark Answer - 10000 ==
   struct<tz_millis:bigint,tz_micros:bigint,tz_millis_to_date:date,ntz_millis_to_date:date,tz_micros_to_date:date,ntz_micros_to_date:date>   struct<tz_millis:bigint,tz_micros:bigint,tz_millis_to_date:date,ntz_millis_to_date:date,tz_micros_to_date:date,ntz_micros_to_date:date>
  ![-1,-1,1970-01-01,1970-01-01,1970-01-01,1970-01-01]                                                                                       [-1,-1,1970-01-01,1969-12-31,1970-01-01,1969-12-31]

We should fall back to Spark rather than produce the wrong results.
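As a rough illustration, here is a minimal sketch of what that fallback gate could look like. The names (`CastSupport`, `castSupport`, `Native`, `FallbackToSpark`) and the version check are illustrative only, not Comet's actual API; the idea is simply to refuse native execution of these casts before Spark 3.4:

```scala
import org.apache.spark.sql.types._

object CastSupport {

  sealed trait Support
  case object Native extends Support          // safe to run in Comet's native engine
  case object FallbackToSpark extends Support // known incompatibility, let Spark evaluate

  // Derived from the running Spark version, e.g. "3.2.2" -> (3, 2).
  private val Array(sparkMajor, sparkMinor) =
    org.apache.spark.SPARK_VERSION.split("\\.").take(2).map(_.toInt)

  val isSpark34Plus: Boolean =
    sparkMajor > 3 || (sparkMajor == 3 && sparkMinor >= 4)

  def castSupport(from: DataType, to: DataType): Support = (from, to) match {
    // CAST from timestamp/timestamp_ntz to string, date, or long depends on
    // timezone handling that changed in Spark 3.4, so only run it natively
    // on 3.4+ and let Spark evaluate it on older versions.
    case (TimestampType | TimestampNTZType, StringType | DateType | LongType) if !isSpark34Plus =>
      FallbackToSpark
    case _ =>
      Native
  }
}
```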

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

andygrove commented on May 24, 2024

IIRC there were differences in output between Spark 3.2 and Spark 3.4 for the timestamp_ntz type. Taking a closer look, the definition of timestamp_ntz (in Spark) essentially means that the value should be left untouched. So a value of 0 means 1970-01-01 00:00:00 in the session timezone. In the example above, the value is -1, so the correct output for timestamp_ntz (millis) should be 1969-12-31 23:59:59 (ignoring the millis). Spark 3.2's answer of 1970-01-01 05:29:59 seems incorrect to me.
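For reference, a small java.time sketch of where the two renderings in the diff come from. The Asia/Kolkata session timezone (UTC+05:30) and the -9 ms epoch value are inferred from the ".991" and "05:29:59" values in the output above, not taken from the test source:

```scala
import java.time.{Instant, ZoneId, ZoneOffset}
import java.time.format.DateTimeFormatter

object TimestampRendering extends App {
  val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS")
  val instant = Instant.ofEpochMilli(-9L) // 9 ms before the epoch

  // timestamp (with local time zone): the instant is rendered in the session timezone.
  println(fmt.format(instant.atZone(ZoneId.of("Asia/Kolkata")))) // 1970-01-01 05:29:59.991

  // timestamp_ntz: the stored wall-clock value is rendered with no zone shift
  // (equivalent to formatting in UTC).
  println(fmt.format(instant.atZone(ZoneOffset.UTC)))            // 1969-12-31 23:59:59.991
}
```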

parthchandra commented on May 29, 2024

I've recently been learning about the project. Could I be assigned this issue if it hasn't already been resolved? Thanks.

suibianwanwank commented on Jun 27, 2024