Different semantics of casting from int64 to timestamp between Comet and Spark
What is the problem the feature request solves?
In Spark, casting from LongType to Timestamp is to casting seconds to microseconds. arrow-rs's cast kernel basically simply reinterprets Int64Array as TimestampMicrosecondArray. So the scale is different, i.e., seconds v.s. microseconds.
We need to add this special case into Comet's cast kernel.
Describe the potential solution
No response
Additional context
No response
This cast is currently disabled by default, but it would be good to implement this. I have added this to the 0.2.0 milestone.
I can take this
Thanks @justahuman1!
A good place to start is org.apache.comet.expressions.CometCast#canCastFromLong. This currently returns false for casting to timestamp, so enabling it here is the first step.
Next, you can enable the following test in CometCastSuite:
ignore("cast LongType to TimestampType") {
// java.lang.ArithmeticException: long overflow
castTest(generateLongs(), DataTypes.TimestampType)
}
Also check CometExpressionSuite.test("cast timestamp and timestamp_ntz"). This reads timestamps as longs from a parquet file (which may store the values as either millis or micros).
A better test would be something like this:
test("cast LongType to TimestampType") {
// input value is interpreted as milliseconds but will be cast to microseconds
val min = -9223372036854775L
val max = 9223372036854775L
val values = gen.generateLongs(dataSize)
.filter(x => x >= min && x <= max) ++ Seq(min, max)
castTest(withNulls(values).toDF("a"), DataTypes.TimestampType)
}
However the min value here is actually causing an overflow in Spark and I am not sure why so this needs some more research.
the min value here is actually causing an overflow in Spark
Probably because Spark is converting the value from millis to micros?
the min value here is actually causing an overflow in Spark
Probably because Spark is converting the value from millis to micros?
Right, my understanding was that the minimum valid value for micros would be -9223372036854775808 (Long.MinValue) so dividing that by 1000 to convert to millis would be -9223372036854775. The actual min value supported in the cast appears to be at least one order of magnitude smaller than that though.