[FEA] Support FromUTCTimestamp

Open viadea opened this issue 3 years ago • 2 comments

I wish we could support FromUTCTimestamp on the GPU.

Reproduce:

import org.apache.spark.sql.functions._
import spark.implicits._
import org.apache.spark.sql.types._
 
var df = spark.sparkContext.parallelize(Seq(1)).toDF()
df = df.withColumn("value82", lit("123456.78").cast(DecimalType(8, 2))).
  withColumn("value63", lit("123.456").cast(DecimalType(6, 3))).
  withColumn("value1510", lit("12345.0123456789").cast(DecimalType(15, 10))).
  withColumn("value2510", lit("123456789012345.0123456789").cast(DecimalType(25, 10))).
  withColumn("value2901", lit("1234567890123456789012345678.1").cast(DecimalType(29, 1))).
  withColumn("value3802", lit("123456789012345678901234567890123456.01").cast(DecimalType(38, 2))).
  withColumn("timestring", lit("1997-02-28 10:30:00.012"))

df.write.format("parquet").mode("overwrite").save("/tmp/df.parquet")
df=spark.read.parquet("/tmp/df.parquet")
df.createOrReplaceTempView("df")

spark.sql("SELECT from_utc_timestamp(timestring,'UTC') FROM df").collect

Not-supported-messages:

! <FromUTCTimestamp> from_utc_timestamp(cast(timestring#420 as timestamp), UTC) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.FromUTCTimestamp
      !Expression <Cast> cast(timestring#420 as timestamp) cannot run on GPU because the GPU only supports a subset of formats when casting strings to timestamps. Refer to the CAST documentation for more details. To enable this operation on the GPU, set spark.rapids.sql.castStringToTimestamp.enabled to true.; Parsing the full rage of supported years is not supported. If your years are limited to 4 positive digits set spark.rapids.sql.hasExtendedYearValues to false.
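
The second message concerns only the nested string-to-timestamp cast, and it names the two settings involved. A minimal sketch of applying them, assuming a spark-shell session (`spark` already defined) with the RAPIDS plugin loaded, and assuming both settings can be changed at runtime rather than only at launch:

```scala
// Assumption: `spark` is the active SparkSession provided by spark-shell.
// Both keys are taken verbatim from the not-supported message above.
spark.conf.set("spark.rapids.sql.castStringToTimestamp.enabled", "true")
// Only appropriate if every year in the data has at most 4 positive digits.
spark.conf.set("spark.rapids.sql.hasExtendedYearValues", "false")
```

This would not change the first message: FromUTCTimestamp itself is still reported as unsupported.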

viadea avatar Jul 29 '22 00:07 viadea

@viadea In the example above it looks like there is a conversion from a UTC timestamp to a UTC timestamp. Is that what is required, or do we expect the second parameter to be another timezone like Asia/Seoul? And if the second parameter is expected to be something other than UTC, will it always be a literal or will it be a column?
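
To make the question concrete, here is a small sketch of what the two cases would produce. It uses plain `java.time` rather than Spark, and `fromUtc` is a hypothetical helper that follows the documented behaviour of `from_utc_timestamp` (interpret the input as UTC, render it in the target zone); it is not the plugin's implementation.

```scala
import java.time.{LocalDateTime, ZoneId, ZoneOffset}

// Hypothetical helper: treat `ts` as a UTC wall-clock time and
// render the same instant as a wall-clock time in `zone`.
def fromUtc(ts: LocalDateTime, zone: String): LocalDateTime =
  ts.atZone(ZoneOffset.UTC)
    .withZoneSameInstant(ZoneId.of(zone))
    .toLocalDateTime

// Same value as the `timestring` column in the repro.
val ts = LocalDateTime.parse("1997-02-28T10:30:00.012")

println(fromUtc(ts, "UTC"))        // 1997-02-28T10:30:00.012 (unchanged)
println(fromUtc(ts, "Asia/Seoul")) // 1997-02-28T19:30:00.012 (UTC+9)
```

With 'UTC' as the second parameter the call is effectively a no-op, which is why the intended target zone matters.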

sameerz avatar Aug 02 '22 20:08 sameerz

The real use case looks like this:

from_utc_timestamp(cast(date_format(xxx, Some(Etc/UTC)) as timestamp), UTC)

So the example above is just my minimal reproduction.

I can ask about the second parameter, which is the time zone.

viadea avatar Aug 05 '22 19:08 viadea