
[FEA] Support sha2

Open · viadea opened this issue 2 years ago · 3 comments

I wish we could support the sha2 function.

e.g. in spark-sql:

select sha2(c_customer_id,256) from tpcds.customer limit 3;

Not-supported message:

      ! <Sha2> sha2(cast(c_customer_id#3 as binary), 256) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.Sha2

viadea · Aug 18 '23

We could leverage https://github.com/rapidsai/cudf/pull/9215, but we could also implement the SHA-2 algorithm in spark-rapids-jni.
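Whichever route is taken (the cudf hashing work or a new kernel in spark-rapids-jni), GPU results would need to match Spark's CPU output. Below is a minimal Scala sketch of a CPU reference, assuming the semantics Spark documents for `sha2`:

* Bit lengths 224, 256, 384 and 512 select the matching SHA-2 digest.
* A bit length of 0 is treated as 256.
* The result is a lowercase hex string.

The null result for any other bit length reflects my reading of Spark's `Sha2` expression and is worth double-checking. `Sha2Reference` is a hypothetical name. The sketch uses only `java.security.MessageDigest`, not any spark-rapids or cudf API.

```scala
import java.security.MessageDigest

// Hypothetical CPU reference for Spark SQL's sha2(expr, bitLength).
// Assumption: unsupported bit lengths and null input both yield null.
object Sha2Reference {
  def sha2(input: Array[Byte], bitLength: Int): String = {
    // Map the requested bit length to a standard JCA digest name.
    val algorithm: Option[String] = bitLength match {
      case 224     => Some("SHA-224")
      case 256 | 0 => Some("SHA-256") // 0 is documented as equivalent to 256
      case 384     => Some("SHA-384")
      case 512     => Some("SHA-512")
      case _       => None            // assumed to produce null, like Spark
    }
    if (input == null) null
    else
      algorithm.map { alg =>
        // Hash the bytes and render each byte as two lowercase hex digits.
        MessageDigest.getInstance(alg).digest(input)
          .map(b => f"${b & 0xff}%02x")
          .mkString
      }.orNull
  }
}
```

Spark casts a string column to binary before hashing (see `cast(c_customer_id#3 as binary)` in the message above). A comparable call here is `Sha2Reference.sha2("abc".getBytes("UTF-8"), 256)`, which yields the well-known SHA-256 digest of "abc".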

sameerz · Aug 22 '23

This is already supported:

scala> spark.sql("""select _1, get_json_object(_1,"$.package_name"), sha2(_2,256) from table""").show(44, truncate=false)
25/05/29 17:30:53 WARN GpuOverrides:
! <LocalTableScanExec> cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.execution.LocalTableScanExec
  @Expression <AttributeReference> _1#364 could run on GPU
  @Expression <AttributeReference> get_json_object(_1, $.package_name)#365 could run on GPU
  @Expression <AttributeReference> sha2(_2, 256)#366 could run on GPU

nvliyuan · May 29 '25

Hi @nvliyuan, this is interesting. Here is what I get when testing with 25.12.0-SNAPSHOT. We don't seem to support sha2 yet:

scala> spark.sql("select sha2(n_name, 256) from nation").show
25/10/29 17:54:52 WARN GpuOverrides:
!Exec <CollectLimitExec> cannot run on GPU because the Exec CollectLimitExec has been disabled, and is disabled by default because Collect Limit replacement can be slower on the GPU, if huge number of rows in a batch it could help by limiting the number of rows transferred from GPU to CPU. Set spark.rapids.sql.exec.CollectLimitExec to true if you wish to enable it
  @Partitioning <SinglePartition$> could run on GPU
  !Exec <ProjectExec> cannot run on GPU because not all expressions can be replaced
    @Expression <Alias> toprettystring(sha2(cast(n_name#35 as binary), 256), Some(UTC)) AS toprettystring(sha2(n_name, 256))#160 could run on GPU
      @Expression <ToPrettyString> toprettystring(sha2(cast(n_name#35 as binary), 256), Some(UTC)) could run on GPU
        ! <Sha2> sha2(cast(n_name#35 as binary), 256) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.Sha2
          @Expression <Cast> cast(n_name#35 as binary) could run on GPU
            @Expression <AttributeReference> n_name#35 could run on GPU
          @Expression <Literal> 256 could run on GPU
    *Exec <FileSourceScanExec> will run on GPU

jihoonson · Oct 29 '25