spark-rapids icon indicating copy to clipboard operation
spark-rapids copied to clipboard

Add shims for AQEShuffleReader [databricks]

Open andygrove opened this issue 1 year ago • 10 comments

Part of https://github.com/NVIDIA/spark-rapids/issues/9753

This PR updates GpuCustomShuffleReaderExec to match the logic in Spark, which has changed since we first implemented this.

There are three different sets of behaviors:

  • Spark 3.1: No special handling for coalesced shuffle reads
  • Spark 3.2+: Special handling for coalesced shuffle reads
  • Spark 3.4.2+/3.5.1: Introduces CoalescedBoundary when reading HashPartitioned shuffles

andygrove avatar Feb 29 '24 15:02 andygrove

build

andygrove avatar Feb 29 '24 19:02 andygrove

build

andygrove avatar Feb 29 '24 19:02 andygrove

build

andygrove avatar Feb 29 '24 20:02 andygrove

build

razajafri avatar Mar 08 '24 17:03 razajafri

The premerge failed due to


[2024-02-29T21:09:42.364Z] [INFO] Compiling 450 Scala sources and 58 Java sources to /home/ubuntu/spark-rapids/sql-plugin/target/spark330db/classes...

[2024-02-29T21:09:48.866Z] [ERROR] /home/ubuntu/spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuGetJsonObject.scala:19: object parsing is not a member of package util

[2024-02-29T21:09:48.867Z] [ERROR] import scala.util.parsing.combinator.RegexParsers

This should be fixed now as the Databricks nightly builds are passing. Kicking off the build again

razajafri avatar Mar 08 '24 17:03 razajafri

build

andygrove avatar Mar 11 '24 14:03 andygrove

Do we have tests for this? All of our existing tests pass without any of these changes, so I am curious if there is a good way for us to verify that these changes are working.

I don't think that we can reproduce the issue from https://issues.apache.org/jira/browse/SPARK-45592 until we implement https://github.com/NVIDIA/spark-rapids/issues/9876, but I could be wrong

andygrove avatar Mar 15 '24 16:03 andygrove

But #9876 is talking about replacing TableCacheQueryStageExec which we should never replace as I understand we shouldn't replace query stage execs?

razajafri avatar Mar 25 '24 15:03 razajafri

But #9876 is talking about replacing TableCacheQueryStageExec which we should never replace as I understand we shouldn't replace query stage execs?

I think #9876 is really about making the plan inside TableCacheQueryStageExec run on the GPU. We can't do this today because Spark is looking for a specific InMemoryTableScan class so does not recognize our GPU version.

andygrove avatar Mar 27 '24 19:03 andygrove

build

razajafri avatar Apr 09 '24 17:04 razajafri