Add shims for AQEShuffleReader [databricks]
Part of https://github.com/NVIDIA/spark-rapids/issues/9753
This PR updates `GpuCustomShuffleReaderExec` to match the current logic in Spark, which has changed since our original implementation.
There are three different sets of behaviors:
- Spark 3.1: No special handling for coalesced shuffle reads
- Spark 3.2+: Special handling for coalesced shuffle reads
- Spark 3.4.2+/3.5.1: Introduces `CoalescedBoundary` when reading HashPartitioned shuffles
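A minimal sketch of how a shim might branch on these three behavior sets. The types below (`ShufflePartitionSpec`, `CoalescedSpec`, `CoalescedBoundary`, `ReaderShim`) are simplified stand-ins, not the real Spark or plugin classes:

```scala
// Hypothetical, simplified model of the shuffle-read specs -- NOT the real
// Spark API, just an illustration of the version-dependent shim logic.
sealed trait ShufflePartitionSpec
case class PartialReducerSpec(reducerIndex: Int) extends ShufflePartitionSpec
// Spark 3.2+ coalesces a contiguous range of reducers into one read
case class CoalescedSpec(startReducer: Int, endReducer: Int) extends ShufflePartitionSpec

// Spark 3.4.2+/3.5.1 records the reducer boundaries of a coalesced
// HashPartitioned read so the output partitioning can be reported correctly
case class CoalescedBoundary(startReducer: Int, endReducer: Int)

object ReaderShim {
  // Each shim decides whether a set of specs counts as a coalesced read
  def isCoalescedRead(specs: Seq[ShufflePartitionSpec], sparkVersion: String): Boolean =
    sparkVersion match {
      // Spark 3.1: no special handling for coalesced shuffle reads
      case v if v < "3.2" => false
      // Spark 3.2+: treat the read as coalesced only when every spec is
      case _ => specs.forall {
        case CoalescedSpec(_, _) => true
        case _                   => false
      }
    }
}
```

In the plugin this kind of branching lives in per-version shim source trees rather than a runtime version check, but the behavioral split is the same.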
build
build
build
build
The premerge build failed due to:

```
[2024-02-29T21:09:42.364Z] [INFO] Compiling 450 Scala sources and 58 Java sources to /home/ubuntu/spark-rapids/sql-plugin/target/spark330db/classes...
[2024-02-29T21:09:48.866Z] [ERROR] /home/ubuntu/spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuGetJsonObject.scala:19: object parsing is not a member of package util
[2024-02-29T21:09:48.867Z] [ERROR] import scala.util.parsing.combinator.RegexParsers
```
This should be fixed now, as the Databricks nightly builds are passing. Kicking off the build again.
build
Do we have tests for this? All of our existing tests pass without any of these changes, so I am curious whether there is a good way for us to verify that these changes are actually working.
I don't think that we can reproduce the issue from https://issues.apache.org/jira/browse/SPARK-45592 until we implement https://github.com/NVIDIA/spark-rapids/issues/9876, but I could be wrong
But #9876 is talking about replacing `TableCacheQueryStageExec`, which we should never replace, since as I understand it we shouldn't replace query stage execs?
> But #9876 is talking about replacing `TableCacheQueryStageExec`, which we should never replace, since as I understand it we shouldn't replace query stage execs?
I think #9876 is really about making the plan inside `TableCacheQueryStageExec` run on the GPU. We can't do this today because Spark looks for a specific `InMemoryTableScan` class, and so it does not recognize our GPU version.
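To illustrate the point about class-based recognition: when the planner matches on a concrete class, a GPU replacement node falls through the match even though it plays the same role. The types below are hypothetical stand-ins, not Spark's actual planner code:

```scala
// Simplified illustration of why replacing the scan breaks recognition.
// These classes are stand-ins, NOT the real Spark or spark-rapids types.
sealed trait SparkPlan
case class InMemoryTableScanStandIn() extends SparkPlan // models Spark's scan node
case class GpuInMemoryTableScan() extends SparkPlan     // hypothetical GPU version

object Recognizer {
  // Models Spark matching on its own concrete class: the GPU
  // replacement does not match and falls through to the default case.
  def sparkRecognizesScan(plan: SparkPlan): Boolean = plan match {
    case _: InMemoryTableScanStandIn => true
    case _                           => false
  }
}
```

Under this model, `Recognizer.sparkRecognizesScan(GpuInMemoryTableScan())` is `false`, which is why the plugin cannot substitute its own node until Spark's check is loosened (or the plugin keeps the original class).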
build