Add in basic GPU/CPU bridge operation [databricks]
This is step 3 in splitting https://github.com/NVIDIA/spark-rapids/pull/13368 into smaller pieces
Description
This adds in basic GPU/CPU bridge functionality, but it is off by default because the performance would not be good without the thread pool and optimizer.
Checklists
- [ ] This PR has added documentation for new or modified features or behaviors.
- [X] This PR has added new tests or modified existing tests to cover new code paths. (Please explain in the PR description how the new code paths are tested, such as names of the new/existing tests that cover them.)
- [ ] Performance testing has been performed and its results are added in the PR description. Or, an issue has been filed with a link in the PR description.
Performance is expected to be poor, so the feature is off by default and was not performance tested. I did add some basic tests to verify that the code works.
Greptile Overview
Greptile Summary
This PR implements basic GPU/CPU bridge functionality that enables CPU expression evaluation within GPU execution plans. The feature is disabled by default (`spark.rapids.sql.expression.cpuBridge.enabled=false`) as noted in the PR description, since full performance optimization will come in future PRs.
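For reference, opting in from a Spark session would look roughly like the sketch below; the config key is quoted from the summary above, while the session setup itself is illustrative only.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sketch: the config key comes from this PR's summary;
// the session setup and naming are not taken from the PR itself.
val spark = SparkSession.builder()
  .appName("cpu-bridge-demo")
  .getOrCreate()

// Off by default; opt in explicitly to let CPU expressions run inside GPU plans.
spark.conf.set("spark.rapids.sql.expression.cpuBridge.enabled", "true")
```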
Key Changes:
- New `GpuCpuBridgeExpression` that transfers data GPU→Host→CPU→Host→GPU for CPU expression evaluation
- Code generation support via `BridgeGenerateUnsafeProjection` with interpreted fallback (~940 lines)
- Bridge optimizer in `RapidsMeta` that automatically wraps incompatible expressions (~300 lines of changes)
- Comprehensive test coverage with `GpuCpuBridgeSuite` and `BridgeUnsafeProjectionSuite` (1500+ test lines)
- New metrics for tracking bridge processing and wait times
- Proper resource management with ThreadLocal projections and task completion cleanup
Architecture: The bridge sits between GPU and CPU execution by: evaluating GPU input expressions → copying to host → running CPU expression → copying result back to GPU. The implementation includes deduplication of GPU inputs using semantic equality to minimize data transfers.
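To make the deduplication point concrete, here is a minimal, self-contained sketch of deduplicating inputs by semantic equality; the `Expr` type, its `semanticEquals`, and `dedupInputs` are illustrative stand-ins, not classes or methods from this PR.

```scala
// Minimal stand-in for an expression with a canonicalized form; the real
// plugin uses Catalyst/GPU expression classes and their semantic equality.
final case class Expr(name: String, canonical: String) {
  def semanticEquals(other: Expr): Boolean = canonical == other.canonical
}

// Deduplicate input expressions so each semantically distinct input is
// evaluated on the GPU and copied to the host only once. Returns the
// distinct inputs plus, for every original input, an index into that list.
def dedupInputs(inputs: Seq[Expr]): (Seq[Expr], Seq[Int]) = {
  val distinct = scala.collection.mutable.ArrayBuffer.empty[Expr]
  val indices = inputs.map { e =>
    val existing = distinct.indexWhere(_.semanticEquals(e))
    if (existing >= 0) existing
    else {
      distinct += e
      distinct.length - 1
    }
  }
  (distinct.toSeq, indices)
}

// Example: "a + 1" referenced twice maps to a single transferred column.
val (cols, idx) = dedupInputs(Seq(
  Expr("a + 1", "add(a,1)"),
  Expr("b", "b"),
  Expr("a + 1", "add(a,1)")))
assert(cols.size == 2 && idx == Seq(0, 1, 0))
```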
Safety Considerations:
- Feature is off by default with explicit configuration required
- Proper resource cleanup via task completion hooks
- ThreadLocal usage prevents conflicts across threads (see the sketch after this list)
- Comprehensive null handling and type support
- Excludes non-deterministic and unevaluable expressions from bridge
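The ThreadLocal and task-completion points above follow a common Spark pattern; the sketch below is a rough illustration under assumed names (`CachedProjection` and `ProjectionCache` are hypothetical), not the plugin's implementation.

```scala
import org.apache.spark.TaskContext

// Stand-in for a compiled/interpreted projection that holds resources; the
// plugin's real projection types are not reproduced here.
final class CachedProjection {
  def close(): Unit = ()
}

object ProjectionCache {
  private val local = new ThreadLocal[CachedProjection]()

  // Give each task thread its own projection instance and register a
  // task-completion listener so it is released when the task finishes.
  def getOrCreate(): CachedProjection = {
    var proj = local.get()
    if (proj == null) {
      proj = new CachedProjection
      local.set(proj)
      Option(TaskContext.get()).foreach { tc =>
        tc.addTaskCompletionListener[Unit] { _ =>
          Option(local.get()).foreach(_.close())
          local.remove()
        }
      }
    }
    proj
  }
}
```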
Confidence Score: 4/5
- Safe to merge - feature is disabled by default and has comprehensive test coverage
- Strong implementation with proper resource management and extensive testing. Score is 4/5 rather than 5/5 due to: (1) large amount of new code (~3000 lines) requiring careful runtime validation, (2) complex ThreadLocal usage that needs production verification, (3) bridge optimizer modifying expression trees which could have edge cases, and (4) acknowledged performance concerns that will be addressed in future PRs
- Pay close attention to `RapidsMeta.scala` (complex optimizer logic with AST interaction) and `BridgeGenerateUnsafeProjection.scala` (large codegen module)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuCpuBridgeExpression.scala | 5/5 | New GPU/CPU bridge expression implementation - enables CPU expression evaluation within GPU plans with proper resource management and metrics |
| sql-plugin/src/main/scala/org/apache/spark/sql/rapids/BridgeGenerateUnsafeProjection.scala | 4/5 | Large code generation module (~940 lines) for optimized bridge projections with codegen and interpreted fallback support |
| sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsMeta.scala | 4/5 | Extensive changes (~300 lines) adding bridge optimization logic, AST interaction handling, and expression tree conversion support |
| tests/src/test/scala/org/apache/spark/sql/rapids/BridgeUnsafeProjectionSuite.scala | 5/5 | Comprehensive tests (~1373 lines) for projection correctness across all data types |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Plan as SparkPlan
    participant Bridge as GpuCpuBridgeExpression
    participant GPU as GPU Memory
    participant Host as Host Memory
    participant CPU as CPU Expression
    participant Builder as RapidsHostColumnBuilder
    Plan->>Bridge: columnarEval(batch)
    Note over Bridge: Start wait time tracking
    Bridge->>Bridge: Evaluate GPU input expressions
    Bridge->>GPU: Get GPU column data
    GPU-->>Bridge: GPU columns
    Bridge->>Bridge: Create ColumnarBatch with GPU columns
    Note over Bridge: Start processing time tracking
    Bridge->>Host: ColumnarToRowIterator (GPU→Host)
    Note over Host: Data copied to host memory
    Host->>CPU: Iterate rows through projection
    loop For each row
        CPU->>CPU: Evaluate CPU expression
        CPU->>Builder: Append result to builder
    end
    Builder->>Host: Build host column
    Host->>GPU: buildAndPutOnDevice() (Host→GPU)
    Note over GPU: Result copied back to GPU
    GPU-->>Bridge: GPU result column
    Note over Bridge: End processing time
    Bridge-->>Plan: GpuColumnVector
    Note over Bridge: End wait time
```
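As a rough, self-contained model of the per-row loop in the diagram, the sketch below uses stand-in types (`Row`, `HostBuilder`, `bridgeRows`), not the plugin's `ColumnarToRowIterator` or `RapidsHostColumnBuilder`.

```scala
// Stand-ins that model the shapes in the diagram; none of these are the
// plugin's actual classes.
final case class Row(values: Array[Any])

final class HostBuilder {
  private val out = scala.collection.mutable.ArrayBuffer.empty[Any]
  def append(v: Any): Unit = out += v
  def build(): IndexedSeq[Any] = out.toIndexedSeq // real code builds a host column
}

// The per-row bridge loop: run the CPU projection over host rows and
// collect results that would later be copied back to the device.
def bridgeRows(rows: Iterator[Row], cpuEval: Row => Any): IndexedSeq[Any] = {
  val builder = new HostBuilder
  rows.foreach(r => builder.append(cpuEval(r)))
  builder.build()
}
```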
build
CI failed because of https://github.com/NVIDIA/spark-rapids/issues/14009, but our CI currently has no way to turn off the Databricks tests when you touch something even remotely related to Databricks.
build
I upmerged to make sure that I had the latest fixes for what caused CI to fail
build
build