Add in basic GPU/CPU bridge operation [databricks]
This is step 3 in splitting https://github.com/NVIDIA/spark-rapids/pull/13368 into smaller pieces
Description
This adds in basic GPU/CPU bridge functionality, but it is off by default because the performance would not be good without the thread pool and optimizer.
Checklists
- [ ] This PR has added documentation for new or modified features or behaviors.
- [X] This PR has added new tests or modified existing tests to cover new code paths. (Please explain in the PR description how the new code paths are tested, such as names of the new/existing tests that cover them.)
- [ ] Performance testing has been performed and its results are added in the PR description. Or, an issue has been filed with a link in the PR description.
Performance is expected to be poor, so the feature is off by default and was not performance tested. I did add some basic tests to verify that the code works.
Greptile Overview
Greptile Summary
This PR implements basic GPU/CPU bridge functionality that enables CPU expression evaluation within GPU execution plans. The feature is disabled by default (`spark.rapids.sql.expression.cpuBridge.enabled=false`) as noted in the PR description, since full performance optimization will come in future PRs.
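For reference, opting in from a Spark session would look roughly like the sketch below; the config key is quoted from the summary above, while the session setup itself is illustrative only.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sketch: the config key comes from this PR's summary;
// the session setup and naming are not taken from the PR itself.
val spark = SparkSession.builder()
  .appName("cpu-bridge-demo")
  .getOrCreate()

// Off by default; opt in explicitly to let CPU expressions run inside GPU plans.
spark.conf.set("spark.rapids.sql.expression.cpuBridge.enabled", "true")
```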
Key Changes:
- New `GpuCpuBridgeExpression` that transfers data GPU→Host→CPU→Host→GPU for CPU expression evaluation
- Code generation support via `BridgeGenerateUnsafeProjection` with interpreted fallback (~940 lines)
- Bridge optimizer in `RapidsMeta` that automatically wraps incompatible expressions (~300 lines of changes)
- Comprehensive test coverage with `GpuCpuBridgeSuite` and `BridgeUnsafeProjectionSuite` (1500+ test lines)
- New metrics for tracking bridge processing and wait times
- Proper resource management with ThreadLocal projections and task completion cleanup
Architecture: The bridge sits between GPU and CPU execution by: evaluating GPU input expressions → copying to host → running CPU expression → copying result back to GPU. The implementation includes deduplication of GPU inputs using semantic equality to minimize data transfers.
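To make the deduplication point concrete, here is a minimal, self-contained sketch of deduplicating inputs by semantic equality; the `Expr` type, its `semanticEquals`, and `dedupInputs` are illustrative stand-ins, not classes or methods from this PR.

```scala
// Minimal stand-in for an expression with a canonicalized form; the real
// plugin uses Catalyst/GPU expression classes and their semantic equality.
final case class Expr(name: String, canonical: String) {
  def semanticEquals(other: Expr): Boolean = canonical == other.canonical
}

// Deduplicate input expressions so each semantically distinct input is
// evaluated on the GPU and copied to the host only once. Returns the
// distinct inputs plus, for every original input, an index into that list.
def dedupInputs(inputs: Seq[Expr]): (Seq[Expr], Seq[Int]) = {
  val distinct = scala.collection.mutable.ArrayBuffer.empty[Expr]
  val indices = inputs.map { e =>
    val existing = distinct.indexWhere(_.semanticEquals(e))
    if (existing >= 0) existing
    else {
      distinct += e
      distinct.length - 1
    }
  }
  (distinct.toSeq, indices)
}

// Example: "a + 1" referenced twice maps to a single transferred column.
val (cols, idx) = dedupInputs(Seq(
  Expr("a + 1", "add(a,1)"),
  Expr("b", "b"),
  Expr("a + 1", "add(a,1)")))
assert(cols.size == 2 && idx == Seq(0, 1, 0))
```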
Safety Considerations:
- Feature is off by default with explicit configuration required
- Proper resource cleanup via task completion hooks
- ThreadLocal usage prevents conflicts across threads (see the sketch after this list)
- Comprehensive null handling and type support
- Excludes non-deterministic and unevaluable expressions from bridge
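The ThreadLocal and task-completion points above follow a common Spark pattern; the sketch below is a rough illustration under assumed names (`CachedProjection` and `ProjectionCache` are hypothetical), not the plugin's implementation.

```scala
import org.apache.spark.TaskContext

// Stand-in for a compiled/interpreted projection that holds resources; the
// plugin's real projection types are not reproduced here.
final class CachedProjection {
  def close(): Unit = ()
}

object ProjectionCache {
  private val local = new ThreadLocal[CachedProjection]()

  // Give each task thread its own projection instance and register a
  // task-completion listener so it is released when the task finishes.
  def getOrCreate(): CachedProjection = {
    var proj = local.get()
    if (proj == null) {
      proj = new CachedProjection
      local.set(proj)
      Option(TaskContext.get()).foreach { tc =>
        tc.addTaskCompletionListener[Unit] { _ =>
          Option(local.get()).foreach(_.close())
          local.remove()
        }
      }
    }
    proj
  }
}
```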
Confidence Score: 4/5
- Safe to merge - feature is disabled by default and has comprehensive test coverage
- Strong implementation with proper resource management and extensive testing. Score is 4/5 rather than 5/5 due to: (1) large amount of new code (~3000 lines) requiring careful runtime validation, (2) complex ThreadLocal usage that needs production verification, (3) bridge optimizer modifying expression trees which could have edge cases, and (4) acknowledged performance concerns that will be addressed in future PRs
- Pay close attention to `RapidsMeta.scala` (complex optimizer logic with AST interaction) and `BridgeGenerateUnsafeProjection.scala` (large codegen module)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuCpuBridgeExpression.scala | 5/5 | New GPU/CPU bridge expression implementation - enables CPU expression evaluation within GPU plans with proper resource management and metrics |
| sql-plugin/src/main/scala/org/apache/spark/sql/rapids/BridgeGenerateUnsafeProjection.scala | 4/5 | Large code generation module (~940 lines) for optimized bridge projections with codegen and interpreted fallback support |
| sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsMeta.scala | 4/5 | Extensive changes (~300 lines) adding bridge optimization logic, AST interaction handling, and expression tree conversion support |
| tests/src/test/scala/org/apache/spark/sql/rapids/BridgeUnsafeProjectionSuite.scala | 5/5 | Comprehensive tests (~1373 lines) for projection correctness across all data types |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Plan as SparkPlan
    participant Bridge as GpuCpuBridgeExpression
    participant GPU as GPU Memory
    participant Host as Host Memory
    participant CPU as CPU Expression
    participant Builder as RapidsHostColumnBuilder
    Plan->>Bridge: columnarEval(batch)
    Note over Bridge: Start wait time tracking
    Bridge->>Bridge: Evaluate GPU input expressions
    Bridge->>GPU: Get GPU column data
    GPU-->>Bridge: GPU columns
    Bridge->>Bridge: Create ColumnarBatch with GPU columns
    Note over Bridge: Start processing time tracking
    Bridge->>Host: ColumnarToRowIterator (GPU→Host)
    Note over Host: Data copied to host memory
    Host->>CPU: Iterate rows through projection
    loop For each row
        CPU->>CPU: Evaluate CPU expression
        CPU->>Builder: Append result to builder
    end
    Builder->>Host: Build host column
    Host->>GPU: buildAndPutOnDevice() (Host→GPU)
    Note over GPU: Result copied back to GPU
    GPU-->>Bridge: GPU result column
    Note over Bridge: End processing time
    Bridge-->>Plan: GpuColumnVector
    Note over Bridge: End wait time
```
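As a rough, self-contained model of the per-row loop in the diagram, the sketch below uses stand-in types (`Row`, `HostBuilder`, `bridgeRows`), not the plugin's `ColumnarToRowIterator` or `RapidsHostColumnBuilder`.

```scala
// Stand-ins that model the shapes in the diagram; none of these are the
// plugin's actual classes.
final case class Row(values: Array[Any])

final class HostBuilder {
  private val out = scala.collection.mutable.ArrayBuffer.empty[Any]
  def append(v: Any): Unit = out += v
  def build(): IndexedSeq[Any] = out.toIndexedSeq // real code builds a host column
}

// The per-row bridge loop: run the CPU projection over host rows and
// collect results that would later be copied back to the device.
def bridgeRows(rows: Iterator[Row], cpuEval: Row => Any): IndexedSeq[Any] = {
  val builder = new HostBuilder
  rows.foreach(r => builder.append(cpuEval(r)))
  builder.build()
}
```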
build
CI failed because of https://github.com/NVIDIA/spark-rapids/issues/14009, but our CI currently has no way to turn off the Databricks tests when you touch something even remotely related to Databricks.
build
I upmerged to make sure that I had the latest fixes for what caused CI to fail
build
build