spark-rapids
[FEA] Implement `outputPartitioning` for GPU join execs
Is your feature request related to a problem? Please describe.
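Spark's `BroadcastHashJoinExec`, `ShuffledHashJoinExec`, and `BroadcastNestedLoopJoinExec` classes implement `outputPartitioning`, but our GPU implementations do not. As a result, a GPU join reports an unknown output partitioning, so the planner can miss optimizations, for example by inserting a shuffle that the CPU plan would not need.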
Describe the solution you'd like
- Add failing tests that compare GPU and CPU join plans to ensure they have the same output partitioning
- Implement `outputPartitioning` for the GPU join execs
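As a rough illustration of the rule a GPU join exec would need to mirror, here is a minimal, dependency-free sketch. It assumes the GPU shuffled hash join should follow the same join-type-based pattern as the CPU join (inner joins keep both sides' partitioning, outer joins keep the preserved side's, full outer keeps only the partition count). Every type and function name below is a hypothetical stand-in, not a Spark or spark-rapids class.

```scala
// Hypothetical model only: these are NOT the Spark or spark-rapids classes,
// just small stand-ins so the rule can be read and run in isolation.
sealed trait Partitioning { def numPartitions: Int }
final case class UnknownPartitioning(numPartitions: Int) extends Partitioning
final case class HashPartitioning(keys: Seq[String], numPartitions: Int) extends Partitioning
final case class PartitioningCollection(parts: Seq[Partitioning]) extends Partitioning {
  // All members are assumed to share the same partition count.
  def numPartitions: Int = parts.head.numPartitions
}

sealed trait JoinType
case object Inner extends JoinType
case object LeftOuter extends JoinType
case object RightOuter extends JoinType
case object FullOuter extends JoinType

object JoinPartitioningSketch {
  // Assumed rule for a shuffled hash join: derive the join's output
  // partitioning from its children based on the join type.
  def shuffledJoinOutputPartitioning(
      joinType: JoinType,
      left: Partitioning,
      right: Partitioning): Partitioning = joinType match {
    // Matching rows satisfy both sides' partitioning.
    case Inner => PartitioningCollection(Seq(left, right))
    // The preserved side's partitioning still holds; the other side may be null-padded.
    case LeftOuter => left
    case RightOuter => right
    // Either side may be null-padded, so only the partition count is known.
    case FullOuter => UnknownPartitioning(left.numPartitions)
  }
}
```

A test along the lines of the first bullet would then assert that the GPU plan's join node reports the same partitioning as the CPU plan's join node for each join type.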
Describe alternatives you've considered
Additional context