spark-rapids
GpuCoalesce may need an associated batch coalesce
A user may place a coalesce in a query, which gets translated into a GpuCoalesceExec. Like the original CPU CoalesceExec, GpuCoalesceExec collects batches without causing an additional shuffle. However, many shuffle partitions may be coalesced into a few partitions, and there is no batch coalescing associated with this exec. Downstream execs will then see the many individual shuffle partitions as separate small batches rather than as fewer, larger, more efficient batches, which could lead to poor performance.
See this comment for a toy example that demonstrates the potential problem.
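Separately from the linked example, the effect described above can be sketched in plain Python. This is only a toy model, not the plugin's code: `coalesce_partitions`, `coalesce_batches`, and the row-count target are hypothetical names and assumptions made for illustration, and a batch is modeled as a simple list of rows.

```python
# Toy model only: these helpers are hypothetical and do not exist in spark-rapids.

def coalesce_partitions(shuffle_partitions, num_output):
    """Model a coalesce without a shuffle: each output partition simply
    chains the batches of its input partitions, with no merging."""
    out = [[] for _ in range(num_output)]
    total = len(shuffle_partitions)
    for i, part in enumerate(shuffle_partitions):
        out[i * num_output // total].extend(part)
    return out


def coalesce_batches(batches, target_rows):
    """Model a batch coalesce: concatenate consecutive batches until the
    (assumed) row-count target is reached."""
    merged, current = [], []
    for batch in batches:
        current.extend(batch)
        if len(current) >= target_rows:
            merged.append(current)
            current = []
    if current:
        merged.append(current)
    return merged


# 200 shuffle partitions, each holding a single 50-row batch.
shuffle_partitions = [[list(range(50))] for _ in range(200)]

# Coalescing to 2 partitions leaves every small batch intact.
coalesced = coalesce_partitions(shuffle_partitions, 2)
batches_before = [len(p) for p in coalesced]

# Adding a batch coalesce step merges them toward the target size.
batches_after = [len(coalesce_batches(p, 1000)) for p in coalesced]

print(batches_before, batches_after)
```

In this model each coalesced partition still hands 100 small batches to the next operator, while the added batch coalesce step reduces that to 5 batches of 1000 rows each.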