
Support columnar processing for mapInArrow [databricks]

Open • firestarman opened this pull request 3 years ago • 2 comments

closes https://github.com/NVIDIA/spark-rapids/issues/6313

This PR adds columnar processing support for mapInArrow, a new DataFrame API introduced in Spark 3.3.0.

Performance

  • About 6.8 GB of Parquet data in local files.
  • CPU: 12 cores; GPU: one Titan V with 12 GB of memory.
| CPU Read + CPU mapInArrow | GPU Read + CPU mapInArrow | GPU Read + GPU mapInArrow |
|---|---|---|
| 97.20 | 91.36 | 81.67 |
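The relative gains can be worked out from the three figures in the Performance table. The PR does not state the unit, so this sketch assumes they are elapsed times where lower is better; the RESULTS mapping and the speedups helper are hypothetical names introduced here.

```python
# Figures copied from the Performance table. ASSUMPTION: they are elapsed
# times (lower is better); the unit is not stated in the PR.
RESULTS = {
    "CPU Read + CPU mapInArrow": 97.20,
    "GPU Read + CPU mapInArrow": 91.36,
    "GPU Read + GPU mapInArrow": 81.67,
}


def speedups(results: dict[str, float], baseline: str) -> dict[str, float]:
    """Return baseline_time / time for every configuration, rounded to 2 places."""
    base = results[baseline]
    return {name: round(base / t, 2) for name, t in results.items()}


if __name__ == "__main__":
    for name, s in speedups(RESULTS, "CPU Read + CPU mapInArrow").items():
        print(f"{name}: {s}x")
```

Under that assumption, the all-GPU run is about 1.19x faster than the all-CPU run. With the GPU read held fixed, moving mapInArrow itself to the GPU accounts for about 1.12x. These are single figures with no variance reported, so they are only indicative.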

Signed-off-by: Liangcai Li

firestarman, Oct 17 '22 05:10

build

firestarman, Oct 17 '22 08:10

build

firestarman, Oct 18 '22 02:10

build

firestarman, Oct 19 '22 04:10

build

firestarman, Oct 19 '22 05:10

build

pxLi, Oct 19 '22 07:10

build

firestarman, Oct 20 '22 08:10

The latest CI failure is related to https://github.com/NVIDIA/spark-rapids/issues/6869

firestarman, Oct 20 '22 10:10

build

firestarman, Oct 20 '22 13:10