ipex-llm
Convert Spark DataFrame to pandas DataFrame with Arrow
Description
The previous implementation converted an RDD of Spark rows directly to a pandas DataFrame. This PR converts the Spark rows to an Arrow table first, then converts the Arrow table to a pandas DataFrame. Initial performance test results are below. Configuration: 1.1 GB CSV file, 55 GB memory, 10 cores.
without arrow: 29s
with arrow: 40s
Test code: [root@clx001]/home/ding/with_arrow.py, without_arrow.py
You may refer to the Pandas UDF implementation in Spark, which uses Arrow to convert between Spark DataFrames and pandas DataFrames.