Performance benchmark on RayDP vs. Spark

Open chenya-zhang opened this issue 1 year ago • 3 comments

Hi there,

In the talk "RayDP: Build Large-scale End-to-end Data Analytics and AI Pipelines Using Spark and Ray" (https://youtu.be/ELSrR1Geqg4?t=819), @carsonwang mentioned that RayDP would have better performance.

We are curious which types of queries / workflows you ran and what your analysis of the performance differences is.

Thanks a lot!

chenya-zhang • May 05 '23 21:05

Hi @chenya-zhang, there is a plan to integrate RayDP with Gluten, which offloads SQL operations to a native engine such as Velox. For TPC-H-like and TPC-DS-like benchmarks, we observed more than a 2x speedup. You can find more details in the Gluten project: https://github.com/oap-project/gluten.

We are also running RayDP + XGBoost on Ray workflows and have observed a performance advantage over running XGBoost on Spark. We will share more once the data is ready to publish.

carsonwang • May 06 '23 06:05

Hi @carsonwang, can you please share the performance benchmark numbers for Ray + XGBoost vs. XGBoost on Spark?

rishabh-dream11 • Mar 01 '24 20:03

@carsonwang Did the plan to integrate RayDP with Gluten materialize?

rishabh-dream11 • Mar 01 '24 20:03