raydp
Performance benchmark: RayDP vs. Spark
Hi there,
In the talk "RayDP: Build Large-scale End-to-end Data Analytics and AI Pipelines Using Spark and Ray" (https://youtu.be/ELSrR1Geqg4?t=819), @carsonwang mentioned that RayDP would have better performance.
We are curious which types of queries and workflows you ran, and how you analyzed the performance differences.
Thanks a lot!
Hi @chenya-zhang, there is a plan to integrate RayDP with Gluten, which offloads SQL operations to a native engine such as Velox. On TPC-H-like and TPC-DS-like benchmarks, we observed a speedup of more than 2x. You can find more details in the Gluten project: https://github.com/oap-project/gluten.
We are also running RayDP + XGBoost on Ray workflows and have observed a performance advantage over running XGBoost on Spark. We will share more once the data is ready to publish.
Hi @carsonwang, could you please share the performance benchmark numbers for Ray + XGBoost vs. XGBoost on Spark?
@carsonwang, did the plan to integrate RayDP with Gluten materialize?