raydp
raydp copied to clipboard
Performance Evaluation
Hi there,
I would like to compare spark and spark with raydp. Is Hibench available? If not, Is there any other benchmarks that can run on a raydp cluster? Thanks!
We used to run some shuffle heavy workloads using RayDP and it showed no performance difference comparing with Spark on Yarn or standalone. Once Spark executors are started, they communicate with each other using spark's protocols so in theory there should be no difference. HiBench will not directly work with RayDP and you need to customize it so it can submit jobs to a Ray cluster. I recommend generating dataset using HiBench and write a pySpark program to compare the performance.
Our team is working on whether ray can help our system. If not performance, what are the advantages of raydp.
@Second222None , a typical use case of RayDP is to build an end to end pipeline in a single program instead of stitching multiple programs using glue code. For example you can use Spark to do data processing, Ray Train/Horovod/XGBoost on Ray for training and RayTune for hyper parameter training. All of these can be written in a single program and running on a single platform Ray. Data exchange can be accomplished by Ray's in-memory object store instead of persisting on a storage system.
If you can describe your potential use cases and requirement in a little more details, we can take a look and see if we can help.
close as stale