Joseph Bradley
@sryza Thanks for publishing it to Maven! Spark packages should be very doable; let me know if you have questions about it.
This can be done by running both sets of tests. (They use the same set of parameters in the config file.) I've done it some, and the change in performance...
I don't think we will for this release, but we will need to for the next one. We've been focusing on the API for now, but I hope the API...
@Jetsonpaul Can you try running a problem of similar size outside of Spark perf, but on the same cluster and with a similar configuration? Also, are you running with only...
I see. I'm still not sure if it's a problem with spark-perf or with pyspark requiring more memory than equivalent Scala/Java jobs. The best way to figure that out might...
> For ML jobs, I tend to set # partitions = # CPU cores available in cluster

This can be set in the spark-perf/config/config.py file. Note that there is an...
Sounds good to me
Discussed offline: This PR will be renamed and put aside, limited to the Docker stuff only, and considered in the future. A new PR will have the README updates required...