spark-perf
MLlib TODO items
- [ ] Change Scala testName to match Python test names: "glm-regression" -> GLMRegressionTest
- [ ] Make parameter names match across all tests. (num-examples, num-rows, etc.)
- [ ] Refactor correlation tests so pearson/spearman is a parameter.
- [ ] Better data generation in Python
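The first item above is mechanical enough to script. A minimal sketch of the hyphenated-to-CamelCase conversion, assuming the naming convention shown in the example; the `scala_test_name` helper and the `acronyms` set are illustrations, not part of spark-perf:

```python
def scala_test_name(python_name: str) -> str:
    """Convert a hyphenated Python test name (e.g. "glm-regression")
    to the Scala-style class name (e.g. "GLMRegressionTest")."""
    # Words kept fully upper-cased in the Scala name (assumed list).
    acronyms = {"glm", "als", "pca", "svd"}
    parts = [w.upper() if w in acronyms else w.capitalize()
             for w in python_name.split("-")]
    return "".join(parts) + "Test"
```

A mapping like this would also let the config file keep one canonical name per test and derive the other.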
It would be great to measure the performance loss of PySpark vs. Scala for MLlib models implemented in Scala.
This can be done by running both sets of tests; they use the same set of parameters in the config file. I've done it some, and the performance gap varies by algorithm. For long training times, it does not matter. For prediction, it varies some. The Spark 1.2 release brought Python a lot closer to Scala for prediction.
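Since both suites accept the same parameters, the comparison boils down to timing each side on identical configs. A rough sketch of such a harness, assuming each test can be wrapped in a Python callable that runs it end to end (the helper names here are hypothetical, not spark-perf API):

```python
import time

def time_call(fn, repeats=5):
    """Return the best wall-clock time over several runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

def overhead_ratio(python_fn, scala_fn, repeats=5):
    """Ratio of Python time to Scala time (> 1.0 means Python is slower)."""
    return time_call(python_fn, repeats) / time_call(scala_fn, repeats)
```

Taking the best of several runs helps damp noise from JVM warm-up and GC pauses, which otherwise dominate short prediction benchmarks.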
Any plans to test the performance of the new ml API (Pipeline, CrossValidation, GridSearch, etc.)?
I don't think we will for this release, but we will need to for the next one. We've been focusing on the API for now, but I hope the API can be stabilized before long.