spark-perf

MLlib TODO items

Open jkbradley opened this issue 10 years ago • 4 comments

  • [ ] Change Scala testName to match Python test names: “glm-regression” -> GLMRegressionTest
  • [ ] Make parameter names match across all tests. (num-examples, num-rows, etc.)
  • [ ] Refactor correlation tests so pearson/spearman is a parameter.
  • [ ] Better data generation in Python

jkbradley avatar Nov 26 '14 01:11 jkbradley

It would be great to measure the performance loss in PySpark vs. Scala for MLlib models implemented in Scala.

petro-rudenko avatar Jan 13 '15 17:01 petro-rudenko

This can be done by running both sets of tests. (They use the same set of parameters in the config file.) I've done this a few times, and the performance difference depends on the particular algorithm. For long training times, it does not matter. For prediction, it varies somewhat. The Spark 1.2 release brought Python a lot closer to Scala for prediction.
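
A rough sketch of how the two sets of results could be compared once both runs have finished. It assumes the timings have already been collected into dictionaries keyed by test name; the function name and that input format are hypothetical and are not something spark-perf provides.

```python
def slowdown_ratios(scala_seconds, python_seconds):
    """Map each test present in both runs to python_time / scala_time.

    A ratio above 1.0 means the Python run was slower. Tests missing from
    either run, or with a non-positive Scala time, are skipped.
    """
    ratios = {}
    for name in sorted(set(scala_seconds) & set(python_seconds)):
        scala_time = scala_seconds[name]
        if scala_time > 0:
            ratios[name] = python_seconds[name] / scala_time
    return ratios


if __name__ == "__main__":
    # Placeholder numbers for illustration only; these are not measurements.
    scala = {"test-a": 2.0, "test-b": 1.0}
    python = {"test-a": 3.0, "test-b": 1.0}
    for name, ratio in slowdown_ratios(scala, python).items():
        print("%s: %.2fx" % (name, ratio))
```

Keeping the ratio per test, rather than one overall number, matches the point that the gap depends on the algorithm and on whether training or prediction is being timed.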

jkbradley avatar Jan 14 '15 23:01 jkbradley

Any plans to test the performance of the new ml API (Pipeline, Crossvalidation, GridSearch, etc.)?

petro-rudenko avatar Feb 13 '15 18:02 petro-rudenko

I don't think we will for this release, but we will need to for the next one. We've been focusing on the API for now, but I hope the API can be stabilized before long.

jkbradley avatar Feb 16 '15 19:02 jkbradley