How to generate the dataset for Logistic Regression test?

Open congxu2016 opened this issue 6 years ago • 2 comments

Spark-Bench version (version number, tag, or git commit hash)

spark-bench_2.3.0_0.4.0-RELEASE_99

Details of your cluster setup (Spark version, Standalone/Yarn/Local/Etc)

spark-2.2, standalone

Scala version on your cluster

2.12.2

Your exact configuration file (with system details anonymized for security)

Relevant stacktrace

Description of your problem and any other relevant info

There is a test for Logistic Regression, but there is no data generator for Logistic Regression. How can I generate or download the dataset for this test?

congxu2016 avatar Jul 24 '18 21:07 congxu2016

Hi @congxu2016, sorry for the delay in answering.

The data generator in the legacy branch will generate a dataset appropriate for the LogisticRegression workload: https://github.com/CODAIT/spark-bench/blob/legacy/LogisticRegression/bin/gen_data.sh

ecurtin avatar Jul 25 '18 17:07 ecurtin

I'm also having a hard time creating a dataset for LogisticRegression. I tried the 'gen_data' script from the legacy branch, as suggested above, but I get: Exception in thread "main" java.io.FileNotFoundException: File file:/opt/spark-bench/LogisticRegression/target/LogisticRegressionApp-1.0.jar does not exist

Is there an alternative way to create the dataset, possibly with something from the current 'RELEASE_99'?

lovengulu avatar Oct 17 '18 12:10 lovengulu
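
For anyone landing here who only needs some labeled binary-classification data to experiment with, below is a minimal sketch that generates a synthetic logistic-regression dataset with plain Python and writes it in LIBSVM text format. It is not part of spark-bench, and nothing in this thread confirms which input format the spark-bench LogisticRegression workload expects, so treat the output format as an assumption. The function names (generate_logistic_data, to_libsvm_line, write_libsvm) and the output file name are made up for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def generate_logistic_data(n_rows, n_features, seed=42):
    """Return a list of (label, features) pairs.

    Features are standard normal. Labels are drawn from a Bernoulli
    distribution whose probability comes from a hidden random weight
    vector passed through the logistic function.
    """
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
    rows = []
    for _ in range(n_rows):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        z = sum(w * xi for w, xi in zip(weights, x))
        label = 1 if rng.random() < sigmoid(z) else 0
        rows.append((label, x))
    return rows


def to_libsvm_line(label, features):
    """Format one row as '<label> 1:<v1> 2:<v2> ...' (1-based indices)."""
    pairs = " ".join(f"{i}:{v:.6f}" for i, v in enumerate(features, start=1))
    return f"{label} {pairs}"


def write_libsvm(rows, path):
    """Write all rows to a text file, one LIBSVM line per row."""
    with open(path, "w") as f:
        for label, features in rows:
            f.write(to_libsvm_line(label, features) + "\n")


if __name__ == "__main__":
    # Hypothetical output path; adjust the sizes to your cluster.
    write_libsvm(generate_logistic_data(1000, 10), "logreg_sample.libsvm")
```

A file in this format can be loaded in Spark with spark.read.format("libsvm"). Whether the spark-bench workload itself accepts it is something you would need to check against its configuration.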