spark-bench
How to generate the dataset for Logistic Regression test?
Spark-Bench version (version number, tag, or git commit hash)
spark-bench_2.3.0_0.4.0-RELEASE_99
Details of your cluster setup (Spark version, Standalone/Yarn/Local/Etc)
spark-2.2, standalone
Scala version on your cluster
2.12.2
Your exact configuration file (with system details anonymized for security)
Relevant stacktrace
Description of your problem and any other relevant info
There is a test for Logistic Regression, but there is no data generator for Logistic Regression. How can I generate or download the dataset for this test?
Hi @congxu2016, sorry for the delay in answering.
The data generator in the legacy branch will generate a dataset appropriate for the LogisticRegression workload: https://github.com/CODAIT/spark-bench/blob/legacy/LogisticRegression/bin/gen_data.sh
I am also having a hard time creating a dataset for LogisticRegression.
I tried using the 'gen_data' from legacy as suggested above.
I'm getting:
Exception in thread "main" java.io.FileNotFoundException: File file:/opt/spark-bench/LogisticRegression/target/LogisticRegressionApp-1.0.jar does not exist
Is there an alternative way to create the dataset, possibly with something from the current 'RELEASE_99'?
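One option that does not depend on the legacy jar is to synthesize a labeled dataset outside spark-bench. The following is only a minimal sketch using the Python standard library: it draws random feature vectors, assigns 0/1 labels from a logistic model with randomly chosen weights, and writes the result as CSV. The function names, the `label`/`x0..xN` column names, the output file name `lr_train.csv`, and the CSV layout itself are all assumptions made for illustration; they are not taken from spark-bench, so check which input format and schema the LogisticRegression workload in your release actually expects before using the file.

```python
import csv
import math
import random


def generate_logistic_dataset(n_rows, n_features, seed=42):
    """Return (weights, rows); each row is [label, x0, ..., x{n_features-1}].

    Hypothetical helper for illustration, not part of spark-bench.
    """
    rng = random.Random(seed)
    # Hidden "true" weights that define the decision boundary.
    weights = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
    rows = []
    for _ in range(n_rows):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        z = sum(w * xi for w, xi in zip(weights, x))
        p = 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) probability
        label = 1 if rng.random() < p else 0
        rows.append([label] + x)
    return weights, rows


def write_csv(path, rows, n_features):
    """Write rows with an assumed header: label, x0, x1, ..."""
    header = ["label"] + [f"x{i}" for i in range(n_features)]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


if __name__ == "__main__":
    _, train_rows = generate_logistic_dataset(n_rows=1000, n_features=10)
    write_csv("lr_train.csv", train_rows, n_features=10)
```

Using a different `seed` for a second call gives a separate test set, although the hidden weights then change as well; to keep the same underlying model for both sets, generate one larger dataset and split it.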