Investigate and Port Label Propagation workload from legacy

Open ecurtin opened this issue 7 years ago • 2 comments

The legacy label propagation workload appears to generate its graph inside the workload itself. To keep with the standards established in the new version, the data generation should be extracted.

There are two possible paths. One is to investigate whether the graph data generator can be used in combination with the label propagation workload. This would be the preferable route. The other route would be extracting the data generation in the label propagation workload into its own workload.

https://github.com/SparkTC/spark-bench/blob/legacy/LabelPropagation/src/main/scala/LabelPropagationApp.scala

ecurtin avatar Sep 20 '17 15:09 ecurtin

I can't actually get this workload to run or generate any data (the Data Generator - Graph Generator example on the site). The exception reads "Could not find workload graph-data-generator."

I see it called a "legacy" workload in this open issue...is it possible there's a conflict with particular updates? Maybe it's my newer version of Spark (2.2.1)?

JRDetwiler avatar Jan 04 '18 03:01 JRDetwiler

Hi @JDetwiler15, thanks for using Spark-Bench!

The current version of spark-bench is a ground-up rewrite of the legacy codebase. We preserved the legacy version in its own branch, and that's what's referenced by the link in this issue's description.

The current version does provide a Graph Generator as you saw, so let's see if we can get to the bottom of your issue! For cleanliness in bug tracking, I've created a new issue where we can discuss problems with the graph data generator: https://github.com/SparkTC/spark-bench/issues/137

ecurtin avatar Jan 04 '18 14:01 ecurtin