VariantSpark
VariantSpark is a framework for applying Spark-based machine learning methods to whole-genome variant information.
VariantSpark Readme
News
New VariantSpark version (Cursed Forest) available: https://github.com/aehrc/VariantSpark
Install
- Download variantspark.sh, example.conf and variantspark-1.0.jar
- Ensure variantspark.sh is executable: chmod +x variantspark.sh
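The two install steps can be checked with a small shell helper. This is only a sketch: the function name check_install is hypothetical and not part of VariantSpark, and it assumes the three files were downloaded into a single directory.

```shell
# Hypothetical helper (not shipped with VariantSpark): confirm the three
# downloaded files exist in a directory, then mark the launcher executable.
check_install() {
  dir="${1:-.}"          # directory holding the downloads (default: current)
  missing=0
  for f in variantspark.sh example.conf variantspark-1.0.jar; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  # Only change permissions when every file is present.
  if [ "$missing" -eq 0 ]; then
    chmod +x "$dir/variantspark.sh"
  fi
  return "$missing"
}
```

Calling check_install with no argument checks the current directory; it returns a non-zero status and names each missing file if the download is incomplete.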
Building From Source
If you have trouble running VariantSpark, you can build it yourself using Maven.
- Check out the repo
- cd VariantSpark/variantspark
- vi pom.xml and ensure software versions match those on your cluster.
- mvn package to build.
- If you built it locally, copy target/VCF-clusterer-0.0.1-SNAPSHOT.jar to your cluster.
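The build steps can be wrapped in one shell function, sketched below. The function name build_variantspark and the final jar check are illustrative assumptions. It expects the repo to be checked out already and pom.xml to have been edited to match your cluster.

```shell
# Hypothetical wrapper (not part of VariantSpark) around the build steps.
# Assumes pom.xml has already been edited to match the cluster's versions.
build_variantspark() {
  repo_dir="${1:-VariantSpark/variantspark}"
  jar="target/VCF-clusterer-0.0.1-SNAPSHOT.jar"
  (
    # Run in a subshell so the caller's working directory is unchanged.
    cd "$repo_dir" || exit 1
    mvn package || exit 1
    if [ -f "$jar" ]; then
      echo "built: $repo_dir/$jar"
    else
      echo "build finished but $jar was not found" >&2
      exit 1
    fi
  )
}
```

If the function reports the jar path, that file is the one to copy to your cluster, using whatever transfer tool your site provides.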
Submit a Job
Once installed, use the launcher script, variantspark.sh, to submit a job to your cluster.
You need to specify a configuration file with -c. An example file is available as example.conf.
Submit a job using ./variantspark.sh -c example.conf.
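A minimal sketch of a submission wrapper that checks the configuration file exists before calling the launcher. The function name submit_job is hypothetical. Only the -c option described above is used, and the launcher is assumed to be in the current directory.

```shell
# Hypothetical wrapper (not part of VariantSpark): refuse to submit when the
# configuration file is missing, otherwise call the launcher with -c.
submit_job() {
  conf="${1:-example.conf}"   # configuration file (default: example.conf)
  if [ ! -f "$conf" ]; then
    echo "config file not found: $conf" >&2
    return 1
  fi
  ./variantspark.sh -c "$conf"
}
```

Running submit_job with no argument is equivalent to ./variantspark.sh -c example.conf, with the added check that example.conf is present.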