Newer versions of Spark, ADAM and Sparkling Water for "Genomic Analysis Using ADAM, Spark and Deep Learning", for people who want to reproduce the test
Hi @nfergu, I have some advice for people who want to reproduce the test from "Genomic Analysis Using ADAM, Spark and Deep Learning" using newer versions of the tools. I am posting all of my changes here, and I hope they are helpful to others. First, in the pom file:
- Spark version 1.6.1 replacing 1.2.0
- ADAM version 0.19.0 replacing 0.16.0
- Sparkling Water version 1.6.5 replacing 1.2.5
- H2O version replaced as well (only the version number in the pom needs to change; do not install H2O separately once Sparkling Water is installed)
is modified to
Then, in the code:
val header = StructType(Array(StructField("Region", StringType)) ++
sortedVariantsBySampleId.first()._2.map(variant => {StructField(variant.variantId.toString, IntegerType)}))
is modified to
val header = DataTypes.createStructType(Array(DataTypes.createStructField("Region", DataTypes.StringType, false)) ++
  sortedVariantsBySampleId.first()._2.map(variant => DataTypes.createStructField(variant.variantId.toString, DataTypes.IntegerType, false)))
// Create the SchemaRDD from the header and rows and convert the SchemaRDD into a H2O dataframe
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val schemaRDD = sqlContext.applySchema(rowRDD, header)
val h2oContext = new H2OContext(sc).start()
import h2oContext._
val dataFrame = h2oContext.toDataFrame(schemaRDD)
is modified to
// Create the DataFrame (formerly SchemaRDD) from the header and rows and convert it into an H2O frame
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// applySchema is deprecated since Spark 1.3; the commented-out createDataFrame call is its replacement
//val dataFrame = sqlContext.createDataFrame(rowRDD, header)
val schemaRDD = sqlContext.applySchema(rowRDD, header)
val h2oContext = new H2OContext(sc).start()
import h2oContext._
val dataFrame1 = h2oContext.asH2OFrame(schemaRDD)
// Convert all string columns to categorical columns
val dataFrame = H2OFrameSupport.allStringVecToCategorical(dataFrame1)
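For anyone who would rather avoid the deprecated `applySchema` call, here is a minimal sketch of the schema and DataFrame steps using `createDataFrame`. It assumes Spark 1.6.x on the classpath. The object name `SchemaSketch` and the helpers `buildHeader` and `toDataFrame` are made up for illustration and are not part of the original code; I have not run the full pipeline this way.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SQLContext}
import org.apache.spark.sql.types.{DataTypes, StructType}

// Hypothetical helper object, not part of the popstrat code.
object SchemaSketch {
  // Build a header with a non-nullable string "Region" column followed by
  // one non-nullable integer column per variant id.
  def buildHeader(variantIds: Seq[String]): StructType = {
    val region = DataTypes.createStructField("Region", DataTypes.StringType, false)
    val variants = variantIds.map { id =>
      DataTypes.createStructField(id, DataTypes.IntegerType, false)
    }
    DataTypes.createStructType((region +: variants).toArray)
  }

  // createDataFrame(RDD[Row], StructType) has been available since Spark 1.3
  // and is the replacement for the deprecated applySchema.
  def toDataFrame(sqlContext: SQLContext, rows: RDD[Row], header: StructType): DataFrame =
    sqlContext.createDataFrame(rows, header)
}
```

With this, `SchemaSketch.toDataFrame(sqlContext, rowRDD, header)` could take the place of the `applySchema` line, and the result can be passed to `h2oContext.asH2OFrame` in the same way.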
// Split the dataframe into 50% training, 30% test, and 20% validation data
val frameSplitter = new FrameSplitter(dataFrame, Array(.5, .3), Array("training", "test", "validation").map(Key.make), null)
is modified to
// Split the dataframe into 50% training, 30% test, and 20% validation data
val frameSplitter = new FrameSplitter(dataFrame, Array(.5, .3), Array("training", "test", "validation").map(Key.make[Frame](_)), null)
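To make the split ratios less mysterious: only two ratios (.5 and .3) are given for three keys, and the comment says the validation split is 20%, so the last split appears to receive whatever is left over. The sketch below shows just that arithmetic in plain Scala. `SplitRatios` and `withRemainder` are hypothetical names, and this is not how `FrameSplitter` is implemented internally.

```scala
// Hypothetical helper, only illustrating the ratio arithmetic.
object SplitRatios {
  // Pair n split names with n ratios, given n-1 explicit ratios.
  // The last split is assigned the remainder so that the ratios sum to 1.
  def withRemainder(names: Seq[String], ratios: Seq[Double]): Seq[(String, Double)] = {
    require(names.length == ratios.length + 1, "need exactly one more name than ratios")
    require(ratios.forall(r => r > 0.0 && r < 1.0), "each ratio must lie strictly between 0 and 1")
    val used = ratios.sum
    require(used < 1.0, "explicit ratios must leave something for the last split")
    names.zip(ratios :+ (1.0 - used))
  }

  def main(args: Array[String]): Unit =
    withRemainder(Seq("training", "test", "validation"), Seq(0.5, 0.3)).foreach {
      case (name, ratio) => println(f"$name%-10s ${ratio * 100}%.0f%%")
    }
}
```

So if you want a different split, you only change the two explicit ratios, and the validation share follows from them.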
// Set the parameters for our deep learning model.
val deepLearningParameters = new DeepLearningParameters()
deepLearningParameters._train = training
deepLearningParameters._valid = validation
is modified to
// Set the parameters for our deep learning model.
val deepLearningParameters = new DeepLearningParameters()
deepLearningParameters._train = training._key
deepLearningParameters._valid = validation._key
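The last two changes (`Key.make[Frame](_)` and `training._key`) both come down to typed keys: as far as I understand, in the newer H2O the parameters refer to frames by a key typed to `Frame` instead of holding the frame itself. Here is a small plain-Scala sketch of that pattern. `DemoKey`, `DemoFrame`, `DemoParameters` and `TypedKeySketch` are hypothetical stand-ins, not H2O classes.

```scala
// Hypothetical stand-ins for illustration only; these are NOT the H2O classes.
final case class DemoKey[T](name: String)
final case class DemoFrame(key: DemoKey[DemoFrame], rows: Long)

final class DemoParameters {
  // The parameters hold typed keys, not the frames themselves.
  var train: DemoKey[DemoFrame] = null
  var valid: DemoKey[DemoFrame] = null
}

object TypedKeySketch {
  // A generic factory: the caller has to say what the key points to.
  def makeKey[T](name: String): DemoKey[T] = DemoKey[T](name)

  // Passing the type argument explicitly yields keys typed to DemoFrame,
  // which is what an API expecting Array[DemoKey[DemoFrame]] needs.
  val splitKeys: Array[DemoKey[DemoFrame]] =
    Array("training", "test", "validation").map(makeKey[DemoFrame](_))

  // Assign the frames' keys, mirroring `_train = training._key`.
  def configure(training: DemoFrame, validation: DemoFrame): DemoParameters = {
    val p = new DemoParameters
    p.train = training.key
    p.valid = validation.key
    p
  }
}
```

Assigning a `DemoFrame` directly to `train` would not compile here, which is the same kind of type error the `._key` change avoids.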
// Score the model against the entire dataset (training, test, and validation data)
// This causes the confusion matrix to be printed
is modified to
// Score the model against the entire dataset (training, test, and validation data)
// This causes the confusion matrix to be printed
import org.apache.spark.sql.types.DataTypes
import hex._
import water.fvec._
import water.support._
import _root_.hex.Distribution.Family
import _root_.hex.deeplearning.DeepLearningModel
import _root_.hex.tree.gbm.GBMModel
import _root_.hex.{Model, ModelMetricsBinomial}
OK, that's all. I have tested it successfully. If you have any other advice, that would be even better. Thank you again!