VariantSpark
Error when loading json model
Steps to reproduce:
Train a model using e.g. ImportanceCmd:
$./bin/variant-spark --local -- importance -if data/chr22_1000.vcf -ff data/chr22-labels.csv -fc 22_16051249 -rn 10 -rbs 10 -om target/ch22-model.java -sr 13 -v
Then load that model using e.g. AnalyzeRFCmd:
$./bin/variant-spark --local -- analyze-rf -im target/ch22-model.json
Gives the following exception:
java.io.StreamCorruptedException: invalid stream header: 7B0A2020
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:900)
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:63)
    at org.apache.spark.serializer.JavaDeserializationStream.<init>(JavaSerializer.scala:63)
    at org.apache.spark.serializer.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:122)
    at au.csiro.variantspark.cli.AnalyzeRFCmd$$anonfun$1.apply(AnalyzeRFCmd.scala:81)
    at au.csiro.variantspark.cli.AnalyzeRFCmd$$anonfun$1.apply(AnalyzeRFCmd.scala:80)
    at au.csiro.pbdava.ssparkle.common.utils.LoanUtils$.withCloseable(LoanUtils.scala:18)
    at au.csiro.variantspark.cli.AnalyzeRFCmd.run(AnalyzeRFCmd.scala:80)
    at au.csiro.sparkle.common.args4j.ArgsApp.run(ArgsApp.java:46)
    at au.csiro.sparkle.cmd.CmdApp.runApp(CmdApp.java:9)
    at au.csiro.sparkle.cmd.CmdApp.runApp(CmdApp.java:18)
    at au.csiro.sparkle.cmd.MultiCmdApp.runCommandOrClass(MultiCmdApp.java:58)
    at au.csiro.sparkle.cmd.MultiCmdApp.run(MultiCmdApp.java:54)
    at au.csiro.sparkle.cmd.CmdApp.runApp(CmdApp.java:9)
    at au.csiro.sparkle.cmd.CmdApp.runApp(CmdApp.java:18)
    at au.csiro.pbdava.ssparkle.common.arg4j.AppRunner$.mains(AppRunner.scala:17)
    at au.csiro.variantspark.cli.VariantSparkApp$.main(VariantSparkApp.scala:26)
    at au.csiro.variantspark.cli.VariantSparkApp.main(VariantSparkApp.scala)
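This is because we can output trained models as JSON, but currently don't handle JSON as an input format: as the stack trace shows, AnalyzeRFCmd hands the file to Spark's Java deserializer. The header bytes in the message, 7B0A2020, are the ASCII codes for "{", a newline and two spaces, i.e. the start of a pretty-printed JSON file rather than a Java serialization stream.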
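The same message can be reproduced outside VariantSpark by pointing a plain java.io.ObjectInputStream at JSON text. This is only a minimal sketch: the class name JsonHeaderDemo and the JSON body are made up for illustration and are not the real model file format.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.StreamCorruptedException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class JsonHeaderDemo {

    // Tries to open the given text as a Java serialization stream and
    // returns the resulting error message (or "no error" if the header is accepted).
    static String tryJavaDeserialize(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return "no error";
        } catch (StreamCorruptedException e) {
            // Thrown by the constructor when the first four bytes are not
            // the serialization magic number and version.
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical pretty-printed JSON: "{", newline, two spaces, ...
        String json = "{\n  \"example\": true\n}";
        System.out.println(tryJavaDeserialize(json));
        // prints "invalid stream header: 7B0A2020"
    }
}
```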
I suggest creating a ModelInputArgs to mirror ModelOutputArgs, and adding support for reading regular JSON files as an instance of RandomForestModel.
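One way a loader could choose the right reader is to look at the first bytes of the file before deserializing. The sketch below is an assumption about how that might look, not existing VariantSpark code: ModelFormatSniffer and its Format enum are hypothetical names, and an explicit format option on the command line would work just as well. It relies only on two facts: Java serialization streams start with the magic bytes 0xAC 0xED, and a JSON document starts with "{" or "[".

```java
import java.io.BufferedInputStream;
import java.io.IOException;

public class ModelFormatSniffer {

    enum Format { JAVA, JSON, UNKNOWN }

    // Classifies a file by its leading bytes.
    static Format detect(byte[] head) {
        // 0xACED is the magic number that opens every Java serialization stream.
        if (head.length >= 2 && (head[0] & 0xFF) == 0xAC && (head[1] & 0xFF) == 0xED) {
            return Format.JAVA;
        }
        // JSON text starts with an object or array; leading whitespace is
        // not handled in this sketch.
        if (head.length >= 1 && (head[0] == '{' || head[0] == '[')) {
            return Format.JSON;
        }
        return Format.UNKNOWN;
    }

    // Peeks at the first two bytes of a stream without consuming them,
    // so the chosen reader can still read the file from the start.
    static Format detect(BufferedInputStream in) throws IOException {
        in.mark(2);
        int b0 = in.read();
        int b1 = in.read();
        in.reset();
        if (b0 < 0) {
            return Format.UNKNOWN; // empty stream
        }
        byte[] head = (b1 < 0) ? new byte[] {(byte) b0} : new byte[] {(byte) b0, (byte) b1};
        return detect(head);
    }
}
```

With something like this, a JSON model file would be routed to a JSON reader instead of reaching ObjectInputStream and failing on the stream header.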