popstrat
how to run "now"?
This is a fascinating set of ideas. However, with
%vjcair> *1/bin/sparkling-shell
-----
Spark master (MASTER) : local-cluster[3,2,2048]
Spark home (SPARK_HOME) : /Users/stvjc/Research/SPARK/spark-1.5.2-bin-hadoop2.6
----
15/12/06 10:26:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.5.2
/_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67)
Type in expressions to have them evaluated.
we have
scala> package com.neilferguson
<console>:1: error: illegal start of definition
package com.neilferguson
^
scala>
scala> import hex.FrameSplitter
<console>:19: error: missing arguments for method hex in object functions;
follow this method with `_' if you want to treat it as a partially applied function
import hex.FrameSplitter
I understand that if we use Docker or specific EC2 setups we can get your results. But what about a standalone deployment of Spark + H2O + ADAM? Is it possible to define and maintain the example scripts so that a fixed combination of source, data, and environment on a Linux machine with specified resources can run the example?
Hi
I haven't tried sparkling-shell. It should be easy to get the program running though. Did you try running the following?
YOUR_SPARK_HOME/bin/spark-submit --class "com.neilferguson.PopStrat" --master local[6] --driver-memory 6G target/uber-popstrat-0.1-SNAPSHOT.jar <genotypesfile> <panelfile>
This should run the program using a local spark cluster. I haven't tried any other method, but you should be able to change "local[6]" to the address of a remote cluster if you like, and it should still work.
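As a sketch of the point about switching masters, the helper below only prints the spark-submit command instead of running it, so the master can be changed in one place. The function name `popstrat_cmd` and the `POPSTRAT_MASTER` variable are hypothetical and not part of popstrat; the class name, jar path, and flags are the ones from the command above. A standalone cluster address normally has the form `spark://host:7077`, with the host being whatever your cluster uses.

```shell
# Hypothetical helper: print (do not run) the spark-submit command line.
# POPSTRAT_MASTER is an assumed variable name; it defaults to local[6].
popstrat_cmd() {
  genotypes="$1"   # genotypes file (VCF or ADAM directory)
  panel="$2"       # panel file
  master="${POPSTRAT_MASTER:-local[6]}"
  printf '%s\n' "${SPARK_HOME:-YOUR_SPARK_HOME}/bin/spark-submit --class com.neilferguson.PopStrat --master $master --driver-memory 6G target/uber-popstrat-0.1-SNAPSHOT.jar $genotypes $panel"
}
```

For example, `POPSTRAT_MASTER='local[2]' popstrat_cmd <genotypesfile> <panelfile>` prints the same command with two local worker threads instead of six.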
Thanks for getting back to me. Is there a specific approach that needs to be used to transform the VCF to ADAM? I have progressed beyond the previously reported error. However, after converting the chr22 VCF into STORE.adam (which contains a _SUCCESS file), the run fails with "Can not read value at 0 in block 0 ...":
${SPARK_HOME}/bin/spark-submit --class "com.neilferguson.PopStrat" --master local[2] --driver-memory 6G target/uber-popstrat-0.1-SNAPSHOT.jar ./STORE.adam integrated_call_samples_v3.20130502.ALL.panel
2015-12-18 08:09:24 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-12-18 08:09:26 WARN MetricsSystem:71 - Using default name DAGScheduler for source because spark.app.id is not set.
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2015-12-18 08:09:31 ERROR Executor:96 - Exception in task 1.0 in stage 0.0 (TID 1)
parquet.io.ParquetDecodingException: Can not read value at 0 in block 0 in file file:/Users/stvjc/Research/ADAM/STORE.adam/part-r-00001.snappy.parquet
(In reply to Neil Ferguson's comment of Fri, Dec 18, 2015, 6:44 AM: https://github.com/nfergu/popstrat/issues/1#issuecomment-165758428)
That's weird - I haven't seen that error before. Have you tried using the VCF file directly instead of converting to ADAM first?
The only thing that occurs to me about the error is that it might be a permissions issue. Does the user that Spark is running as have access to the files in the STORE.adam directory?
Also, in answer to your question, I'm pretty sure I just used the vcf2adam command to transform the VCF file to ADAM, without any special options. I've just realised that the popstrat docs say to use the transform command, but I'm pretty sure that's wrong.
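To rule the permissions question in or out, a small check along these lines may help. It is only a sketch: `unreadable_files` is a hypothetical helper name, and it tests readability for whoever runs the script, so it would need to be run as the same user that runs Spark.

```shell
# Hypothetical helper: list regular files under a directory that the
# current user cannot read. Prints one path per line, or nothing if
# every file is readable.
unreadable_files() {
  dir="$1"
  find "$dir" -type f | while IFS= read -r f; do
    [ -r "$f" ] || printf '%s\n' "$f"
  done
}
```

Running `unreadable_files ./STORE.adam` and getting no output would suggest that file permissions are not the cause of the Parquet read error.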
OK, made a lot more progress with pure VCF. However, after quite a bit of churning
Found 255 samples
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.package$.StructType()Lorg/apache/spark/sql/catalyst/types/StructType$;
at com.neilferguson.PopStrat$.main(PopStrat.scala:99)
at com.neilferguson.PopStrat.main(PopStrat.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
(In reply to Neil Ferguson's comment of Mon, Dec 21, 2015, 5:38 PM: https://github.com/nfergu/popstrat/issues/1#issuecomment-166443445)
My guess is that this is an incompatibility with the latest Spark version. You could try an older Spark version (I've only tested with 1.2.0), or if you want to wait, I'll try to get round to testing and fixing it with the latest Spark at some point.
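The missing method in the stack trace, `org.apache.spark.sql.package$.StructType()` returning a type under `org.apache.spark.sql.catalyst.types`, appears consistent with this: Spark 1.3 moved the SQL data types to `org.apache.spark.sql.types` and removed the old aliases, so a jar built against 1.2.0 would be expected to fail this way on 1.5.2. A quick guard before submitting could look like the sketch below. Both function names are hypothetical, and the parsing assumes SPARK_HOME follows the `spark-<version>-bin-<hadoop>` naming of the binary downloads, as in the path shown earlier in the thread.

```shell
# Hypothetical helper: extract the Spark version from a SPARK_HOME path
# such as .../spark-1.5.2-bin-hadoop2.6. Prints nothing if the directory
# name does not follow that pattern.
spark_version_from_home() {
  basename "$1" | sed -n 's/^spark-\([0-9][0-9.]*\)-bin-.*$/\1/p'
}

# Hypothetical helper: succeed only for the 1.2.x line, the only one
# reported as tested in this thread.
is_tested_spark() {
  case "$(spark_version_from_home "$1")" in
    1.2.*) return 0 ;;
    *)     return 1 ;;
  esac
}
```

A wrapper script could then call `is_tested_spark "$SPARK_HOME" || echo "warning: untested Spark version" >&2` before invoking spark-submit.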
Hey,
Did you ever get a version working that is compatible with Spark 1.6? I tried it myself, but since I'm pretty new to Scala and Spark, I couldn't debug an execution error.
Thanks,
Leo.
@leomeloj
I upgraded and improved a similar approach using Spark 2.2.1 and fairly recent versions of ADAM and H2O. You can try it as well.
I had to make some changes for the later versions of these libraries and frameworks, though. Take a look at the code from my book "Scala Machine Learning Projects" (Packt, 2018).
Link: https://github.com/PacktPublishing/Scala-Machine-Learning-Projects/tree/master/Chapter04/PopulationClustering_v2
<properties>
<spark.version>2.2.1</spark.version>
<scala.version>2.11.8</scala.version>
<h2o.version>3.16.0.2</h2o.version>
<sparklingwater.version>2.2.6</sparklingwater.version>
<adam.version>0.23.0</adam.version>
</properties>