
how to run "now"?

Open vjcitn opened this issue 9 years ago • 8 comments

This is a fascinating set of ideas. However, with

%vjcair> *1/bin/sparkling-shell

-----
  Spark master (MASTER)     : local-cluster[3,2,2048]
  Spark home   (SPARK_HOME) : /Users/stvjc/Research/SPARK/spark-1.5.2-bin-hadoop2.6
----

15/12/06 10:26:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67)
Type in expressions to have them evaluated.

we have

scala> package com.neilferguson
<console>:1: error: illegal start of definition
       package com.neilferguson
       ^

scala> 

scala> import hex.FrameSplitter
<console>:19: error: missing arguments for method hex in object functions;
follow this method with `_' if you want to treat it as a partially applied function
       import hex.FrameSplitter

I understand that with Docker or specific EC2 setups we can reproduce your results. But what about a standalone deployment of Spark + H2O + ADAM? Would it be possible to define and maintain the example scripts so that they run with a fixed source, data set and environment on a Linux machine with specified resources?

vjcitn, Dec 06 '15 22:12

Hi

I haven't tried sparkling-shell. It should be easy to get the program running though. Did you try running the following?

YOUR_SPARK_HOME/bin/spark-submit --class "com.neilferguson.PopStrat" --master local[6] --driver-memory 6G target/uber-popstrat-0.1-SNAPSHOT.jar <genotypesfile> <panelfile>

This should run the program using a local spark cluster. I haven't tried any other method, but you should be able to change "local[6]" to the address of a remote cluster if you like, and it should still work.
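A minimal sketch of the command above as a reusable shell function, assuming a POSIX shell and the same jar path, class name and driver memory. The function name `run_popstrat` and the `DRY_RUN` switch are hypothetical additions for illustration. The cluster address follows Spark's standalone master URL format `spark://HOST:PORT` (7077 is the default port), so substitute your own host.

```shell
# run_popstrat: hypothetical wrapper around the spark-submit command above.
# Usage: run_popstrat <genotypesfile> <panelfile> [master]
# master defaults to local[6]; for a standalone cluster pass e.g. spark://HOST:7077.
# Set DRY_RUN=1 to print the command instead of running it.
run_popstrat() {
  if [ "$#" -lt 2 ]; then
    echo "usage: run_popstrat <genotypesfile> <panelfile> [master]" >&2
    return 1
  fi
  if [ -z "${SPARK_HOME:-}" ]; then
    echo "SPARK_HOME is not set" >&2
    return 1
  fi
  genotypes=$1
  panel=$2
  master=${3:-local[6]}   # assignment context: the brackets are not globbed
  # Rebuild the positional parameters as the full command line.
  set -- "$SPARK_HOME/bin/spark-submit" \
    --class com.neilferguson.PopStrat \
    --master "$master" \
    --driver-memory 6G \
    target/uber-popstrat-0.1-SNAPSHOT.jar "$genotypes" "$panel"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, `SPARK_HOME=/path/to/spark run_popstrat <genotypesfile> <panelfile> spark://HOST:7077` would submit to a remote standalone cluster instead of `local[6]`; everything else on the command line stays the same.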

nfergu, Dec 18 '15 11:12

Thanks for getting back to me. Is there a specific approach to converting the VCF to ADAM that needs to be used? I have got past the previously reported error, but after converting the chr22 VCF to ADAM as STORE.adam (the directory has a _SUCCESS file), we get "Can not read value at 0 in block 0 ..."

${SPARK_HOME}/bin/spark-submit --class "com.neilferguson.PopStrat" --master local[2] --driver-memory 6G target/uber-popstrat-0.1-SNAPSHOT.jar ./STORE.adam integrated_call_samples_v3.20130502.ALL.panel

2015-12-18 08:09:24 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-12-18 08:09:26 WARN MetricsSystem:71 - Using default name DAGScheduler for source because spark.app.id is not set.
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2015-12-18 08:09:31 ERROR Executor:96 - Exception in task 1.0 in stage 0.0 (TID 1)
parquet.io.ParquetDecodingException: Can not read value at 0 in block 0 in file file:/Users/stvjc/Research/ADAM/STORE.adam/part-r-00001.snappy.parquet


vjcitn, Dec 18 '15 13:12

That's weird - I haven't seen that error before. Have you tried using the VCF file directly instead of converting to ADAM first?

The only thing that occurs to me about the error is that it might be a permissions issue. Does the user that Spark is running as have access to the files in the STORE.adam directory?
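One quick way to test the permissions theory is sketched below, assuming only POSIX `find` features (`-exec ... \;` used as a test). `check_readable` is a hypothetical helper, not part of Spark or popstrat. It only reports what the *current* user can read, so it has to be run as the same user Spark runs as; root passes every check.

```shell
# check_readable: hypothetical helper, not part of Spark or popstrat.
# Prints every file or directory under $1 (e.g. STORE.adam) that the current
# user cannot read, and returns 1 if there is at least one.
check_readable() {
  if [ ! -e "$1" ]; then
    echo "no such path: $1" >&2
    return 1
  fi
  # "! -exec test -r {} \;" is true for entries that test -r rejects;
  # directories also need execute permission so they can be traversed.
  unreadable=$(find "$1" \( ! -exec test -r {} \; -o \( -type d ! -exec test -x {} \; \) \) -print 2>/dev/null)
  if [ -n "$unreadable" ]; then
    printf '%s\n' "$unreadable"
    return 1
  fi
  return 0
}
```

Running `check_readable STORE.adam` as the Spark user and getting no output would rule out file permissions as the cause of the Parquet error.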

nfergu, Dec 21 '15 22:12

Also, in answer to your question, I'm pretty sure I just used the vcf2adam command to transform the VCF file to ADAM, without any special options. I've just realised that the popstrat docs say to use the transform command, but I'm pretty sure that's wrong.
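The conversion described above can be sketched as a small shell function, assuming `ADAM_HOME` points at an ADAM release from around that time whose `adam-submit` still offers the `vcf2adam` command (later releases may have renamed or removed it). `vcf_to_adam` and `ADAM_HOME` are hypothetical names used for illustration. The `_SUCCESS` check only confirms that the job finished; as the STORE.adam error earlier in this thread shows, it does not prove the Parquet files are readable.

```shell
# vcf_to_adam: hypothetical helper, not part of popstrat or ADAM.
# Runs ADAM's vcf2adam conversion with no extra options, then checks that
# the output directory contains the _SUCCESS marker.
# Assumes ADAM_HOME (hypothetical) points at an ADAM release that ships vcf2adam.
vcf_to_adam() {
  if [ "$#" -ne 2 ]; then
    echo "usage: vcf_to_adam <input.vcf> <output.adam>" >&2
    return 1
  fi
  if [ -z "${ADAM_HOME:-}" ]; then
    echo "ADAM_HOME is not set" >&2
    return 1
  fi
  if [ ! -r "$1" ]; then
    echo "cannot read input VCF: $1" >&2
    return 1
  fi
  if [ -e "$2" ]; then
    # Refuse to reuse an output directory left over from an earlier run.
    echo "output already exists, remove it first: $2" >&2
    return 1
  fi
  "$ADAM_HOME/bin/adam-submit" vcf2adam "$1" "$2" || return 1
  # _SUCCESS only shows the job finished; it does not validate the Parquet files.
  if [ ! -f "$2/_SUCCESS" ]; then
    echo "no _SUCCESS marker in $2" >&2
    return 1
  fi
  return 0
}
```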

nfergu, Dec 21 '15 22:12

OK, made a lot more progress with pure VCF. However, after quite a bit of churning:

Found 255 samples
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.package$.StructType()Lorg/apache/spark/sql/catalyst/types/StructType$;
	at com.neilferguson.PopStrat$.main(PopStrat.scala:99)
	at com.neilferguson.PopStrat.main(PopStrat.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


vjcitn, Dec 22 '15 01:12

My guess is that this is an incompatibility with the latest Spark version. You could try an older Spark version (I've only tested with 1.2.0), or, if you want to wait, I'll try to get round to testing and fixing it with the latest Spark at some point.

nfergu, Dec 22 '15 14:12

Hey,

Did you actually work on a version compatible with Spark 1.6? I tried myself, but since I'm pretty new to Scala and Spark, I couldn't manage to debug an execution error.

Thanks,

Leo.

leomeloj, Jul 20 '16 19:07

@leomeloj

I upgraded and improved a similar approach with Spark 2.2.1 and fairly recent versions of ADAM and H2O. You can try it as well.

But I had to make some changes for the later versions of these libraries and frameworks. Take a look at the code from my book "Scala Machine Learning Projects" (Packt, 2018).

Link: https://github.com/PacktPublishing/Scala-Machine-Learning-Projects/tree/master/Chapter04/PopulationClustering_v2

<properties>
	<spark.version>2.2.1</spark.version>
	<scala.version>2.11.8</scala.version>
	<h2o.version>3.16.0.2</h2o.version>
	<sparklingwater.version>2.2.6</sparklingwater.version>
	<adam.version>0.23.0</adam.version>
</properties>

rezacsedu, Sep 22 '18 17:09