
Sample example which does not work

Open fabiofumarola opened this issue 9 years ago • 15 comments

The library looks interesting. I tried a simple example with a sample app, but I got the following error:



[error] (run-main-0) org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 3.0 failed 1 times, most recent failure: Lost task 5.0 in stage 3.0 (TID 29, localhost): java.lang.NullPointerException
[error]     at com.tribbloids.spookystuff.utils.Utils$.uriSlash(Utils.scala:55)
[error]     at com.tribbloids.spookystuff.utils.Utils$$anonfun$uriConcat$1.apply(Utils.scala:49)
[error]     at com.tribbloids.spookystuff.utils.Utils$$anonfun$uriConcat$1.apply(Utils.scala:48)
[error]     at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
[error]     at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
[error]     at com.tribbloids.spookystuff.utils.Utils$.uriConcat(Utils.scala:48)
[error]     at com.tribbloids.spookystuff.pages.PageUtils$.autoRestore(PageUtils.scala:183)
[error]     at com.tribbloids.spookystuff.actions.TraceView$$anonfun$4.apply(TraceView.scala:95)
[error]     at com.tribbloids.spookystuff.actions.TraceView$$anonfun$4.apply(TraceView.scala:95)
[error]     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
[error]     at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
[error]     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
[error]     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
[error]     at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
[error]     at scala.collection.AbstractTraversable.map(Traversable.scala:105)
[error]     at com.tribbloids.spookystuff.actions.TraceView.fetchOnce(TraceView.scala:95)
[error]     at com.tribbloids.spookystuff.actions.TraceView$$anonfun$2.apply(TraceView.scala:83)
[error]     at com.tribbloids.spookystuff.actions.TraceView$$anonfun$2.apply(TraceView.scala:83)
[error]     at scala.util.Try$.apply(Try.scala:161)
[error]     at com.tribbloids.spookystuff.utils.Utils$.retry(Utils.scala:22)
[error]     at com.tribbloids.spookystuff.actions.TraceView.fetch(TraceView.scala:82)
[error]     at com.tribbloids.spookystuff.sparkbinding.PageRowRDD$$anonfun$26.apply(PageRowRDD.scala:491)
[error]     at com.tribbloids.spookystuff.sparkbinding.PageRowRDD$$anonfun$26.apply(PageRowRDD.scala:490)
[error]     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
[error]     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
[error]     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
[error]     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
[error]     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
[error]     at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
[error]     at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:99)
[error]     at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
[error]     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
[error]     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
[error]     at org.apache.spark.scheduler.Task.run(Task.scala:88)
[error]     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
[error]     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[error]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[error]     at java.lang.Thread.run(Thread.java:745)

The application is pretty simple:

object SimpleApp {

  def main(args: Array[String]) {
    val conf = new SparkConf().setMaster("local[*]").setAppName("Test")
    val sc = new SparkContext(conf)
    val spooky = new com.tribbloids.spookystuff.SpookyContext(sc)
    import spooky.dsl._

    val df = spooky.wget("https://news.google.com/?output=rss&q=barack%20obama")
      .join(S"item title".texts)(
        Wget(x"http://api.mymemory.translated.net/get?q=${'A}&langpair=en|fr")
      )('A ~ 'title, S"translatedText".text ~ 'translated)
      .toDF()


    val csv = df.toCSV()

    csv.foreach(println)
  }
}

Do you have any ideas?

fabiofumarola avatar Jan 10 '16 11:01 fabiofumarola

Would you be interested in help with the library's development?

fabiofumarola avatar Jan 10 '16 11:01 fabiofumarola

Hello, I have a similar issue. I tried to run the sample app, but I got the following error:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.Accumulator.<init>(Ljava/lang/Object;Lorg/apache/spark/AccumulatorParam;Lscala/Option;)V
    at com.tribbloids.spookystuff.Metrics$.accumulator(SpookyContext.scala:20)
    at com.tribbloids.spookystuff.Metrics$.$lessinit$greater$default$1(SpookyContext.scala:25)
    at com.tribbloids.spookystuff.SpookyContext.<init>(SpookyContext.scala:68)
    at com.tribbloids.spookystuff.SpookyContext.<init>(SpookyContext.scala:72)
    at FTest$.main(FTest.scala:15)
    at FTest.main(FTest.scala)
16/08/18 11:48:16 INFO SparkContext: Invoking stop() from shutdown hook
16/08/18 11:48:16 INFO SparkUI: Stopped Spark web UI at http://127.0.1.1:4040

The application has the following code:

object FTest {
  def main(args: Array[String]) {
    // val logFile = "/home/ait/spark/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local[*]")
    val sc = new SparkContext(conf)
    assert(sc.parallelize(1 to 100).reduce(_ + _) == 5050)
    val spooky = new SpookyContext(sc)
    import spooky.dsl._
    spooky.wget("https://news.google.com/?output=rss&q=barack%20obama")
      .join(S"item title".texts)(
        Wget(x"http://api.mymemory.translated.net/get?q=${'A}&langpair=en|fr")
      )('A ~ 'title, S"translatedText".text ~ 'translated)
      .toDF()
  }
}

Could it be caused by a wrong configuration? I loaded all the jar files I need into my IDE, so Spark itself should be working. Can you help me, or give me a hint as to why this error occurred?

Thx in advance.

DominikRoy avatar Aug 18 '16 10:08 DominikRoy

Hello @DominikRoy, this looks like a version incompatibility to me. Use Spark dependencies of the versions supported by your spookystuff version.
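As a rough illustration of what such version pinning looks like, here is a minimal build.sbt sketch. The Spark 1.6.3 / Scala 2.10 combination is taken from later replies in this thread; the exact spookystuff coordinates and where the snapshot is resolved from are assumptions that should be checked against the project README.

```scala
// build.sbt sketch — versions and coordinates are assumptions, verify against the README
scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  // The Spark version must match the one spookystuff was compiled against
  "org.apache.spark" %% "spark-core" % "1.6.3" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.6.3" % "provided",
  // Resolved from the local repository after building the snapshot with `mvn install`
  "com.tribbloids.spookystuff" % "spookystuff-core" % "0.4.0-SNAPSHOT"
)
```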

fahadsiddiqui avatar Oct 10 '16 16:10 fahadsiddiqui

I'm getting the same error, and I'm wondering what version of Spark I should be using; I don't see this specified in the documentation.

Currently I'm trying Spark 1.6.2 with Scala 2.10.5, using com.tribbloids.spookystuff:spookystuff-core:0.3.2.

I have also tried Spark 2.1.1 (Scala 2.11), but that broke even sooner.

What version of Spark works?

I also get the same error with Spark 1.5.1.

nimbusgo avatar Nov 15 '17 19:11 nimbusgo

Can you use the master branch (0.4.0-SNAPSHOT) by compiling it on your computer? Sorry about the slow release cadence; some other components are still not close to feature freeze.

0.3.2 is very old and out of maintenance. Yours Peng


tribbloid avatar Nov 15 '17 19:11 tribbloid

What version of Spark would you recommend I use with 0.4.0-SNAPSHOT?

Also, should I just be adding it via

spark-shell --jars spookystuff-core-0.4.0-SNAPSHOT.jar for example, or do I need to include more?

nimbusgo avatar Nov 15 '17 19:11 nimbusgo

I have currently attempted this (including spookystuff-core-0.4.0-SNAPSHOT.jar) on Spark 1.5.1 and 1.6.2, and I get an error with the following:

import com.tribbloids.spookystuff.actions._
import com.tribbloids.spookystuff.dsl._
import com.tribbloids.spookystuff.SpookyContext

//this is the entry point of all queries & configurations
val spooky = SpookyContext(sc)

errors with:

error: bad symbolic reference. A signature in AbstractConf.class refers to term dsl
in package org.apache.spark.ml which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling AbstractConf.class.
error: bad symbolic reference. A signature in AbstractConf.class refers to term utils
in value org.apache.spark.ml.dsl which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling AbstractConf.class.
<console>:36: error: bad symbolic reference. A signature in AbstractConf.class refers to term messaging
in value org.apache.spark.ml.utils which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling AbstractConf.class.
         val spooky = SpookyContext(sc)

org.apache.spark.ml is present, but I'm not sure why it's expecting org.apache.spark.ml.dsl to exist

nimbusgo avatar Nov 15 '17 20:11 nimbusgo

The version for 0.4.0-SNAPSHOT is Spark 1.6.3. Yours Peng


tribbloid avatar Nov 15 '17 20:11 tribbloid

So, after some guesswork, it looks like I should probably be including spookystuff-assembly-0.4.0-SNAPSHOT-spark1.6.jar, which I'm doing now.

Now I'm getting this instead: java.lang.UnsupportedClassVersionError: com/tribbloids/spookystuff/session/python/PythonProcess : Unsupported major.minor version 52.0

I'm guessing it's got something to do with Java/py4j version inconsistencies.

nimbusgo avatar Nov 15 '17 20:11 nimbusgo

I think it's just the Java version (you are using Java 7, while class file version 52.0 requires Java 8). py4j shouldn't be in my dependency list. Yours Peng
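As a side note, the "52.0" in that error maps directly to a Java release: since Java 1.2, the class-file major version is the Java release number plus 44, so the mapping can be computed rather than memorized. A small sketch:

```shell
# Translate a class-file major version (from an UnsupportedClassVersionError)
# into the minimum Java release needed to load the class.
# Since Java 1.2, major version = Java release + 44, so 52.0 means Java 8.
major=52
required_java=$((major - 44))
echo "class-file major version $major requires Java $required_java or newer"
```

Running the build under a Java 8 (or newer) JVM makes the error go away.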


tribbloid avatar Nov 15 '17 21:11 tribbloid

... or install it into your local Maven repository using mvn install. Yours Peng


tribbloid avatar Nov 15 '17 21:11 tribbloid

Thanks for the advice, I was able to get it functioning on the cluster without version errors now.

I'm a little new to the library syntax and it seems the quickstart example is a little out of date for 0.4.0-SNAPSHOT.

When executing this:

spooky.wget("https://news.google.com/?output=rss&q=barack%20obama").join(S"item title".texts){
    Wget(x"http://api.mymemory.translated.net/get?q=${'A}&langpair=en|fr")
}('A ~ 'title, S"translatedText".text ~ 'translated).toDF()

I get this error:

error: com.tribbloids.spookystuff.rdd.FetchedDataset does not take parameters
       }('A ~ 'title, S"translatedText".text ~ 'translated).toDF()

Are there any quickstart examples that work for 0.4.0-SNAPSHOT that I can take a look at?

nimbusgo avatar Nov 16 '17 21:11 nimbusgo

Yeah, there are a lot; it can't be helped, better algorithms keep popping up all the time.

I recommend referring to the test cases in the integration submodule.

It serves as a short example of crawling this dummy website: http://webscraper.io/test-sites. Yours Peng


tribbloid avatar Nov 16 '17 21:11 tribbloid

It's been 13 days, should I close it?

tribbloid avatar Nov 30 '17 08:11 tribbloid

org.apache.spark.ml is present, but I'm not sure why it's expecting org.apache.spark.ml.dsl to exist

After checking the source code, I see that org.apache.spark.ml.dsl is a package contained in the mldsl/ directory of the project. You can publish the source code to your local repository and include it in your spark-shell:

~/.m2/repository/com/tribbloids/spookystuff/spookystuff-mldsl/0.7.0-SNAPSHOT/spookystuff-mldsl-0.7.0-SNAPSHOT.jar

The mldsl module should be published to a Maven repository and added as a dependency on the documentation page! spookystuff is an interesting project, but if the getting-started example doesn't work, it can dissuade many developers from using it. It is urgent to update the documentation website. Where can we modify the documentation page? @tribbloid
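That path is less error-prone to type if derived mechanically: after mvn install, Maven places the jar in its default local-repository layout, which follows directly from the artifact coordinates. A small shell sketch using the mldsl coordinates quoted above (the spark-shell invocation at the end is illustrative):

```shell
# Derive the default local-repository path used by `mvn install`:
# ~/.m2/repository/<groupId with dots as slashes>/<artifactId>/<version>/<artifactId>-<version>.jar
group="com.tribbloids.spookystuff"
artifact="spookystuff-mldsl"
version="0.7.0-SNAPSHOT"
jar="$HOME/.m2/repository/${group//.//}/$artifact/$version/$artifact-$version.jar"
echo "$jar"
# then, for example: spark-shell --jars "$jar"
```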

dev590t avatar Dec 13 '20 09:12 dev590t