training
Spark Streaming in ampcamp4 is not working.
I followed the instructions in ampcamp4 and launched Amazon EC2 instances. When I tried to run Spark Streaming, it did not work: the Scala version did not match what sbt required. Even after I fixed the Scala version problem, the Spark Streaming example still did not run on the EC2 instances.
Line 3 in build.sbt:
scalaVersion := "2.10"
should be:
scalaVersion := "2.10.3"
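For context, a minimal build.sbt along these lines (the project name and dependency list below are pieced together from this thread, not copied from the tutorial's exact file) would be:

```scala
// Sketch of a tutorial-style build.sbt with a fully qualified Scala
// version. Name and Spark version are assumptions based on this thread.
name := "Tutorial"

scalaVersion := "2.10.3"  // must be a full version; plain "2.10" does not resolve

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % "0.9.0-incubating",
  "org.apache.spark" %% "spark-streaming-twitter" % "0.9.0-incubating"
)
```

The point is that Scala artifacts are published under full version numbers (2.10.3, 2.10.4, ...), so sbt looks for org.scala-lang#scala-library;2.10 and finds nothing, which is exactly the "UNRESOLVED DEPENDENCIES" error below.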
I pasted the two errors below. The first is the Scala-version error I got when I compiled Spark Streaming. The second happened when I ran Spark Streaming.
[root@ip-172-31-43-168 scala]# sbt/sbt package
[info] Set current project to Tutorial (in build file:/root/training/streaming/scala/)
[info] Updating {file:/root/training/streaming/scala/}scala...
[info] Resolving org.scala-lang#scala-library;2.10 ...
[warn] module not found: org.scala-lang#scala-library;2.10
[warn] ==== local: tried
[warn] /root/.ivy2/local/org.scala-lang/scala-library/2.10/ivys/ivy.xml
[warn] ==== public: tried
[warn] http://repo1.maven.org/maven2/org/scala-lang/scala-library/2.10/scala-library-2.10.pom
[info] Resolving org.scala-lang#scala-compiler;2.10 ...
[warn] module not found: org.scala-lang#scala-compiler;2.10
[warn] ==== local: tried
[warn] /root/.ivy2/local/org.scala-lang/scala-compiler/2.10/ivys/ivy.xml
[warn] ==== public: tried
[warn] http://repo1.maven.org/maven2/org/scala-lang/scala-compiler/2.10/scala-compiler-2.10.pom
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: UNRESOLVED DEPENDENCIES ::
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
[warn] :: org.scala-lang#scala-library;2.10: not found
[warn] :: org.scala-lang#scala-compiler;2.10: not found
[warn] ::::::::::::::::::::::::::::::::::::::::::::::
sbt.ResolveException: unresolved dependency: org.scala-lang#scala-library;2.10: not found
unresolved dependency: org.scala-lang#scala-compiler;2.10: not found
at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:213)
at sbt.IvyActions$$anonfun$update$1.apply(IvyActions.scala:122)
at sbt.IvyActions$$anonfun$update$1.apply(IvyActions.scala:121)
at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:116)
at sbt.IvySbt$Module$$anonfun$withModule$1.apply(Ivy.scala:116)
at sbt.IvySbt$$anonfun$withIvy$1.apply(Ivy.scala:104)
at sbt.IvySbt.sbt$IvySbt$$action$1(Ivy.scala:51)
at sbt.IvySbt$$anon$3.call(Ivy.scala:60)
at xsbt.boot.Locks$GlobalLock.withChannel$1(Locks.scala:98)
at xsbt.boot.Locks$GlobalLock.xsbt$boot$Locks$GlobalLock$$withChannelRetries$1(Locks.scala:81)
at xsbt.boot.Locks$GlobalLock$$anonfun$withFileLock$1.apply(Locks.scala:102)
at xsbt.boot.Using$.withResource(Using.scala:11)
at xsbt.boot.Using$.apply(Using.scala:10)
at xsbt.boot.Locks$GlobalLock.ignoringDeadlockAvoided(Locks.scala:62)
at xsbt.boot.Locks$GlobalLock.withLock(Locks.scala:52)
at xsbt.boot.Locks$.apply0(Locks.scala:31)
at xsbt.boot.Locks$.apply(Locks.scala:28)
at sbt.IvySbt.withDefaultLogger(Ivy.scala:60)
at sbt.IvySbt.withIvy(Ivy.scala:101)
at sbt.IvySbt.withIvy(Ivy.scala:97)
at sbt.IvySbt$Module.withModule(Ivy.scala:116)
at sbt.IvyActions$.update(IvyActions.scala:121)
at sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1161)
at sbt.Classpaths$$anonfun$sbt$Classpaths$$work$1$1.apply(Defaults.scala:1159)
at sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$73.apply(Defaults.scala:1182)
at sbt.Classpaths$$anonfun$doWork$1$1$$anonfun$73.apply(Defaults.scala:1180)
at sbt.Tracked$$anonfun$lastOutput$1.apply(Tracked.scala:35)
at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1184)
at sbt.Classpaths$$anonfun$doWork$1$1.apply(Defaults.scala:1179)
at sbt.Tracked$$anonfun$inputChanged$1.apply(Tracked.scala:45)
at sbt.Classpaths$.cachedUpdate(Defaults.scala:1187)
at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1152)
at sbt.Classpaths$$anonfun$updateTask$1.apply(Defaults.scala:1130)
at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:42)
at sbt.std.Transform$$anon$4.work(System.scala:64)
at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:237)
at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:237)
at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:18)
at sbt.Execute.work(Execute.scala:244)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:237)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:237)
at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:160)
at sbt.CompletionService$$anon$2.call(CompletionService.scala:30)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
[error] (*:update) sbt.ResolveException: unresolved dependency: org.scala-lang#scala-library;2.10: not found
[error] unresolved dependency: org.scala-lang#scala-compiler;2.10: not found
[error] Total time: 8 s, completed May 5, 2014 7:43:59 AM
This happened when I ran the Spark Streaming example:
14/05/05 08:08:15 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/05/05 08:08:16 INFO Remoting: Starting remoting
14/05/05 08:08:16 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:54899]
14/05/05 08:08:16 INFO Remoting: Remoting now listens on addresses: [akka.tcp://[email protected]:54899]
14/05/05 08:08:16 INFO server.Server: jetty-7.6.8.v20121106
14/05/05 08:08:16 INFO server.AbstractConnector: Started [email protected]:50072
14/05/05 08:08:16 INFO server.Server: jetty-7.6.8.v20121106
14/05/05 08:08:16 INFO server.AbstractConnector: Started [email protected]:46682
14/05/05 08:08:17 INFO server.Server: jetty-7.6.8.v20121106
14/05/05 08:08:17 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage/rdd,null}
14/05/05 08:08:17 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage,null}
14/05/05 08:08:17 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/stage,null}
14/05/05 08:08:17 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/pool,null}
14/05/05 08:08:17 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages,null}
14/05/05 08:08:17 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/environment,null}
14/05/05 08:08:17 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/executors,null}
14/05/05 08:08:17 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/metrics/json,null}
14/05/05 08:08:17 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/static,null}
14/05/05 08:08:17 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/,null}
14/05/05 08:08:17 INFO server.AbstractConnector: Started [email protected]:4040
-------------------------------------------
Time: 1399277300000 ms
-------------------------------------------
-------------------------------------------
Time: 1399277301000 ms
-------------------------------------------
-------------------------------------------
Time: 1399277302000 ms
-------------------------------------------
14/05/05 08:08:22 WARN scheduler.TaskSetManager: Lost TID 70 (task 2.0:0)
14/05/05 08:08:22 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassNotFoundException
java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:37)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1706)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1344)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:500)
at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:72)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:145)
at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:195)
at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/05/05 08:08:22 WARN scheduler.TaskSetManager: Lost TID 71 (task 2.0:0)
14/05/05 08:08:22 WARN scheduler.TaskSetManager: Lost TID 72 (task 2.0:0)
14/05/05 08:08:22 WARN scheduler.TaskSetManager: Lost TID 73 (task 2.0:0)
14/05/05 08:08:22 ERROR scheduler.TaskSetManager: Task 2.0:0 failed 4 times; aborting job
[error] (Thread-39) org.apache.spark.SparkException: Job aborted: Task 2.0:0 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver)
org.apache.spark.SparkException: Job aborted: Task 2.0:0 failed 4 times (most recent failure: Exception failure: java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[trace] Stack trace suppressed: run last compile:run for the full output.
I have the same java.lang.ClassNotFoundException: org.apache.spark.streaming.twitter.TwitterReceiver issue now that I have fixed the scalaVersion := "2.10.3" problem.
Same here. Just getting blank tweets.
Same here. Not sure what is wrong. If I changed the sparkUrl to "local[4]", it works perfectly.
I've got the same issue.
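The fact that it works with "local[4]" but fails on the cluster suggests the jars containing the Twitter receiver classes are not reaching the worker JVMs. One possible remedy (a sketch with placeholder paths and names, not the tutorial's exact code) is to pass the relevant jars, e.g. an assembly jar that bundles spark-streaming-twitter and twitter4j, explicitly when constructing the StreamingContext, which the Spark 0.9 API supports:

```scala
// Sketch: Spark 0.9's StreamingContext accepts a list of jars to ship
// to executors. The master URL and jar path below are placeholders for
// an assembly jar bundling spark-streaming-twitter and twitter4j.
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sparkUrl = "spark://<master-hostname>:7077"                // placeholder
val jars = Seq("target/scala-2.10/Tutorial-assembly-0.1.jar")  // placeholder
val ssc = new StreamingContext(sparkUrl, "Tutorial", Seconds(1),
  System.getenv("SPARK_HOME"), jars)
```

In local mode everything runs in one JVM with the launcher's classpath, which is why the ClassNotFoundException only shows up on the cluster.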
I got a bit farther along. In build.sbt I changed:
scalaVersion := "2.10.4"
and:
"org.apache.spark" %% "spark-streaming" % "0.9.0-incubating",
"org.apache.spark" %% "spark-streaming-twitter" % "0.9.0-incubating"
to:
"org.apache.spark" %% "spark-streaming" % "1.1.0",
"org.apache.spark" %% "spark-streaming-twitter" % "1.1.0"
I get a few warnings but no classpath errors. The app runs, but it's still just getting blank tweets.
Sorry, that should have been:
scalaVersion := "2.10.3"
That's the version reported when I run ./spark-shell. If I let the app run long enough, 15 seconds or so, I get the line:
WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
Googling that brings up all kinds of stuff, some of which says it's a red herring. All of my slaves have 2 GB, which is the amp-camp default.
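That warning usually means either no workers are registered with the master, or each executor is requesting more memory or cores than any single worker can offer. One thing worth trying (an assumption on my part, not something from the tutorial) is to lower the per-executor memory request well under the workers' 2 GB; in Spark 0.9 this was configured through a JVM system property set before the context is created:

```scala
// Sketch: in Spark 0.9, executor memory is read from a system property
// at (Streaming)Context construction time. "512m" is an arbitrary value
// chosen to fit comfortably under a 2 GB worker.
System.setProperty("spark.executor.memory", "512m")
```

If the property makes no difference, the cluster UI on port 8080 of the master should show whether the workers registered at all.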