
Error creating PySpark Job

Open uday1889 opened this issue 4 years ago • 1 comment

I'm trying to start a PySpark job on an AWS EMR instance with all other configuration parameters left at their defaults. I've also installed the jupyterlab-sparkmonitor extension. Note: the Spark cluster setup otherwise works fine.

from pyspark import SparkConf
from pyspark.sql import SparkSession

# Register the SparkMonitor listener and put its jar on the driver classpath
config = {
    'spark.extraListeners': 'sparkmonitor.listener.JupyterSparkMonitorListener',
    'spark.driver.extraClassPath': 'some/path/miniconda3/envs/py_env/lib/python3.7/site-packages/sparkmonitor/listener.jar',
}
conf = SparkConf().setAll(config.items())

spark_session = SparkSession \
    .builder.master("yarn") \
    .config(conf=conf) \
    .appName(appName) \
    .getOrCreate()
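For reference, one way to avoid a stale hard-coded path (a sketch, assuming sparkmonitor is importable in the driver's Python environment) is to derive the jar location from the installed package itself; the `listener_jar_path` helper below is hypothetical, not part of sparkmonitor's API:

```python
import os

def listener_jar_path(package_dir):
    """Build the expected path to listener.jar inside the installed
    sparkmonitor package directory (hypothetical helper)."""
    # listener.jar ships inside the sparkmonitor package, so the
    # classpath entry can be derived rather than hard-coded.
    return os.path.join(package_dir, "listener.jar")

# Usage with the real package would look like:
#   import sparkmonitor
#   jar = listener_jar_path(os.path.dirname(sparkmonitor.__file__))
```

This keeps `spark.driver.extraClassPath` pointing at whatever environment the kernel is actually running in, instead of a path that can go stale across conda environments.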

This is the error that results:

Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Exception when registering SparkListener
	at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2398)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:555)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: sparkmonitor.listener.JupyterSparkMonitorListener
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
	at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2682)
	at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2680)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
	at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2680)
	at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2387)
	at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2386)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2386)

Should I be referencing the Listener class differently? I'm not entirely sure how to work towards a fix here. Could you please help me investigate this? Thanks in advance!
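Since the root cause is a `ClassNotFoundException`, a quick sanity check (a sketch, using only the standard library) is to confirm the jar actually exists at the configured path on the driver machine before building the session:

```python
import os

def jar_present(path):
    """Return True if the configured listener jar exists at `path`.

    ClassNotFoundException for the listener class usually means the jar
    is missing from that path on the driver node, or the path is wrong
    for the environment the kernel runs in.
    """
    return os.path.isfile(path)

# Example: check the path passed to spark.driver.extraClassPath
# before calling getOrCreate(), and fail fast with a clear message:
#   assert jar_present(jar_path), f"listener.jar not found at {jar_path}"
```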

uday1889 avatar May 19 '20 15:05 uday1889

Same problem.

dfdf avatar Jun 23 '20 20:06 dfdf