Spark on Kubernetes
I am testing Spark on Kubernetes, launched through the almond Jupyter kernel.
My conclusion for now is that pure DataFrame operations work out of the box (assuming the HTTP file system is installed), but any lambda function seems to fail.
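For example (a minimal sketch of the distinction; spark here is the session created in the notebook, and the column name is illustrative):

```scala
import org.apache.spark.sql.functions.sum
import spark.implicits._

// Pure DataFrame operations only ship built-in Catalyst expressions to the
// executors, so no REPL-generated classes are involved: this works.
val df = spark.range(0L, 100L).toDF("n")
df.filter($"n" % 2 === 0).agg(sum($"n")).show()

// A lambda compiles to an anonymous class in the current REPL session
// (ammonite.$sess.cmd*$Helper$$anonfun$*); executors have to fetch that
// class from the REPL class server, and that fetch is what fails.
df.as[Long].map(_ * 2).reduce(_ + _)
```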
I noticed that the Spark shell uses the spark:// protocol for the REPL class server, while almond uses the http protocol:
2019-03-27 19:10:46 INFO Executor:54 - Using REPL class URI: http://xxx.xxx.xxx.xxx:xxxx
That's why you need the Hadoop HTTP filesystem (added in Hadoop 2.9.x).
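For reference, here is a sketch of how one might wire that in when building the session (assuming the org.apache.hadoop.fs.http.HttpFileSystem class from HADOOP-14383 / Hadoop 2.9+ is on the classpath; untested on my side):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: register the HTTP filesystem (Hadoop 2.9+, HADOOP-14383) so the
// executors can resolve the http:// REPL class URI through the Hadoop
// FileSystem API. The spark.hadoop. prefix forwards the key into the
// Hadoop configuration.
val spark = SparkSession.builder()
  .config("spark.hadoop.fs.http.impl",
          "org.apache.hadoop.fs.http.HttpFileSystem")
  .getOrCreate()
```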
However, there is a problem that may prevent almond from fully operating:
2019-03-27 19:10:47 ERROR ExecutorClassLoader:91 - Failed to check existence of class ammonite.$sess.cmd6$Helper$$anonfun$2 on REPL class server at http://xxx.xxx.xxx.xxx:xxxx
java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:163)
at org.apache.hadoop.fs.Path.<init>(Path.java:175)
at org.apache.hadoop.fs.Path.<init>(Path.java:110)
at org.apache.spark.repl.ExecutorClassLoader.org$apache$spark$repl$ExecutorClassLoader$$getClassFileInputStreamFromFileSystem(ExecutorClassLoader.scala:115)
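My reading of that stack trace (a guess, but easy to verify): ExecutorClassLoader builds a Hadoop Path from the path component of the REPL class URI, and a bare http://host:port URI has an empty path component:

```scala
import java.net.URI

// A bare "http://host:port" URI (host and port here are illustrative) has
// an empty path component, which would explain "Can not create a Path from
// an empty string" once it reaches new org.apache.hadoop.fs.Path(...).
val uri = new URI("http://1.2.3.4:12345")
println(uri.getPath.isEmpty) // true
```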
And then any lambda function inside a Spark map, etc. fails with the following exception:
2019-03-27 19:10:47 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2287)
I was wondering if anyone has tried this and knows of a workaround.
Is Kubernetes supported by the almond Jupyter kernel?
I am having the same issue. I am running on Kubernetes:
ERROR ExecutorClassLoader: Failed to check existence of class ammonite.$sess.cmd32$Helper$Addres on REPL class server at http://10.xx.0.xxx:xxxxxx
I assume it's the metabrowser. We need a way to set that port, so that a container port can be opened on the Jupyter container and the Spark workers can reach the class server. The port is in the code, but I can't find a way to set it without rebuilding almond.
I did more investigating on this. I was able to open firewall access for the random port using a network policy (I run Spark on Kubernetes) and was able to connect. But instead of failing to check the existence of the class, it now gets an empty string. So it's the same error, slightly different:
org.apache.spark.repl.RemoteClassLoaderError: ammonite.$sess.cmd7$Helper
But if I use a local master with .master("local[*]"), it works fine. For some reason the REPL class server returns empty strings when using a remote master.
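To make the two setups concrete (a sketch; the k8s master URL is illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Alternative A, local master: executors run inside the driver JVM and
// share the REPL's classloader, so the class server is never exercised.
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Alternative B, remote master (URL illustrative): executors run in
// separate pods and must fetch REPL-generated classes from the class
// server, which is the path that fails.
// val spark = SparkSession.builder()
//   .master("k8s://https://kubernetes.default.svc:443")
//   .getOrCreate()
```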
I also tried different versions of Ammonite and Spark, and different versions of Scala. Once all the versions are lined up, I get the same error every time.
I'm also having the exact same issue with almond. When I use local Spark, it works perfectly. However, when I try to use a remote master, I get the same error that @bitWeaver-arch mentioned above.
Another update: it doesn't seem to be an almond issue. I started an Ammonite shell and got the same results. Something interesting though, maybe useful:
ERROR TaskResultGetter: Could not deserialize TaskEndReason: ClassNotFound with classloader ammonite.runtime.SpecialClassLoader@1ee29c84
@ ammonite.runtime.SpecialClassLoader
res18: runtime.SpecialClassLoader.type = ammonite.runtime.SpecialClassLoader$@e156110
I wonder if it's normal that those object ids are different.
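For what it's worth, differing @xxxxxxxx suffixes alone don't say much: they are identity hash codes, and the REPL result above is the companion object (note the trailing $ in SpecialClassLoader$@e156110), not the classloader instance named in the error. A quick way to see which loaders are actually in play (a sketch):

```scala
// Print the loader that defines the REPL's classes and the thread's
// context classloader; their identity hashes will naturally differ from
// anything printed in driver- or executor-side error messages.
println(getClass.getClassLoader)
println(Thread.currentThread().getContextClassLoader)
```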