Spark on Kubernetes
I am testing Spark on Kubernetes, launched through the almond Jupyter kernel.
My conclusion for now is that pure DataFrame operations work out of the box (assuming the HTTP file system is installed), but any lambda function seems to fail.
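For example (a minimal sketch of the distinction; spark here is the session created in the notebook, and the column name is illustrative):

```scala
import org.apache.spark.sql.functions.sum
import spark.implicits._

// Pure DataFrame operations only ship built-in Catalyst expressions to the
// executors, so no REPL-generated classes are involved: this works.
val df = spark.range(0L, 100L).toDF("n")
df.filter($"n" % 2 === 0).agg(sum($"n")).show()

// A lambda compiles to an anonymous class in the current REPL session
// (ammonite.$sess.cmd*$Helper$$anonfun$*); executors have to fetch that
// class from the REPL class server, and that fetch is what fails.
df.as[Long].map(_ * 2).reduce(_ + _)
```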
I noticed that the Spark shell uses the spark:// protocol for the REPL class server, while almond uses the http protocol:
2019-03-27 19:10:46 INFO Executor:54 - Using REPL class URI: http://xxx.xxx.xxx.xxx:xxxx
That's why you need the Hadoop HTTP filesystem (added in Hadoop 2.9.x).
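For reference, here is a sketch of how one might wire that in when building the session (assuming the org.apache.hadoop.fs.http.HttpFileSystem class from HADOOP-14383 / Hadoop 2.9+ is on the classpath; untested on my side):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: register the HTTP filesystem (Hadoop 2.9+, HADOOP-14383) so the
// executors can resolve the http:// REPL class URI through the Hadoop
// FileSystem API. The spark.hadoop. prefix forwards the key into the
// Hadoop configuration.
val spark = SparkSession.builder()
  .config("spark.hadoop.fs.http.impl",
          "org.apache.hadoop.fs.http.HttpFileSystem")
  .getOrCreate()
```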
However, there is a problem that may prevent almond from fully operating:
2019-03-27 19:10:47 ERROR ExecutorClassLoader:91 - Failed to check existence of class ammonite.$sess.cmd6$Helper$$anonfun$2 on REPL class server at http://xxx.xxx.xxx.xxx:xxxx
java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:163)
at org.apache.hadoop.fs.Path.<init>(Path.java:175)
at org.apache.hadoop.fs.Path.<init>(Path.java:110)
at org.apache.spark.repl.ExecutorClassLoader.org$apache$spark$repl$ExecutorClassLoader$$getClassFileInputStreamFromFileSystem(ExecutorClassLoader.scala:115)
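My reading of that stack trace (a guess, but easy to verify): ExecutorClassLoader builds a Hadoop Path from the path component of the REPL class URI, and a bare http://host:port URI has an empty path component:

```scala
import java.net.URI

// A bare "http://host:port" URI (host and port here are illustrative) has
// an empty path component, which would explain "Can not create a Path from
// an empty string" once it reaches new org.apache.hadoop.fs.Path(...).
val uri = new URI("http://1.2.3.4:12345")
println(uri.getPath.isEmpty) // true
```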
And then any lambda function inside a Spark map, etc. fails with the following exception:
2019-03-27 19:10:47 ERROR Executor:91 - Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2287)
I was wondering if anyone has tried this and knows of a workaround.
Is Kubernetes supported by the almond Jupyter kernel?
I am having the same issue. I am running on Kubernetes:
ERROR ExecutorClassLoader: Failed to check existence of class ammonite.$sess.cmd32$Helper$Addres on REPL class server at http://10.xx.0.xxx:xxxxxx
I assume it's the metabrowser. We need a way to set that port, so that a container port can be opened on the Jupyter container and the Spark workers can reach the class server. The port is in the code, but I can't find a way to set it without rebuilding almond.
I did more investigating on this. I was able to open firewall access for the random port using a network policy (I run Spark on Kubernetes) and was able to connect. But instead of failing to check the existence of the class, it now gets an empty string. So it's the same error, slightly different:
org.apache.spark.repl.RemoteClassLoaderError: ammonite.$sess.cmd7$Helper
But if I use a local master with .master("local[*]"), it works fine. For some reason the REPL class server returns empty strings when using a remote master.
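To make the two setups concrete (a sketch; the k8s master URL is illustrative):

```scala
import org.apache.spark.sql.SparkSession

// Alternative A, local master: executors run inside the driver JVM and
// share the REPL's classloader, so the class server is never exercised.
val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Alternative B, remote master (URL illustrative): executors run in
// separate pods and must fetch REPL-generated classes from the class
// server, which is the path that fails.
// val spark = SparkSession.builder()
//   .master("k8s://https://kubernetes.default.svc:443")
//   .getOrCreate()
```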
I also tried different versions of Ammonite and Spark, and different versions of Scala. Once all the versions are lined up, I get the same error every time.
I'm also having the exact same issue with almond. When I use local Spark, it works perfectly. However, when I try to use a remote master, I get the same error that @bitWeaver-arch mentioned above.
Another update: it doesn't seem to be an almond issue. I started an Ammonite shell and got the same results. Something interesting though, maybe useful:
ERROR TaskResultGetter: Could not deserialize TaskEndReason: ClassNotFound with classloader ammonite.runtime.SpecialClassLoader@1ee29c84
@ ammonite.runtime.SpecialClassLoader
res18: runtime.SpecialClassLoader.type = ammonite.runtime.SpecialClassLoader$@e156110
I wonder if it's normal that those object ids are different.
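For what it's worth, differing @xxxxxxxx suffixes alone don't say much: they are identity hash codes, and the REPL result above is the companion object (note the trailing $ in SpecialClassLoader$@e156110), not the classloader instance named in the error. A quick way to see which loaders are actually in play (a sketch):

```scala
// Print the loader that defines the REPL's classes and the thread's
// context classloader; their identity hashes will naturally differ from
// anything printed in driver- or executor-side error messages.
println(getClass.getClassLoader)
println(Thread.currentThread().getContextClassLoader)
```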