ammonite-spark
Accessing encoders for classes defined inside the ammonite session
We're running into an issue when we attempt to share classes defined in the interpreter context with a Spark executor. Everything works up to the point where you try to coerce query results to a case class that has been defined in the interactive context.
Here is a simple set of commands that demonstrate this:
import $ivy.`sh.almond::ammonite-spark:0.2.0`
import org.apache.spark.sql._
val spark = AmmoniteSparkSession.builder.getOrCreate
spark.sql("select * from dual").show()
val df = spark.sql("select * from dual")
case class Foo(foo: String)
import spark.implicits._
df.as[Foo].collect().foreach(println)
The spark.sql("select * from dual").show() command produces:
+--------+
| foo|
+--------+
|prodhive|
+--------+
but when you attempt to coerce the values to the Foo class, you get an error like the following:
ERROR ScalaInterpreter exception in user code (Unable to generate an encoder for inner class `ammonite.$sess.cmd3$Helper$Foo` without access to the scope that this class was defined in.
Try moving this class out of its parent class.;)
org.apache.spark.sql.AnalysisException: Unable to generate an encoder for inner class `ammonite.$sess.cmd3$Helper$Foo` without access to the scope that this class was defined in.
Try moving this class out of its parent class.;
@alexarchambault Is this a known issue? How can this be addressed?
cc @rdblue
Ah…
So based on a few StackOverflow answers (most notably this one), adding the line org.apache.spark.sql.catalyst.encoders.OuterScopes.addOuterScope(this) to the same cell as the declaration of the case class is sufficient to make this work. This line also needs to be added to any other cell that defines a case class that will be used in this way (since this refers to the outer scope of that cell).
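Putting the workaround together, a cell using it might look like the sketch below. This is an Ammonite session fragment, not a standalone program; it assumes spark and df exist as in the transcript above:

```scala
import org.apache.spark.sql.catalyst.encoders.OuterScopes

// Register this cell's wrapper object as an outer scope, so Spark's
// encoder machinery can instantiate case classes defined in this cell.
OuterScopes.addOuterScope(this)

case class Foo(foo: String)

import spark.implicits._
df.as[Foo].collect().foreach(println)
```

The call has to appear in the same cell as the case class definition, because this is the cell's own wrapper object.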
Am I understanding this fairly completely or is there more to it?
I think that's all there is to it… It's mentioned here too. I'd really like to automate that.
It looks like we may be able to add another rule to Spark that automatically finds the outer scope based on the case class. Here is a rule that does it for the Spark REPL: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/OuterScopes.scala#L60-L79
@alexarchambault, what do you think the equivalent would be for Ammonite?
If we can figure that out, I'll open a PR for Spark that allows us to extend this with configuration.
Here's what worked:
case AmmoniteREPLClass(cellClassName) =>
() => {
val objClass = Utils.classForName(cellClassName)
val objInstance = objClass.getField("MODULE$").get(null)
objClass.getMethod("instance").invoke(objInstance)
}
private[this] val AmmoniteREPLClass = """^(ammonite\.\$sess\.cmd(?:\d+)\$).*""".r
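The regex-based dispatch above can be exercised on its own. A minimal sketch (the object and method names here are illustrative, not from the patch) showing how the pattern extracts the enclosing Ammonite cell object from a generated class name like ammonite.$sess.cmd3$Helper$Foo; in the actual rule, that extracted name is then resolved via reflection (MODULE$ / instance) to obtain the live cell instance:

```scala
object OuterScopeRegexDemo {
  // Same pattern as in the patch: captures the cell wrapper object's
  // prefix, e.g. "ammonite.$sess.cmd3$" out of "ammonite.$sess.cmd3$Helper$Foo".
  private val AmmoniteREPLClass = """^(ammonite\.\$sess\.cmd(?:\d+)\$).*""".r

  // Returns the enclosing cell object's class name, if the class was
  // defined inside an Ammonite session cell.
  def outerObjectName(className: String): Option[String] = className match {
    case AmmoniteREPLClass(cellClassName) => Some(cellClassName)
    case _                                => None
  }

  def main(args: Array[String]): Unit = {
    println(outerObjectName("ammonite.$sess.cmd3$Helper$Foo")) // Some(ammonite.$sess.cmd3$)
    println(outerObjectName("com.example.Foo"))                // None
  }
}
```

Classes defined outside an Ammonite cell fall through to None, so the rule only kicks in for session-defined classes.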
@rdblue Neat. It would be nice if Spark allowed adding that, yes.
@alexarchambault, the PR is https://github.com/apache/spark/pull/23607. Feel free to review it or test it out in your environment.