
Accessing encoders for classes defined inside the ammonite session

mpacer opened this issue 5 years ago

We're running into an issue when we attempt to share classes defined in the interpreter context with a Spark executor. Everything works up to the point where you want to coerce a string to a case class that has been defined in the interactive context.

Here is a simple set of commands that demonstrate this:

import $ivy.`sh.almond::ammonite-spark:0.2.0`
import org.apache.spark.sql._
val spark = AmmoniteSparkSession.builder.getOrCreate
spark.sql("select * from dual").show()
val df = spark.sql("select * from dual")
case class Foo(foo: String)
import spark.implicits._
df.as[Foo].collect().foreach(println)

The spark.sql("select * from dual").show() command produces:

+--------+
|     foo|
+--------+
|prodhive|
+--------+

but when you attempt to coerce the values to the Foo class, you get an error like the following:

ERROR ScalaInterpreter exception in user code (Unable to generate an encoder for inner class `ammonite.$sess.cmd3$Helper$Foo` without access to the scope that this class was defined in.
Try moving this class out of its parent class.;)
org.apache.spark.sql.AnalysisException: Unable to generate an encoder for inner class `ammonite.$sess.cmd3$Helper$Foo` without access to the scope that this class was defined in.
Try moving this class out of its parent class.;

@alexarchambault Is this a known issue? How can this be addressed?

cc @rdblue

mpacer avatar Jan 16 '19 01:01 mpacer

Ah… So based on a few StackOverflow answers (most notably this one), adding this line (org.apache.spark.sql.catalyst.encoders.OuterScopes.addOuterScope(this)) to the same cell as the declaration of the case class is sufficient to make this work.

This line also needs to be added to any other cell that defines a case class used this way (since `this` refers to the outer scope of the cell).
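
For example, a cell along these lines (reusing the spark session and df from above) should work:

import org.apache.spark.sql.catalyst.encoders.OuterScopes

// register this cell's wrapper instance as an outer scope, so the encoder
// can instantiate Foo, which is compiled as an inner class of the cell
OuterScopes.addOuterScope(this)

case class Foo(foo: String)

import spark.implicits._
df.as[Foo].collect().foreach(println)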

Am I understanding this fairly completely or is there more to it?

mpacer avatar Jan 16 '19 01:01 mpacer

I think that's all there is to it… It's mentioned here too. I'd really like to automate that.

alexarchambault avatar Jan 16 '19 11:01 alexarchambault

It looks like we may be able to add another rule to Spark that automatically finds the outer scope based on the case class. This shows the rule that does it for the Spark REPL: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/OuterScopes.scala#L60-L79

@alexarchambault, what do you think the equivalent would be for Ammonite?

If we can figure that out, I'll open a PR for Spark that allows us to extend this with configuration.

rdblue avatar Jan 16 '19 17:01 rdblue

Here's what worked:

  // matches the wrapper object of an Ammonite cell, e.g. "ammonite.$sess.cmd3$"
  private[this] val AmmoniteREPLClass = """^(ammonite\.\$sess\.cmd(?:\d+)\$).*""".r

        case AmmoniteREPLClass(cellClassName) =>
          () => {
            // load the cell's wrapper object, grab its singleton,
            // and call `instance` to get the Helper the user's class is nested in
            val objClass = Utils.classForName(cellClassName)
            val objInstance = objClass.getField("MODULE$").get(null)
            objClass.getMethod("instance").invoke(objInstance)
          }
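
Roughly, the regex captures the cell's wrapper object name (e.g. ammonite.$sess.cmd3$), and the returned thunk reflectively fetches that object's singleton and calls its instance method, which yields the cmd3$Helper instance that classes like Foo are nested in. A rough standalone sketch of the same lookup (using plain Class.forName instead of Spark's Utils.classForName, with a made-up ammoniteOuterInstance helper):

val AmmoniteREPLClass = """^(ammonite\.\$sess\.cmd(?:\d+)\$).*""".r

// Given the outer class name of a class defined in an Ammonite cell
// (e.g. "ammonite.$sess.cmd3$Helper" for Foo above), return a thunk
// producing the outer instance the encoder needs, if the name matches a cell.
def ammoniteOuterInstance(outerClassName: String): Option[() => AnyRef] =
  outerClassName match {
    case AmmoniteREPLClass(cellClassName) =>
      Some { () =>
        // cellClassName is the cell's wrapper object, e.g. "ammonite.$sess.cmd3$"
        val objClass = Class.forName(cellClassName)
        // MODULE$ holds the Scala object's singleton...
        val objInstance = objClass.getField("MODULE$").get(null)
        // ...and its `instance` method returns the cmd3$Helper instance
        // that user-defined classes like Foo live in
        objClass.getMethod("instance").invoke(objInstance)
      }
    case _ => None
  }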

rdblue avatar Jan 17 '19 01:01 rdblue

@rdblue Neat. It would be nice if Spark allowed adding that, yes.

alexarchambault avatar Jan 17 '19 09:01 alexarchambault

@alexarchambault, the PR is https://github.com/apache/spark/pull/23607. Feel free to review it or test it out in your environment.

rdblue avatar Jan 22 '19 00:01 rdblue