py4j
How to correctly get Scala case classes from jvm?
I'm trying to use a Scala library with Py4J and PySpark. Scala classes with explicit constructors and Scala objects can be accessed from the JVM without problems, but Scala case classes without explicit constructors cannot be instantiated. Case classes are treated as "JavaClass" in the JVM, but since there is no visible constructor, I cannot instantiate them even with .apply() (function apply not found).
Does anyone know if there's a workaround? (I have to use Python and PySpark; I can't use Scala.) Thanks!
I think a case class has its generated "Java constructor", and you can use that from Py4J.
Alternatively, you can write a Java wrapper class that provides an API to create the Scala case class.
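A self-contained sketch of the "generated constructor" route, assuming a hypothetical case class com.example.Person. The gateway below is a stub that only mimics Py4J's attribute-chain resolution so the snippet runs anywhere; with a real py4j JavaGateway, the same expression gateway.jvm.com.example.Person("Ada", 36) would invoke the case class's synthesized all-fields constructor on the JVM:

```python
class StubJvm:
    """Mimics Py4J's jvm view: each attribute access extends the package path,
    and calling the resolved path stands in for invoking the Java constructor."""
    def __init__(self, path=""):
        self.path = path

    def __getattr__(self, name):
        sep = "." if self.path else ""
        return StubJvm(self.path + sep + name)

    def __call__(self, *args):
        # With a real gateway, this call reaches the case class's
        # compiler-generated all-fields constructor.
        return f"new {self.path}({', '.join(map(repr, args))})"

jvm = StubJvm()
print(jvm.com.example.Person("Ada", 36))
# new com.example.Person('Ada', 36)
```

With a real gateway the attribute chain resolves lazily through JavaPackage objects, which is why a typo in the package path only fails at call time.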
Hi, this was the top result when googling for "scala case class py4j". Did you ever figure out how to achieve this?
Same here.
I managed to find a solution:

def ref_scala_object(object_name):
    # Scala compiles the companion object of a case class into a class named
    # "<object_name>$", with a static MODULE$ field holding the singleton.
    jvm = spark.sparkContext._gateway.jvm
    clazz = jvm.java.lang.Class.forName(object_name + "$")
    ff = clazz.getDeclaredField("MODULE$")
    return ff.get(None)

This will return the companion object of your case class. You can then call its apply method like this:

ref_scala_object("fully.qualified.name.of.your.case.class").apply(arg1, ...)
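For background on why the "$" suffix and MODULE$ field work: this is Scala's name-mangling convention for singleton objects, sketched below as a trivial helper (com.example.Person is a made-up name):

```python
def companion_class_name(fqcn):
    # Scala compiles the companion object of class Foo into a separate JVM
    # class named "Foo$"; its static MODULE$ field holds the one instance,
    # which is why ref_scala_object reads that field via reflection.
    return fqcn + "$"

print(companion_class_name("com.example.Person"))
# com.example.Person$
```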
Here is a real-life example that seems to work. I was trying to call Spark's from_avro function with PySpark 2.4.5. The good news is that it is built into PySpark 3.0.
# By calling the method "from_avro" inside the package object "org.apache.spark.sql.avro"
def from_avro(col, avro_schema):
    from pyspark.sql.column import Column, _to_java_column
    # _to_java_column converts a Python Column (or column name) to its JVM counterpart
    from_avro_fn = ref_scala_object("org.apache.spark.sql.avro.package").from_avro
    jc = from_avro_fn(_to_java_column(col), avro_schema)
    return Column(jc)
# By calling the case class "org.apache.spark.sql.avro.AvroDataToCatalyst"
def from_avro_2(col, avro_schema):
    from pyspark.sql.column import Column, _to_java_column
    jvm = spark.sparkContext._gateway.jvm
    AvroDataToCatalyst = ref_scala_object("org.apache.spark.sql.avro.AvroDataToCatalyst")
    # Build the Catalyst expression via the companion object's apply, then wrap
    # it back into a JVM Column
    expr = AvroDataToCatalyst.apply(_to_java_column(col).expr(), avro_schema)
    return Column(jvm.org.apache.spark.sql.Column(expr))
Thank you for sharing, @FurcyPin; that will be very useful indeed.