py4j
How to correctly get Scala case classes from jvm?
I'm trying to use a Scala library with Py4J and PySpark. Scala classes with explicit constructors and Scala objects can be accessed from the JVM without problems, but Scala case classes without explicit constructors cannot be instantiated. Case classes are treated as "JavaClass" in the JVM, but since there is no visible constructor, I cannot instantiate them even with .apply() (function apply not found).
Does anyone know if there's a workaround? (I have to use Python and PySpark; I can't use Scala.) Thanks!
I think a case class has its generated "Java constructor", and you can use that from Py4J.
Alternatively, you can write a Java wrapper class that provides an API to create the Scala case class.
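A self-contained sketch of the "generated constructor" route, assuming a hypothetical case class com.example.Person. The gateway below is a stub that only mimics Py4J's attribute-chain resolution so the snippet runs anywhere; with a real py4j JavaGateway, the same expression gateway.jvm.com.example.Person("Ada", 36) would invoke the case class's synthesized all-fields constructor on the JVM:

```python
class StubJvm:
    """Mimics Py4J's jvm view: each attribute access extends the package path,
    and calling the resolved path stands in for invoking the Java constructor."""
    def __init__(self, path=""):
        self.path = path

    def __getattr__(self, name):
        sep = "." if self.path else ""
        return StubJvm(self.path + sep + name)

    def __call__(self, *args):
        # With a real gateway, this call reaches the case class's
        # compiler-generated all-fields constructor.
        return f"new {self.path}({', '.join(map(repr, args))})"

jvm = StubJvm()
print(jvm.com.example.Person("Ada", 36))
# new com.example.Person('Ada', 36)
```

With a real gateway the attribute chain resolves lazily through JavaPackage objects, which is why a typo in the package path only fails at call time.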
Hi, this was the top result when googling for "scala case class py4j". Did you ever figure out how to achieve this?
Same here.
I managed to find a solution:

def ref_scala_object(object_name):
    # Scala compiles the companion object of a case class into a class named
    # "<object_name>$", with a static MODULE$ field holding the singleton.
    jvm = spark.sparkContext._gateway.jvm
    clazz = jvm.java.lang.Class.forName(object_name + "$")
    ff = clazz.getDeclaredField("MODULE$")
    return ff.get(None)

This will return the companion object of your case class. You can then call its apply method like this:

ref_scala_object("fully.qualified.name.of.your.case.class").apply(arg1, ...)
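For background on why the "$" suffix and MODULE$ field work: this is Scala's name-mangling convention for singleton objects, sketched below as a trivial helper (com.example.Person is a made-up name):

```python
def companion_class_name(fqcn):
    # Scala compiles the companion object of class Foo into a separate JVM
    # class named "Foo$"; its static MODULE$ field holds the one instance,
    # which is why ref_scala_object reads that field via reflection.
    return fqcn + "$"

print(companion_class_name("com.example.Person"))
# com.example.Person$
```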
Here is a real-life example that seems to work. I was trying to call Spark's from_avro function with PySpark 2.4.5. The good news is that it is built into PySpark 3.0.
# By calling the method "from_avro" inside the package object "org.apache.spark.sql.avro"
def from_avro(col, avro_schema):
    from pyspark.sql.column import Column, _to_java_column
    # _to_java_column converts a Python Column (or column name) to its JVM counterpart
    from_avro_fn = ref_scala_object("org.apache.spark.sql.avro.package").from_avro
    jc = from_avro_fn(_to_java_column(col), avro_schema)
    return Column(jc)
# By calling the case class "org.apache.spark.sql.avro.AvroDataToCatalyst"
def from_avro_2(col, avro_schema):
    from pyspark.sql.column import Column, _to_java_column
    jvm = spark.sparkContext._gateway.jvm
    AvroDataToCatalyst = ref_scala_object("org.apache.spark.sql.avro.AvroDataToCatalyst")
    # Build the Catalyst expression via the companion object's apply, then wrap
    # it back into a JVM Column
    expr = AvroDataToCatalyst.apply(_to_java_column(col).expr(), avro_schema)
    return Column(jvm.org.apache.spark.sql.Column(expr))
Thank you for sharing, @FurcyPin; that will be very useful indeed.