
Improve error message when encoding/decoding recursive case classes

Open jberkel opened this issue 1 year ago • 4 comments

I'm not sure if this is something supported by Spark, but I have the following recursive case class:

case class SerializedSense(
    definition: String,
    subsenses: Seq[SerializedSense] = Seq.empty
) derives upickle.default.ReadWriter

I can serialize/deserialize this successfully using upickle, but when using the same case class inside Spark

import scala3encoders.given

def process(input: DataFrame): Unit =
  input
    .as[SerializedSense]
    …

I get the following compiler error:

No given instance of type scala3encoders.derivation.Deserializer[Seq[SerializedSense]] was found. I found:
…

Is this a limitation of Spark or is this a case not yet supported by the encoder?
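(For context on why upickle copes with the recursion: a runtime serializer only unfolds instances as deep as the actual value being encoded. A dependency-free sketch with a toy `Encode` typeclass — invented names, not upickle's API, and `SerializedSense` is redeclared here without the `derives` clause so the sketch needs no libraries:)

```scala
// Toy Encode typeclass (invented for illustration; not upickle's API).
// A runtime serializer copes with a recursive case class because the
// recursive call only unfolds as deep as the actual value.
case class SerializedSense(
    definition: String,
    subsenses: Seq[SerializedSense] = Seq.empty)

trait Encode[A]:
  def apply(a: A): String

given Encode[SerializedSense] with
  def apply(s: SerializedSense): String =
    // direct recursion: encode each subsense with this same instance
    val subs = s.subsenses.map(apply).mkString("[", ",", "]")
    s"""{"definition":"${s.definition}","subsenses":$subs}"""
```

For example, `summon[Encode[SerializedSense]](SerializedSense("word"))` yields `{"definition":"word","subsenses":[]}`. Spark, by contrast, must compute the complete schema of the type up front, and a recursive type has no finite schema.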

jberkel avatar Nov 07 '23 09:11 jberkel

@jberkel the error message is misleading (not sure how to fix that), but Spark does not support recursive (circular) types.

When trying to compile encoding/decoding of a recursive class in vanilla Spark (using Scala 2.13), the error message is perhaps a bit more helpful:

[error] org.apache.spark.SparkUnsupportedOperationException: cannot have circular references in class, but got the circular reference of class <classname>.

michael72 avatar Nov 07 '23 16:11 michael72

Ok, that's what I suspected. If the error message can be improved, good, otherwise it's not a big deal.

jberkel avatar Nov 07 '23 22:11 jberkel

Correction on my part: it is not a compile-time error but a runtime error (in vanilla Spark). So I'm not sure if we can do it in this library; maybe walkedTypePath could be used for that. However, there are more complicated scenarios where even Spark doesn't respond with a proper error message, e.g.

trait Node
case class Tree(children: List[Node]) extends Node
case class Leaf(info: String) extends Node

val chk = List(Tree(List(Leaf("hello")))).toDF().as[Tree]

will give you:

[error] org.apache.spark.SparkUnsupportedOperationException: [ENCODER_NOT_FOUND] Not found an encoder of the type Node to Spark SQL internal representation. Consider to change the input type to one of supported at 'https://spark.apache.org/docs/latest/sql-ref-datatypes.html'.

whereas List(Leaf("hello")).toDF().as[Leaf] is fine.

OK - maybe a bad example. Let's just say types are tricky ;-)
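(For the simple self-referential case, a hedged sketch of how the cycle could be detected while walking a type's structure, in the spirit of Spark's walkedTypePath. `TypeInfo`, `check`, and `CircularTypeError` are invented names for illustration, not part of scala3encoders or Spark:)

```scala
// Sketch: detect a circular type by carrying the path of type names
// visited so far, similar in spirit to Spark's walkedTypePath.
object CycleDetect:
  final class CircularTypeError(val path: List[String])
      extends RuntimeException(
        s"cannot have circular references in class, " +
          s"but got the circular reference of class ${path.head}")

  // A tiny stand-in for a reflected type: a name plus lazily evaluated
  // field types, so a self-referential shape can be constructed.
  final class TypeInfo(val name: String, fields0: => List[TypeInfo]):
    lazy val fields: List[TypeInfo] = fields0

  // Walk the type graph; meeting a name that is already on the current
  // path means the type is circular, so fail with a clear message.
  def check(t: TypeInfo, path: List[String] = Nil): Unit =
    if path.contains(t.name) then throw CircularTypeError(t.name :: path)
    else t.fields.foreach(check(_, t.name :: path))

// SerializedSense's subsenses field refers back to SerializedSense itself
lazy val sense: CycleDetect.TypeInfo =
  CycleDetect.TypeInfo("SerializedSense", List(sense))
```

Calling `CycleDetect.check(sense)` then fails with the "circular reference" message instead of a puzzling missing-instance error. The mixed `Tree`/`Node` case above is harder, since the cycle runs through a trait rather than a single class name.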

michael72 avatar Nov 08 '23 07:11 michael72

> Correction on my part - it is not a compile time error but a runtime error. So I'm not sure if we can do it in this library. Maybe using walkedTypePath for that.

The fact that this is now a compile time error and not a runtime error is already a huge win!

I'm wondering if @implicitNotFound could be used to customize the message?
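(A minimal sketch of that idea, assuming the library's trait can carry the annotation — the `Deserializer` shown here is a stand-in, not the actual `scala3encoders.derivation.Deserializer` definition:)

```scala
import scala.annotation.implicitNotFound

// @implicitNotFound replaces the generic "No given instance of type ..."
// diagnostic with a custom message; ${T} is expanded by the compiler.
@implicitNotFound(
  "No Deserializer found for ${T}. Note that Spark does not support " +
    "recursive (circular) case classes, so check ${T} for self-references.")
trait Deserializer[T]

object Deserializer:
  // a base instance for illustration
  given Deserializer[String] = new Deserializer[String] {}
```

With this in place, `summon[Deserializer[SerializedSense]]` would fail to compile with the custom hint above, while `summon[Deserializer[String]]` still resolves normally.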

jberkel avatar Nov 08 '23 08:11 jberkel