spark-redis
fixed binary serde issue
Fixed a serde issue when reading/writing a DataFrame in binary mode. Consider the following example:
```scala
case class Outer(
    arr0: Array[Inner],
    str0: String,
    str1: String,
    arr1: Array[String],
    str2: String)

case class Inner(str0: String, id0: Int)

def testDF[T](df: Dataset[T]): Unit = {
  df.printSchema()
  val schema = df.schema
  df.write
    .mode(SaveMode.Overwrite)
    .format("org.apache.spark.sql.redis")
    .option("table", "t")
    .option("model", "binary")
    .save()
  val df0 = session.read.format("org.apache.spark.sql.redis")
    .schema(schema)
    .option("table", "t")
    .option("model", "binary")
    .load()
  df0.printSchema()
  df0.show(false)
}

testDF(Seq(
  Outer(
    arr0 = Array(Inner("str0", 0)),
    str0 = "str0",
    str1 = "str1",
    arr1 = Array("arr1"),
    str2 = "str2"
  )
).toDS())
```
This fails with:

```
Caused by: java.lang.IllegalArgumentException: The value (1) of the type (java.lang.String) cannot be converted to an array of struct<str0:string,id0:int>
```
The reason for this is:
- In Redis the object is already stored with attributes in the order `arr0, str0, str1, arr1, str2`
- `buildScan`, however, receives `requiredColumns` in a different order: `str0, arr1, str1, arr0, str2`
- the binary decoder did not apply the attribute positions; it only set the updated schema, which is not enough

The proposed fix applies the correct attribute order to the binary-deserialized value.
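The reordering can be sketched in plain Scala (the names below are illustrative, not the actual spark-redis internals): for each column in `requiredColumns`, look up its position in the stored attribute order and read the value from there.

```scala
// Sketch of the attribute-reordering idea, with no Spark dependency.
// The stored row keeps values in the original schema order, while
// buildScan may request columns in any order; we map each requested
// column to its index in the stored row before reading values.
object AttrOrderSketch {
  // attribute order as persisted in Redis (from the example above)
  val storedOrder: Seq[String] = Seq("arr0", "str0", "str1", "arr1", "str2")

  // order in which buildScan requests the columns (from the example above)
  val requiredColumns: Seq[String] = Seq("str0", "arr1", "str1", "arr0", "str2")

  // index of each required column within the stored row
  def positions(stored: Seq[String], required: Seq[String]): Seq[Int] = {
    val indexOf = stored.zipWithIndex.toMap
    required.map(indexOf)
  }

  // pick stored values in the order the scan asked for them
  def reorder[T](storedValues: Seq[T], stored: Seq[String], required: Seq[String]): Seq[T] =
    positions(stored, required).map(storedValues)
}
```

With the orders above, `positions` yields `Seq(1, 3, 2, 0, 4)`, so a stored row `Seq(a0, s0, s1, a1, s2)` is returned as `Seq(s0, a1, s1, a0, s2)`, matching what the scan expects.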
Also note that without a provided schema it is difficult to deserialize the binary value, since we don't have the initial attribute order. A warning has been added for that case.
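The intent of the warning can be sketched as follows (a minimal illustration; the method and message here are hypothetical, not the actual spark-redis code):

```scala
// Sketch: when no user schema is supplied in binary mode, the original
// attribute order cannot be recovered from the raw bytes, so emit a warning.
object SchemaWarningSketch {
  def binaryReadWarning(userSchema: Option[Seq[String]]): Option[String] =
    userSchema match {
      case Some(_) => None // schema provided: attribute order is known
      case None =>
        Some("No schema provided while reading in binary model: " +
          "the binary value may not deserialize correctly because " +
          "the initial attribute order is unknown.")
    }
}
```

In other words, passing `.schema(schema)` on read (as in the example above) avoids the warning; omitting it leaves deserialization dependent on an order the reader cannot reconstruct.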