spark-redis
fixed binary serde issue
Fixed a serde issue when reading/writing a DataFrame in binary mode. Consider the following example:
```scala
case class Outer(
    arr0: Array[Inner],
    str0: String,
    str1: String,
    arr1: Array[String],
    str2: String)

case class Inner(str0: String, id0: Int)

def testDF[T](df: Dataset[T]): Unit = {
  df.printSchema()
  val schema = df.schema
  df.write
    .mode(SaveMode.Overwrite)
    .format("org.apache.spark.sql.redis")
    .option("table", "t")
    .option("model", "binary")
    .save()
  val df0 = session.read.format("org.apache.spark.sql.redis")
    .schema(schema)
    .option("table", "t")
    .option("model", "binary")
    .load()
  df0.printSchema()
  df0.show(false)
}

testDF(Seq(
  Outer(
    arr0 = Array(Inner("str0", 0)),
    str0 = "str0",
    str1 = "str1",
    arr1 = Array("arr1"),
    str2 = "str2"
  )
).toDS())
```
This fails with:

```
Caused by: java.lang.IllegalArgumentException: The value (1) of the type (java.lang.String) cannot be converted to an array of struct<str0:string,id0:int>
```
The reason for this is:
- In Redis the object is already stored with attributes in the order `arr0, str0, str1, arr1, str2`
- `buildScan`, however, receives `requiredColumns` in a different order: `str0, arr1, str1, arr0, str2`
- the binary decoder did not apply the attribute positions; it only set the updated schema, which is not enough

The proposed fix applies the correct attribute order to the binary-deserialized value.
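The reordering can be sketched in plain Scala (the names below are illustrative, not the actual spark-redis internals): for each column in `requiredColumns`, look up its position in the stored attribute order and read the value from there.

```scala
// Sketch of the attribute-reordering idea, with no Spark dependency.
// The stored row keeps values in the original schema order, while
// buildScan may request columns in any order; we map each requested
// column to its index in the stored row before reading values.
object AttrOrderSketch {
  // attribute order as persisted in Redis (from the example above)
  val storedOrder: Seq[String] = Seq("arr0", "str0", "str1", "arr1", "str2")

  // order in which buildScan requests the columns (from the example above)
  val requiredColumns: Seq[String] = Seq("str0", "arr1", "str1", "arr0", "str2")

  // index of each required column within the stored row
  def positions(stored: Seq[String], required: Seq[String]): Seq[Int] = {
    val indexOf = stored.zipWithIndex.toMap
    required.map(indexOf)
  }

  // pick stored values in the order the scan asked for them
  def reorder[T](storedValues: Seq[T], stored: Seq[String], required: Seq[String]): Seq[T] =
    positions(stored, required).map(storedValues)
}
```

With the orders above, `positions` yields `Seq(1, 3, 2, 0, 4)`, so a stored row `Seq(a0, s0, s1, a1, s2)` is returned as `Seq(s0, a1, s1, a0, s2)`, matching what the scan expects.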
Also note that without a provided schema it is difficult to deserialize the binary value, since we don't have the initial attribute order. A warning has been added for that case.
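The intent of the warning can be sketched as follows (a minimal illustration; the method and message here are hypothetical, not the actual spark-redis code):

```scala
// Sketch: when no user schema is supplied in binary mode, the original
// attribute order cannot be recovered from the raw bytes, so emit a warning.
object SchemaWarningSketch {
  def binaryReadWarning(userSchema: Option[Seq[String]]): Option[String] =
    userSchema match {
      case Some(_) => None // schema provided: attribute order is known
      case None =>
        Some("No schema provided while reading in binary model: " +
          "the binary value may not deserialize correctly because " +
          "the initial attribute order is unknown.")
    }
}
```

In other words, passing `.schema(schema)` on read (as in the example above) avoids the warning; omitting it leaves deserialization dependent on an order the reader cannot reconstruct.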