
[Scala] Support non-null default values in COMPATIBLE mode

Open LoranceChen opened this issue 1 year ago • 6 comments

Currently, in Scala, if you add a new field to a class and deserialize old binary data, the new field comes back as null. But avoiding null is good practice in Scala.

  val personBytes = readBytesFromFile("person_v1") // bytes of the old version: `Person(1,true,some text)`
  // Person now has an appended field: case class Person(a: Int, b: Boolean, c: String, d: String = "default d")
  val deserPerson = fury.deserializeJavaObject(personBytes, classOf[Person])
  println(s"deserPerson: ${deserPerson}") // deserPerson: Person(1,true,some text,null)

I think it would be better to set the new field to its default value or to an empty value, giving a result such as:

// deserPerson: Person(1,true,some text, "default d")

If the field definition has no default value, an empty value can be used instead. For String, "" is better than null.

If the new field is a structure, it can be instantiated from default values. For example, case class Foo(a: String, b: Int) could default to Foo("", 0).

However, in performance-sensitive scenarios, using null may be better, leaving the developer to handle it.

I suggest adding a new configuration option to decide whether a new field gets its default value or null.
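
The "empty value" idea above could be sketched with a type class. This is only an illustration of the proposal: `Empty`, `Empty.of`, and the instances below are hypothetical names and are not part of Fury's API.

```scala
// Hypothetical type class describing the "empty" value of a type.
// Not part of Fury; names are made up for illustration.
trait Empty[T] { def value: T }

object Empty {
  // Summons the empty value for T, if an instance is in scope.
  def of[T](implicit e: Empty[T]): T = e.value

  implicit val emptyString: Empty[String] = new Empty[String] { def value: String = "" }
  implicit val emptyInt: Empty[Int] = new Empty[Int] { def value: Int = 0 }
  implicit val emptyBoolean: Empty[Boolean] = new Empty[Boolean] { def value: Boolean = false }
}

case class Foo(a: String, b: Int)

object Foo {
  // A structure's empty value is built from the empty values of its fields.
  implicit val emptyFoo: Empty[Foo] =
    new Empty[Foo] { def value: Foo = Foo(Empty.of[String], Empty.of[Int]) }
}
```

With these definitions, `Empty.of[Foo]` evaluates to `Foo("", 0)`. A type with no instance simply fails to compile, which is one way to keep null as the explicit fallback.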

LoranceChen avatar Jun 12 '24 04:06 LoranceChen

Hi @LoranceChen, thanks for bringing this up. Supporting this in Apache Fury is definitely needed.

Scala doesn't provide a way to construct an object with default values at the bytecode level. It generates bytecode that invokes the constructor with all parameters supplied; the default arguments are filled in at the call site.

If we need to supply default values when creating an object, we first need to extract them. Fortunately, Scala generates a method like SomeClass$.apply$default$2:()I for a parameter that has a default:

case class SomeClass(v: List[IdAnyVal], x: Int = 1)

// Callsite bytecode
      34: getstatic     #131                // Field org/apache/fury/serializer/SomeClass$.MODULE$:Lorg/apache/fury/serializer/SomeClass$;
      37: invokevirtual #135                // Method org/apache/fury/serializer/SomeClass$.apply$default$2:()I
      40: invokespecial #138                // Method org/apache/fury/serializer/SomeClass."<init>":(Lscala/collection/immutable/List;I)V
      43: putstatic     #87                 // Field p:Lorg/apache/fury/serializer/SomeClass;
      46: getstatic     #143                // Field scala/Predef$.MODULE$:Lscala/Predef$;

We may be able to detect whether such a method exists to learn which parameters have default values, and pass those values when constructing the object. This will take a fair amount of work, and we don't have time for it currently. Would you like to contribute this? The record constructor handling in Fury (org.apache.fury.builder.ObjectCodecBuilder#createRecord and org.apache.fury.serializer.ObjectSerializer#read) can be taken as an example.
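
A minimal reflection-based sketch of that detection, assuming a top-level case class whose companion object compiles to a `<Name>$` class with a static `MODULE$` field. `DefaultValues` and `defaultFor` are hypothetical names, not existing Fury code, and `SomeClass` is simplified to `List[Int]` because `IdAnyVal` is not defined in this thread.

```scala
import scala.util.Try

// Simplified version of the example class above (IdAnyVal replaced by Int).
case class SomeClass(v: List[Int], x: Int = 1)

// Hypothetical helper, not part of Fury.
object DefaultValues {
  /** Returns the default for the 1-based parameter `index`, if the companion
    * object exposes an `apply$default$<index>` method; None otherwise. */
  def defaultFor(cls: Class[_], index: Int): Option[Any] =
    Try {
      // The companion object is compiled to a class named `<Name>$`.
      val companionClass = Class.forName(cls.getName + "$", true, cls.getClassLoader)
      // Its singleton instance is stored in the static field MODULE$.
      val companion = companionClass.getField("MODULE$").get(null)
      // "$" is literal here because this is not an interpolated string.
      companionClass.getMethod("apply$default$" + index).invoke(companion)
    }.toOption
}
```

Here `DefaultValues.defaultFor(classOf[SomeClass], 2)` yields `Some(1)`, while index 1 yields `None` because `v` has no default. A real implementation would presumably cache the lookup per class rather than reflect on every read.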

chaokunyang avatar Jun 16 '24 15:06 chaokunyang

Hi, great to see this can be solved. I'm glad to open a PR if possible, though I need some time to get familiar with the repository.

LoranceChen avatar Jun 17 '24 02:06 LoranceChen

If a field doesn't exist during serialization but does exist during deserialization, we can invoke a method like SomeClass$.apply$default$2:()I to get the default value for that field and set it on the object.
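
As a rough sketch of that flow (not Fury's actual deserialization path), the snippet below builds an instance from the fields found in the old data and falls back to `apply$default$N` for the missing ones. `DefaultFill`, `construct`, and the explicit `fieldNames` argument are hypothetical; a real serializer would take the field order from its own class metadata.

```scala
import scala.util.Try

case class Person(a: Int, b: Boolean, c: String, d: String = "default d")

// Hypothetical helper, not part of Fury.
object DefaultFill {
  // Looks up `apply$default$<index>` (1-based) on the companion object, if present.
  private def defaultFor(cls: Class[_], index: Int): Option[AnyRef] =
    Try {
      val companionClass = Class.forName(cls.getName + "$", true, cls.getClassLoader)
      val companion = companionClass.getField("MODULE$").get(null)
      companionClass.getMethod("apply$default$" + index).invoke(companion)
    }.toOption

  /** Builds a `T` from the fields present in the old data. A missing field
    * gets its declared default, or null when no default exists (which would
    * fail for a primitive parameter). */
  def construct[T](cls: Class[T], fieldNames: Seq[String], present: Map[String, Any]): T = {
    val args: Seq[AnyRef] = fieldNames.zipWithIndex.map { case (name, i) =>
      present.get(name).map(_.asInstanceOf[AnyRef]) // box primitives for reflection
        .orElse(defaultFor(cls, i + 1))
        .orNull
    }
    // A case class has a single public constructor taking all fields in order.
    cls.getConstructors.head.newInstance(args: _*).asInstanceOf[T]
  }
}
```

For example, `DefaultFill.construct(classOf[Person], Seq("a", "b", "c", "d"), Map("a" -> 1, "b" -> true, "c" -> "some text"))` returns `Person(1,true,some text,default d)` instead of leaving `d` as null.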

chaokunyang avatar Jun 17 '24 02:06 chaokunyang

Hi @chaokunyang, do you have any advice on debugging the codegen init process? It's not easy to trace where the logic in the generated code comes from.

Thanks

LoranceChen avatar Jul 02 '24 12:07 LoranceChen

You can set the FURY_CODE_DIR environment variable to choose the directory for generated code. If you point it at your src directory, you can debug the generated code in your IDE when you rerun the program.

chaokunyang avatar Jul 02 '24 12:07 chaokunyang

Hi @LoranceChen , are you still working on this issue?

chaokunyang avatar Dec 04 '24 05:12 chaokunyang