[Java] Align object array to collection serialization protocol v2
Is your feature request related to a problem? Please describe.
For #923, we implemented the collection protocol in #927, which can be up to 1x faster and can reduce space cost by up to 1x. The same optimization should be applied to arrays too.
Describe the solution you'd like
Refactor io.fury.serializer.ArraySerializers.ObjectArraySerializer to forward serialization to FuryArrayAsListSerializer.
- Wrap the array into ArrayAsList. Note that this wrapped object can't be reused if a nested serialization re-enters this code path.
- Push the array component generics to FuryArrayAsListSerializer.
- If the array component doesn't have nested generics, reimplement the serialization in ObjectArraySerializer for better performance.
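The wrapping step can be illustrated with a minimal sketch. The names ArrayViewSketch and ObjectArrayWriteSketch are hypothetical stand-ins, not Fury's actual ArrayAsList or ObjectArraySerializer, and the "output" is just a flat list used in place of a real buffer. It assumes the reuse problem comes from re-entrancy: a nested array element calls back into the same write method while the outer wrapper is still being iterated, so each call builds its own view.

```java
import java.util.AbstractList;
import java.util.List;

// Hypothetical read-only list view over an array (stand-in for ArrayAsList).
final class ArrayViewSketch<T> extends AbstractList<T> {
  private final T[] array;

  ArrayViewSketch(T[] array) {
    this.array = array;
  }

  @Override
  public T get(int index) {
    return array[index];
  }

  @Override
  public int size() {
    return array.length;
  }
}

// Hypothetical writer that records the length and elements into a flat list.
final class ObjectArrayWriteSketch {
  static void write(Object[] array, List<Object> out) {
    // A fresh view per call: a nested Object[] element re-enters this method
    // while the outer view is still in use, so a single shared mutable
    // wrapper would be overwritten in the middle of the outer iteration.
    List<Object> view = new ArrayViewSketch<>(array);
    out.add(view.size());
    for (Object elem : view) {
      if (elem instanceof Object[]) {
        write((Object[]) elem, out); // nested serialization re-enters here
      } else {
        out.add(elem);
      }
    }
  }
}
```

Allocating one small view per call is the simplest way to stay correct under nesting; a pooled or depth-indexed wrapper would be an alternative if the allocation turned out to matter.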
Additional context
#1228
Hello! Is this issue still available?
I would like to contribute to this issue. As it's my first issue, I may need some help getting into it.
Thanks
Yes, it's available. You're welcome to take it over. I'm happy to help.
Great, so I have started looking at the code for ObjectArraySerializer and have a couple of questions:
Wrap the array into ArrayAsList. Note that this wrapped object can't be reused if a nested serialization re-enters this code path.
- This seems to be the same logic as used in the StringArraySerializer methods. Is this accurate?
Push the array component generics to FuryArrayAsListSerializer.
- Does this mean using the setElementSerializer() method with a generic serializer? i.e.
this.componentTypeSerializer = fury.getClassResolver().getSerializer(componentType);
If the array component doesn't have nested generics, reimplement the serialization in ObjectArraySerializer for better performance.
- I am not sure I understand what nested generics means in this case. Is it something along the lines of a T[] where T is, for example, a Map<K, V>? Would that mean that for a T[] where T is just some class with no generics, it should continue to use the previous serialization approach?
- It seems GenericType::hasGenericParameters already implements this logic?
Yes, you are right. The wrapped object can't be reused if a nested serialization re-enters this code path.
setElementSerializer can't be used, since it can only push a serializer for one layer. Pushing generics is enough; it works just like a normal collection serializer.
As for nested generics, this refers to the array component type info. For example, we should push List<List<Foo>> for Foo[][].
For simpler array types without nested generics, we could implement a faster version in a special serializer.
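As a rough illustration of the Foo[][] to List<List<Foo>> mapping, the sketch below derives the nested list-shaped type name from an array class using only standard reflection (Class#isArray and Class#getComponentType). The class and method names are hypothetical, and the result is a plain string rather than Fury's GenericType; it only targets object arrays.

```java
// Hypothetical helper: maps an object array class to the list-shaped generic
// type that would be pushed for it, rendered as a string for readability.
final class ArrayGenericsSketch {
  static String asNestedListType(Class<?> type) {
    if (!type.isArray()) {
      // Innermost component type, e.g. Foo for Foo[][].
      return type.getSimpleName();
    }
    // Each array dimension becomes one List<...> layer.
    return "List<" + asNestedListType(type.getComponentType()) + ">";
  }
}
```

For example, `ArrayGenericsSketch.asNestedListType(String[][].class)` yields `List<List<String>>`, while a one-dimensional `Integer[]` yields `List<Integer>`, which has no nested layer and so could take the faster special-case path.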
Thanks!
Based on your comment, my understanding is that for nested generics we need to take into account the dimensions of the array, which are already calculated in the array serialization code:
    int dimension = 0;
    while (t != null && t.isArray()) {
      dimension++;
      t = t.getComponentType();
      if (t != null) {
        innerType = t;
      }
    }
Regarding the point that setElementSerializer can't be used because it can only push a serializer for one layer, and that pushing generics is enough:
I think I understand the "one layer" part as meaning it can only handle one dimension (i.e. String[] and not String[][]). However, I am not sure exactly what pushing generics means here.
Suppose we build String[][] into a GenericType in the same way as List<List<String>>.
We push this generic type before invoking org.apache.fury.serializer.collection.AbstractCollectionSerializer#writeElements. In writeElements, it gets the element type List<String> and uses this type to serialize every element of list type. Before serializing each element, it pushes the List<String> generic type, so when serializing the innermost elements, the CollectionSerializer knows that the element type is String.
One more thing to note is that primitive one-dimensional arrays use special serializers and don't follow this pattern.
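The push-then-write flow described above can be sketched roughly as follows. Everything here is hypothetical: TypeNode stands in for GenericType, the Deque stands in for whatever generics stack the serializer keeps, and writeCollection only logs the element type it resolved instead of writing bytes. It is meant to show why pushing the generic type lets every nesting level learn its element type, not how Fury implements it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class PushGenericsSketch {
  // Hypothetical generic-type node: a raw class plus an optional element type.
  static final class TypeNode {
    final Class<?> raw;
    final TypeNode elem; // null for a leaf type such as String

    TypeNode(Class<?> raw, TypeNode elem) {
      this.raw = raw;
      this.elem = elem;
    }

    @Override
    public String toString() {
      return elem == null ? raw.getSimpleName() : raw.getSimpleName() + "<" + elem + ">";
    }
  }

  private final Deque<TypeNode> generics = new ArrayDeque<>();
  final List<String> log = new ArrayList<>();

  // Reads the generic type pushed by the caller and derives the element type from it.
  void writeCollection(List<?> list) {
    TypeNode elemType = generics.peek().elem;
    for (Object e : list) {
      if (e instanceof List) {
        // Push the element's generic type before serializing it, so the
        // nested call can resolve its own element type the same way.
        generics.push(elemType);
        try {
          writeCollection((List<?>) e);
        } finally {
          generics.pop();
        }
      } else {
        log.add(elemType + ":" + e); // innermost element: type is known here
      }
    }
  }

  // Models String[][] as List<List<String>> and records the resolved leaf types.
  static List<String> demo() {
    TypeNode leaf = new TypeNode(String.class, null);
    TypeNode nested = new TypeNode(List.class, new TypeNode(List.class, leaf));
    PushGenericsSketch sketch = new PushGenericsSketch();
    sketch.generics.push(nested);
    sketch.writeCollection(Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c")));
    sketch.generics.pop();
    return sketch.log;
  }
}
```

Because each level pushes its element's type before recursing, a single top-level push covers any depth, which is what a one-layer setElementSerializer call cannot do.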