kryo icon indicating copy to clipboard operation
kryo copied to clipboard

MapSerializer fails to serialize custom Map implementation

Open mxm opened this issue 9 months ago • 5 comments

Describe the bug Apache Iceberg uses two custom Map implementations. Kryo serialization worked fine for these two with Kryo 2.24.0, but fails to deserialize with 5.62:

  • https://github.com/apache/iceberg/blob/d14298fe9e826f67e0a2c4d8e279c6b032912726/core/src/main/java/org/apache/iceberg/SerializableByteBufferMap.java#L31
  • https://github.com/apache/iceberg/blob/d14298fe9e826f67e0a2c4d8e279c6b032912726/core/src/main/java/org/apache/iceberg/util/SerializableMap.java#L28

We get this error on deserialization:

Caused by: java.lang.NullPointerException: Cannot invoke "java.util.Map.put(Object, Object)" because "this.wrapped" is null
	at org.apache.iceberg.SerializableByteBufferMap.put(SerializableByteBufferMap.java:132)
	at org.apache.iceberg.SerializableByteBufferMap.put(SerializableByteBufferMap.java:31)
	at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:236)
	at com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:42)
	at com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:796)
	at com.esotericsoftware.kryo.serializers.ReflectField.read(ReflectField.java:124)
	... 139 more

To Reproduce Test case to reproduce: https://github.com/apache/iceberg/blob/d14298fe9e826f67e0a2c4d8e279c6b032912726/flink/v1.20/flink/src/test/java/org/apache/iceberg/flink/TestDataFileSerialization.java#L156

The Object constructor is used internally by Kryo to instantiate the classes (SunReflectionFactoryInstantiator). The actual constructor does not get called, leading to the map being uninitialized when the put methods are called.

Environment:

  • OS: MacOS
  • JDK Version: 11, 17
  • Kryo Version: 5.6.2

Additional context This has been working fine until Flink 2.0 bumped Kryo from 2.24.0 to 5.6.2. Adding a custom Kryo serializer is not straight-forward in our setup because we run Iceberg on top of Apache Flink. We don't have full control over Flink's serialization config.

mxm avatar Mar 21 '25 12:03 mxm

Thanks for the report.

I'm wondering why these classes are instantiated by SunReflectionFactoryInstantiator. They have a no-arg constructor that should be called by Kryo's DefaultInstantiatorStrategy. What does the Kryo setup in Flink look like? Does it provide a custom InstantiatorStrategy?

theigl avatar Mar 21 '25 17:03 theigl

Please try to reproduce this with pure Kryo without any external dependencies. That makes it much easier to see what's going on and to compare Kryo 2.x and Kryo 5.

theigl avatar Mar 21 '25 17:03 theigl

Thanks for the quick response! I did a minimal repro and figured out what causes this: Flink configures

setInstantiatorStrategy(new org.objenesis.strategy.StdInstantiatorStrategy());

and also loads the Twitter Chill library. In Flink 2.0.0 this library has been removed. So the regression isn't directly caused by the Kryo upgrade but by removing Chill and relying solely on StdInstantiatorStrategy.

Using DefaultInstantiatorStrategy works fine.

Is it expected that Kryo cannot handle custom map types with only StdInstantiatorStrategy? I can reproduce the error even with Kryo 2.24.0 when I remove Chill and use StdInstantiatorStrategy, so I'm inclined to say yes :)

mxm avatar Mar 24 '25 09:03 mxm

I've opened https://issues.apache.org/jira/browse/FLINK-37546.

mxm avatar Mar 24 '25 09:03 mxm

OK that's good to hear. So there is a clear path to fixing this by using DefaultInstantiatorStrategy with a fallback on StdInstantiatorStrategy.

Is it expected that Kryo cannot handle custom map types with only StdInstantiatorStrategy? I can reproduce the error even with Kryo 2.24.0 when I remove Chill and use StdInstantiatorStrategy, so I'm inclined to say yes :)

The StdInstantiatorStrategy (despite its name) should really only be used as a fallback for classes without no-args constructors. See the relevant section in the docs.

theigl avatar Mar 24 '25 09:03 theigl