
[Improvement]: Object Serialization Optimization and Support

Open czy006 opened this issue 1 year ago • 3 comments

Search before asking

  • [X] I have searched in the issues and found no similar issues.

What would you like to be improved?

Currently, we use Java serialization and Kryo serialization. These serialization methods have some issues, including low performance. We use Kryo serialization for PUT and GET operations on RocksDB, as part of the lookup join feature in Mixed Format.

The objects we store in the database also have to be serialized and deserialized. During upgrades, we occasionally encounter deserialization errors (as shown in the figure below).

Through research, we found that Apache Fury can improve serialization performance and solve these deserialization problems. We will provide performance test reports later, comparing results before and after the replacement.

How should we improve?

  • Abstract a resource serialization interface, with implementations for the current native Java serialization and for Kryo serialization
  • Implement Fury serialization and provide a configuration option for it, while marking the other serialization methods as deprecated
  • When the Amoro LTS version is completed, we will remove the Kryo serialization implementation
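
The first step above could look roughly like the following. This is only a minimal sketch: `ResourceSerializer` and `JavaNativeSerializer` are hypothetical names chosen for illustration, not existing Amoro classes, and only the native Java implementation is shown because it needs no extra dependencies. Kryo or Fury implementations would plug in behind the same interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical abstraction (assumption, not an existing Amoro interface):
// callers depend on this type only, so the backing format can be swapped.
interface ResourceSerializer<T> {
  byte[] serialize(T value);

  T deserialize(byte[] bytes);
}

// Implementation backed by built-in Java serialization.
final class JavaNativeSerializer<T extends Serializable> implements ResourceSerializer<T> {

  @Override
  public byte[] serialize(T value) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    // Closing the ObjectOutputStream flushes it before the bytes are read.
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(value);
    } catch (IOException e) {
      throw new IllegalArgumentException("serialization error", e);
    }
    return buffer.toByteArray();
  }

  @Override
  @SuppressWarnings("unchecked")
  public T deserialize(byte[] bytes) {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return (T) in.readObject();
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalArgumentException("deserialization error", e);
    }
  }
}
```

With this shape, a configuration option would only need to decide which `ResourceSerializer` instance is created; call sites such as the RocksDB PUT and GET paths would not change.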

Are you willing to submit PR?

  • [X] Yes I am willing to submit a PR!

Subtasks

No response


czy006 avatar Dec 09 '24 08:12 czy006

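This problem is caused by upgrading to a different Iceberg version: the serialVersionUID of org.apache.iceberg.BaseFile recorded in the serialized stream no longer matches the one of the local class. There seems to be nothing we can do to intervene.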

java.lang.IllegalArgumentException: deserialization error 
	at org.apache.amoro.utils.SerializationUtil.simpleDeserialize(SerializationUtil.java:68)
	at org.apache.amoro.optimizer.spark.SparkOptimizerExecutor.jobDescription(SparkOptimizerExecutor.java:85)
	at org.apache.amoro.optimizer.spark.SparkOptimizerExecutor.executeTask(SparkOptimizerExecutor.java:58)
	at org.apache.amoro.optimizer.common.OptimizerExecutor.start(OptimizerExecutor.java:53)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.InvalidClassException: org.apache.iceberg.BaseFile; local class incompatible: stream classdesc serialVersionUID = 4493876333706690896, local class serialVersionUID = -6272254142325460014
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:687)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1883)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1749)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1883)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1749)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2040)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
	at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1973)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1565)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2285)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2209)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2067)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1571)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
	at org.apache.amoro.utils.SerializationUtil.simpleDeserialize(SerializationUtil.java:65)
	... 4 more
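
For context on the `InvalidClassException` above: Java serialization writes each class's `serialVersionUID` into the stream and rejects the data if the local class reports a different value. The sketch below shows how to inspect that value with the standard `ObjectStreamClass` API. `SerialVersionProbe` and its nested classes are hypothetical names used only for illustration; running `uidOf` against `org.apache.iceberg.BaseFile` on each Iceberg version would be one way to confirm the mismatch, assuming Iceberg is on the classpath.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical helper for illustration; not part of Amoro or Iceberg.
final class SerialVersionProbe {

  // No explicit serialVersionUID: the JVM derives one from the class
  // structure, so changing fields or methods can change the value.
  static final class Implicit implements Serializable {
    int a;
  }

  // Explicit serialVersionUID: the value stays fixed across class changes.
  static final class Pinned implements Serializable {
    private static final long serialVersionUID = 1L;
    int a;
  }

  // Returns the serialVersionUID the local JVM associates with the class.
  static long uidOf(Class<?> cls) {
    ObjectStreamClass desc = ObjectStreamClass.lookup(cls);
    if (desc == null) {
      // lookup returns null for classes that are not Serializable.
      throw new IllegalArgumentException(cls.getName() + " is not Serializable");
    }
    return desc.getSerialVersionUID();
  }

  public static void main(String[] args) {
    System.out.println("Pinned:   " + uidOf(Pinned.class));
    System.out.println("Implicit: " + uidOf(Implicit.class));
  }
}
```

Because the check is built into `ObjectInputStream`, data written by one version of a third-party class cannot be read by an incompatible version through native Java serialization, which is the situation the trace describes.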

czy006 avatar Dec 09 '24 08:12 czy006

I think JSON is the best way to serialize objects, since it can be parsed by any language. In any case, Java's default serialization isn't efficient. Kryo is enough.

ihadoop avatar Dec 21 '24 07:12 ihadoop

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] avatar Jun 20 '25 00:06 github-actions[bot]

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

github-actions[bot] avatar Jul 04 '25 00:07 github-actions[bot]