[Python] Support meta share for type forward/backward compatibility
Feature Request
Meta share is used to support type forward/backward compatibility. Currently this is only supported in native Java/JavaScript serialization.
We should support meta share for the xlang serialization format, so that when a class in Java/Python/... adds or deletes a field, deserializers in Python can still succeed.
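To make the goal concrete, here is a minimal sketch of the behavior being requested, written with plain dataclasses and a dict payload. It does not use Fury or its wire format; `UserV1`, `UserV2`, `to_named_fields`, and `from_named_fields` are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class UserV1:
    # Writer-side schema (hypothetical).
    name: str = ""
    age: int = 0


@dataclass
class UserV2:
    # Reader-side schema (hypothetical): "age" was deleted, "email" was added.
    name: str = ""
    email: str = ""


def to_named_fields(obj):
    """Toy 'serialization': keep field names next to the values."""
    return {f.name: getattr(obj, f.name) for f in fields(obj)}


def from_named_fields(cls, payload):
    """Toy 'compatible' read: skip unknown fields, default missing ones."""
    known = {f.name for f in fields(cls)}
    kwargs = {k: v for k, v in payload.items() if k in known}
    return cls(**kwargs)


payload = to_named_fields(UserV1(name="alice", age=30))
user = from_named_fields(UserV2, payload)
```

The reader succeeds even though the two schemas differ, because field metadata travels with the data. Meta share aims at the same outcome without repeating that metadata for every object.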
Is your feature request related to a problem? Please describe
No response
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
#2159
Hi, I just looked at the implementation of metashare and compatible in Java version. But looking at the Python code, it seems that the implementation of "compatible based on metashare" hasn't even started yet. Starting from scratch is a bit challenging. Do you have any suggestions? I plan to start by implementing a simple meta_context, related resolvers, and serializers based on the protocol documentation.
Hi @urlyy, you can start from MetaContext. The most complicated part is encoding the class schema as org.apache.fury.meta.ClassDef. Currently ClassDef only supports pure Java serialization, so we first need to support encoding struct field schemas into ClassDef for the xlang serialization format, based on our spec: https://fury.apache.org/docs/specification/fury_xlang_serialization_spec#schema-evolution. I will finish this in the next few days, and you can take it as an example for Python.
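As a starting point, the idea behind a meta context can be sketched roughly as follows. This is a toy illustration, not Fury's MetaContext API and not the ClassDef encoding: `ToyMetaContext`, `write_type`, and `pending_defs` are hypothetical names, and the "definition" is just a tuple of field names.

```python
class ToyMetaContext:
    """Toy sketch: send a type's definition once, then refer to it by index."""

    def __init__(self):
        self._ids = {}          # type name -> small integer id
        self.pending_defs = []  # definitions that still have to be sent

    def write_type(self, type_name, field_names):
        """Return (type_id, is_new). The definition is queued only the first time."""
        if type_name in self._ids:
            return self._ids[type_name], False
        type_id = len(self._ids)
        self._ids[type_name] = type_id
        self.pending_defs.append((type_name, tuple(field_names)))
        return type_id, True


ctx = ToyMetaContext()
first = ctx.write_type("test.BeanA", ["doubleList", "beanBList"])
second = ctx.write_type("test.BeanA", ["doubleList", "beanBList"])
```

The second write of the same type only yields the existing id, which is where the space saving of sharing meta comes from when many objects of one type are written.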
@chaokunyang Hi, I just read through the specification document and have some questions:
- meta_share is used when a Fury object processes objects of the same class multiple times. So if I use a Fury object that sometimes deals with the parent class and sometimes with the subclass, does that mean I shouldn't use meta_share? Are there any other situations where I need to be compatible but should not use meta_share?
- Is schema evolution the implementation approach of "type forward/backward compatibility"?
- Does this issue involve codegen?
Thank you!
I couldn't find an example in MetaSharedCompatibleTest.class that uses fury.withLanguage(Language.XLANG), so I added it, along with fury.register(beanA) and fury.register(beanB), but the test failed in xwrite during fury1.serialize(beanA);.
I don't know whether my test code is incorrect or the implementation of XLANG & COMPATIBLE is incorrect.
Here is my code. Can someone help me?
@Test
public void testWriteCompatibleCollectionSimple() throws Exception {
  BeanA beanA = BeanA.createBeanA(2);
  String pkg = BeanA.class.getPackage().getName();
  String code =
      ""
          + "package "
          + pkg
          + ";\n"
          + "import java.util.*;\n"
          + "import java.math.*;\n"
          + "public class BeanA {\n"
          + " private List<Double> doubleList;\n"
          + " private Iterable<BeanB> beanBIterable;\n"
          + " private List<BeanB> beanBList;\n"
          + "}";
  Class<?> cls1 =
      loadClass(
          BeanA.class,
          code,
          MetaSharedCompatibleTest.class + "testWriteCompatibleCollectionBasic_1");
  Fury fury1 =
      furyBuilder()
          .withCodegen(false)
          .withMetaShare(true)
          .withLanguage(Language.XLANG)
          .withCompatibleMode(CompatibleMode.COMPATIBLE)
          .withClassLoader(cls1.getClassLoader())
          .build();
  fury1.register(beanA.getClass(), "test.BeanA");
  fury1.register(BeanB.class, "test.BeanB");
  code =
      ""
          + "package "
          + pkg
          + ";\n"
          + "import java.util.*;\n"
          + "import java.math.*;\n"
          + "public class BeanA {\n"
          + " private List<Double> doubleList;\n"
          + " private Iterable<BeanB> beanBIterable;\n"
          + "}";
  Class<?> cls2 =
      loadClass(
          BeanA.class,
          code,
          MetaSharedCompatibleTest.class + "testWriteCompatibleCollectionBasic_2");
  Object o2 = cls2.newInstance();
  ReflectionUtils.unsafeCopy(beanA, o2);
  Fury fury2 =
      furyBuilder()
          .withCodegen(false)
          .withMetaShare(true)
          .withLanguage(Language.XLANG)
          .withCompatibleMode(CompatibleMode.COMPATIBLE)
          .withClassLoader(cls2.getClassLoader())
          .build();
  fury2.register(beanA.getClass(), "test.BeanA");
  fury2.register(BeanB.class, "test.BeanB");
  MetaContext context1 = new MetaContext();
  MetaContext context2 = new MetaContext();
  fury1.getSerializationContext().setMetaContext(context1);
  byte[] objBytes = fury1.serialize(beanA);
  fury2.getSerializationContext().setMetaContext(context2);
  Object obj2 = fury2.deserialize(objBytes);
  Assert.assertTrue(ReflectionUtils.objectCommonFieldsEquals(obj2, o2));
}
@chaokunyang Hi, I just read through the specification document and have some questions to ask:
- meta_share is used when a Fury object processes some objects of the same class multiple times. So if I use a Fury object sometimes deal with the parent class and sometimes the subclass, does that mean I shouldn't use meta_share? Are there any other situations where I need to be compatible but should not use meta_share?
- Is schema evolution the implementation approach of "type forward/backward compatibility"?
- Does this issue involve codegen?
Thank you!
Hi @urlyy, for your questions:
- If you use Fury to deal with the parent class, Fury needs to write meta for the parent class. If you use Fury to deal with the subclass, Fury will write meta for the subclass, but note that the meta for the subclass includes the meta for the parent class.
- You are right, the schema evolution spec is the implementation approach of "type forward/backward compatibility".
- This issue doesn't involve codegen; we should implement codegen in another PR. Currently we don't implement codegen for xlang serialization in Python. Codegen will be supported after we merge ComplexObjectSerializer and DataclassSerializer into one serializer.
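The first answer above (subclass meta includes parent meta) can be pictured with a small sketch that groups dataclass fields by the class layer that declares them. This is only an illustration under assumed names (`layered_fields`, `Parent`, `Child`); it is not how Fury encodes class meta.

```python
from dataclasses import dataclass, fields, is_dataclass


def layered_fields(cls):
    """Group field names by the dataclass layer that first declares them,
    walking from the topmost parent down to cls."""
    layers = []
    seen = set()
    for klass in reversed(cls.__mro__):
        if not is_dataclass(klass):
            continue  # skips `object` and non-dataclass bases
        own = [f.name for f in fields(klass) if f.name not in seen]
        seen.update(own)
        layers.append((klass.__name__, own))
    return layers


@dataclass
class Parent:
    id: int = 0


@dataclass
class Child(Parent):
    name: str = ""


parent_meta = layered_fields(Parent)
child_meta = layered_fields(Child)
```

Here the layers computed for `Child` contain the `Parent` layer as well, while the layers for `Parent` alone do not mention `Child`, matching the description that each type writes its own meta and a subclass carries its parents' meta along.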
I don't find an example in MetaSharedCompatibleTest.class which has fury.withLanguage(Language.XLANG), so I add this, and add fury.register(beanA), fury.register(beanB), but test failed when xwrite in fury1.serialize(beanA);. I don't know if it's my test code incorrect or the implementation of XLANG & COMPATIBLE is incorrect.
Fury Java doesn't support type forward/backward compatibility for xlang yet. I will implement that in the next few days.
Hi @chaokunyang, I'm mapping the spec to code. I have read the field info block corresponding to Meta share#Single layer class meta:
| field info: variable bytes |
+-------------------------------+
| header + type id + field name |
but I don't understand the last line in green:
After writing the type id and field name, a classId (or typeId?) is also written to the buffer via fieldType.write(...), which does not seem to be reflected in the spec. What is it for? Or is it reflected in the last paragraph of the spec?
Field order are left as implementation details, which is not exposed to specification,
the deserialization need to resort fields based on Fury field comparator.
In this way, fury can compute statistics for field names or types and using a more compact encoding.
Thank you!
@chaokunyang Sorry, I have another question. I've started implementing Python meta_share according to the spec, but I'm new to Cython. I found that _fury.py is pretty much the same as _serialization.pyx, and _serialization.pyx only adds some Cython definitions. It also seems that all the tests use the _serialization.pyx logic. So do I have to implement the same logic (paste + minor modifications) in both files? It's a bit of a hassle😨
@urlyy Serialization is complex. When deserialization fails, we need to debug and parse the data byte by byte, which is not easy to do in Cython. So we implemented a pure Python version in _fury.py. This implementation is only used for debugging. You can write the simplest possible implementation without any optimization, but there will still be some duplicated code.
Hi @chaokunyang , I'm mapping spec to code, I have read at the field info block corresponding to the Meta share#Single layer class meta
The spec needs updates. I'm working on it in https://github.com/apache/fury/pull/2197. It still needs some time.
Hi @urlyy, I updated the type meta spec in #2216 and finished the reference implementation in https://github.com/apache/fury/pull/2197. Please take a look.
@chaokunyang I want to use the pure Python version of pyfury (py, not pyx) to make debugging easier, but I got this error:
TypeError: NoneSerializer.__init__()
Cannot convert Fury to pyfury._serialization.Fury
I commented out some code so that the pyx version is not imported:
# try:
#     from pyfury._serialization import ENABLE_FURY_CYTHON_SERIALIZATION
# except ImportError:
#     ENABLE_FURY_CYTHON_SERIALIZATION = False
ENABLE_FURY_CYTHON_SERIALIZATION = False
and my test code is:
from pyfury import Fury, Language
from pyfury.meta.meta_share import CompatibleMode


def test_xlang_metashare():
    fury = Fury(language=Language.XLANG, ref_tracking=True, compatible_mode=CompatibleMode.COMPATIBLE)
    binary = fury.serialize(-1.0)
Since the Cython pyx code has to be recompiled every time it is updated, which makes debugging difficult, I still want to start with a simple Python version and then switch to Cython.
Can you show me how to run the pure Python implementation?
Setting the environment variable ENABLE_FURY_CYTHON_SERIALIZATION to 0 makes pyfury use the pure Python implementation for serialization.
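A minimal sketch of applying this from inside a test script is shown below. It assumes the flag is read when pyfury is first imported (the import snippet quoted earlier in the thread suggests this, but it is an assumption), so the variable is set before the import. The `try`/`except` only keeps the snippet runnable where pyfury is not installed.

```python
import os

# Assumption: pyfury reads this flag at import time, so set it first.
os.environ["ENABLE_FURY_CYTHON_SERIALIZATION"] = "0"

try:
    import pyfury  # should now pick the pure Python serialization path
except ImportError:
    pyfury = None  # pyfury is not installed in this environment

flag = os.environ["ENABLE_FURY_CYTHON_SERIALIZATION"]
```

Exporting the variable in the shell before launching the test runner should have the same effect and avoids depending on import order inside the test file.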
In pyfury, sometimes it's "class" and sometimes it's "type". Are there any naming conventions? I often get confused by the naming.
We plan to transition to "type" gradually; new code should already use "type". It's more common in a cross-language context: languages like Go and Rust only use "type" to declare classes.