[Python] Support meta share for type forward/backward compatibility

Open chaokunyang opened this issue 8 months ago • 2 comments

Feature Request

Meta share is used to support type forward/backward compatibility. Currently it is only supported in native Java/JavaScript serialization.

We should support meta share for the xlang serialization format, so that when a class in Java/Python/... adds or deletes a field, the deserializers in Python can still succeed.

Is your feature request related to a problem? Please describe

No response

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

#2159

chaokunyang avatar Apr 19 '25 04:04 chaokunyang

Hi, I just looked at the implementation of meta share and compatible mode in the Java version. Looking at the Python code, though, it seems that the implementation of "compatible based on meta share" hasn't started yet. Starting from scratch is a bit challenging. Do you have any suggestions? I plan to start by implementing a simple meta_context, related resolvers, and serializers based on the protocol documentation.

urlyy avatar Apr 20 '25 15:04 urlyy

Hi @urlyy, you can start from MetaContext. The most complicated part is encoding the class schema as org.apache.fury.meta.ClassDef. Currently ClassDef only supports pure Java serialization, so we first need to support encoding struct field schemas into ClassDef for the xlang serialization format, based on our spec: https://fury.apache.org/docs/specification/fury_xlang_serialization_spec#schema-evolution. I will finish this in the next few days; you can take it as an example for Python.
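
As a starting point, the idea behind a meta context can be sketched roughly as follows: the first time a type is written, its full definition goes into the stream; later occurrences only write a short index. This is a toy model, not pyfury's API or Fury's wire format. `ToyMetaContext`, `write_type_meta`, and `read_type_meta` are hypothetical names, and the tuples stand in for encoded bytes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMetaContext:
    """Hypothetical stand-in for a meta context; not pyfury's actual class."""
    type_ids: dict = field(default_factory=dict)   # writer side: type name -> index
    type_defs: list = field(default_factory=list)  # reader side: index -> field names


def write_type_meta(ctx, out, type_name, field_names):
    """Append a full definition on first use, or a short reference afterwards."""
    if type_name in ctx.type_ids:
        out.append(("ref", ctx.type_ids[type_name]))
        return
    ctx.type_ids[type_name] = len(ctx.type_ids)
    out.append(("def", type_name, tuple(field_names)))


def read_type_meta(ctx, item):
    """Resolve a definition or a reference back to the writer's field names."""
    if item[0] == "def":
        ctx.type_defs.append(item[2])
        return item[2]
    return ctx.type_defs[item[1]]


writer_ctx, reader_ctx, stream = ToyMetaContext(), ToyMetaContext(), []
write_type_meta(writer_ctx, stream, "test.BeanA", ["doubleList", "beanBList"])
write_type_meta(writer_ctx, stream, "test.BeanA", ["doubleList", "beanBList"])
resolved = [read_type_meta(reader_ctx, item) for item in stream]
```

The second write produces only a reference, which is where the space saving comes from when the same type is serialized repeatedly within one context.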

chaokunyang avatar Apr 20 '25 16:04 chaokunyang

@chaokunyang Hi, I just read through the specification document and have some questions to ask:

  1. meta_share is used when a Fury object processes objects of the same class multiple times. So if I use one Fury object that sometimes deals with the parent class and sometimes with the subclass, does that mean I shouldn't use meta_share? Are there any other situations where I need compatibility but should not use meta_share?
  2. Is schema evolution the implementation approach of "type forward/backward compatibility"?
  3. Does this issue involve codegen?

Thank you!

urlyy avatar Apr 27 '25 19:04 urlyy

I couldn't find an example in MetaSharedCompatibleTest.class that uses fury.withLanguage(Language.XLANG), so I added it, along with fury.register(beanA) and fury.register(beanB), but the test failed in xwrite during fury1.serialize(beanA). I don't know whether my test code is incorrect or the implementation of XLANG & COMPATIBLE is incorrect.

Here is my code. Can anyone help?

@Test
  public void testWriteCompatibleCollectionSimple() throws Exception {
    BeanA beanA = BeanA.createBeanA(2);
    String pkg = BeanA.class.getPackage().getName();
    String code =
        ""
            + "package "
            + pkg
            + ";\n"
            + "import java.util.*;\n"
            + "import java.math.*;\n"
            + "public class BeanA {\n"
            + "  private List<Double> doubleList;\n"
            + "  private Iterable<BeanB> beanBIterable;\n"
            + "  private List<BeanB> beanBList;\n"
            + "}";
    Class<?> cls1 =
        loadClass(
            BeanA.class,
            code,
            MetaSharedCompatibleTest.class + "testWriteCompatibleCollectionBasic_1");
    Fury fury1 =
        furyBuilder()
            .withCodegen(false)
            .withMetaShare(true)
            .withLanguage(Language.XLANG)
            .withCompatibleMode(CompatibleMode.COMPATIBLE)
            .withClassLoader(cls1.getClassLoader())
            .build();
    fury1.register(beanA.getClass(), "test.BeanA");
    fury1.register(BeanB.class, "test.BeanB");
    code =
        ""
            + "package "
            + pkg
            + ";\n"
            + "import java.util.*;\n"
            + "import java.math.*;\n"
            + "public class BeanA {\n"
            + "  private List<Double> doubleList;\n"
            + "  private Iterable<BeanB> beanBIterable;\n"
            + "}";
    Class<?> cls2 =
        loadClass(
            BeanA.class,
            code,
            MetaSharedCompatibleTest.class + "testWriteCompatibleCollectionBasic_2");
    Object o2 = cls2.newInstance();
    ReflectionUtils.unsafeCopy(beanA, o2);
    Fury fury2 =
        furyBuilder()
            .withCodegen(false)
            .withMetaShare(true)
            .withLanguage(Language.XLANG)
            .withCompatibleMode(CompatibleMode.COMPATIBLE)
            .withClassLoader(cls2.getClassLoader())
            .build();
    fury2.register(beanA.getClass(), "test.BeanA");
    fury2.register(BeanB.class, "test.BeanB");
    MetaContext context1 = new MetaContext();
    MetaContext context2 = new MetaContext();
    fury1.getSerializationContext().setMetaContext(context1);
    byte[] objBytes = fury1.serialize(beanA);
    fury2.getSerializationContext().setMetaContext(context2);
    Object obj2 = fury2.deserialize(objBytes);
    Assert.assertTrue(ReflectionUtils.objectCommonFieldsEquals(obj2, o2));
  }

urlyy avatar Apr 28 '25 09:04 urlyy

Hi @urlyy, here are answers to your questions:

  1. If you use Fury to deal with the parent class, Fury needs to write meta for the parent class. If you use Fury to deal with the subclass, Fury will write meta for the subclass, but note that meta for the subclass includes meta for the parent class.
  2. You are right, the schema evolution spec is the implementation approach for "type forward/backward compatibility".
  3. This issue doesn't involve codegen; we should implement codegen in another PR. Currently we don't implement codegen for xlang serialization in Python. Codegen will be supported after we merge ComplexObjectSerializer and DataclassSerializer into one serializer.
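
Points 1 and 2 can be illustrated with a small sketch using plain Python dataclasses. It only models the idea (a subclass's field list includes its parent's fields, and a reader keeps the fields it knows while skipping the rest); it is not pyfury's implementation, and `field_names` and `read_compatible` are hypothetical helpers.

```python
from dataclasses import dataclass, fields


@dataclass
class Parent:
    id: int = 0


@dataclass
class Child(Parent):
    name: str = ""


def field_names(cls):
    # dataclasses.fields() lists inherited fields first, then the class's own,
    # so the "meta" for Child already covers Parent's fields.
    return [f.name for f in fields(cls)]


def read_compatible(cls, writer_fields, values):
    """Keep values for fields the reader knows, skip unknown ones,
    and let missing fields fall back to their defaults."""
    known = set(field_names(cls))
    kwargs = {n: v for n, v in zip(writer_fields, values) if n in known}
    return cls(**kwargs)


# Assume the writer's version of Child had an extra "age" field and no "name".
obj = read_compatible(Child, ["id", "age"], [7, 42])
```

Here the reader ignores `age`, which it does not know, and `name` keeps its default because the writer never sent it.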

chaokunyang avatar Apr 28 '25 15:04 chaokunyang

Fury Java doesn't support type forward/backward compatibility for xlang yet. I will implement that in the next few days.

chaokunyang avatar Apr 28 '25 15:04 chaokunyang

Hi @chaokunyang, I'm mapping the spec to the code. I have read the field info block corresponding to Meta share#Single layer class meta:

|  field info: variable bytes   |
+-------------------------------+
| header + type id + field name |

but I don't understand the last line in green:

Image

After writing the type id and field name, a classId (or typeId?) is also written to the buffer via fieldType.write(...), which does not seem to be reflected in the spec. What is it for? Or is it reflected in the final paragraph of the spec?

Field order are left as implementation details, which is not exposed to specification, 
the deserialization need to resort fields based on Fury field comparator. 
In this way, fury can compute statistics for field names or types and using a more compact encoding.
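
One way to read the quoted paragraph: if both sides sort fields with the same rule, the order never has to be sent. A minimal sketch, assuming a made-up ordering by (type name, field name); this is not Fury's actual field comparator, and `sort_fields` is a hypothetical helper.

```python
def sort_fields(fields):
    """Order (name, type_name) pairs by a rule both peers can compute locally."""
    return sorted(fields, key=lambda f: (f[1], f[0]))


# The two peers declare the same fields in different source order.
writer_decl = [("name", "string"), ("id", "int32"), ("age", "int32")]
reader_decl = [("age", "int32"), ("name", "string"), ("id", "int32")]

writer_order = sort_fields(writer_decl)
reader_order = sort_fields(reader_decl)
```

Both sides end up with the same sequence, so values can be written positionally without repeating the order in the stream.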

Thank you!

urlyy avatar Apr 30 '25 07:04 urlyy

@chaokunyang Sorry, I have another question. I've started implementing Python meta_share according to the spec, but I'm new to Cython. I found that _fury.py is pretty much the same as _serialization.pyx, and _serialization.pyx only has some Cython definitions. It also seems that all the tests use the _serialization.pyx logic. So do I have to implement the same logic (paste plus minor modifications) in both files? It's a bit of a hassle😨

urlyy avatar Apr 30 '25 17:04 urlyy

@urlyy Serialization is complex. When deserialization fails, we need to debug and parse the data byte by byte, which is not easy to do in Cython. So we implemented a pure Python version in _fury.py. This implementation is only used for debugging. You can write the simplest possible implementation without any optimization, but there will still be some duplicated code.

chaokunyang avatar May 01 '25 16:05 chaokunyang

The spec needs updates. I'm working on it in https://github.com/apache/fury/pull/2197. Still need some time.

chaokunyang avatar May 01 '25 16:05 chaokunyang

Hi @urlyy, I updated the type meta spec in #2216 and finished the reference implementation in https://github.com/apache/fury/pull/2197. Please take a look.

chaokunyang avatar May 11 '25 15:05 chaokunyang

@chaokunyang I want to use pyfury (the pure py version, not pyx) for easier debugging, but got this error:

TypeError: NoneSerializer.__init__()

Cannot convert Fury to pyfury._serialization.Fury

I commented out some code so that the pyx version is not imported:

# try:
#     from pyfury._serialization import ENABLE_FURY_CYTHON_SERIALIZATION
# except ImportError:
#     ENABLE_FURY_CYTHON_SERIALIZATION = False

ENABLE_FURY_CYTHON_SERIALIZATION = False

and my test code is:

from pyfury import Fury, Language
from pyfury.meta.meta_share import CompatibleMode

def test_xlang_metashare():
    fury = Fury(language=Language.XLANG, ref_tracking=True, compatible_mode=CompatibleMode.COMPATIBLE)
    binary = fury.serialize(-1.0)

Since using Cython pyx requires recompilation every time it's updated and makes debugging difficult, I still want to start with a simple Python version and then switch to Cython.

Can you show me how to run the pure Python implementation?

urlyy avatar May 20 '25 09:05 urlyy

Setting the environment variable ENABLE_FURY_CYTHON_SERIALIZATION to 0 will use Python for serialization.
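
A minimal sketch of doing this from inside a test script rather than the shell. It assumes the flag is read when pyfury is first imported, which is not confirmed in this thread, so the variable is set before the import. The `roundtrip` helper is hypothetical; the `Fury(language=..., ref_tracking=...)` arguments are taken from the test code earlier in the thread.

```python
import os

# Set the flag before pyfury is imported (assumption: it is read at import time).
os.environ["ENABLE_FURY_CYTHON_SERIALIZATION"] = "0"

try:
    import pyfury
except ImportError:
    pyfury = None  # pyfury is not installed in this environment


def roundtrip(value):
    """Serialize and deserialize a value, or return None if pyfury is missing."""
    if pyfury is None:
        return None
    fury = pyfury.Fury(language=pyfury.Language.XLANG, ref_tracking=True)
    return fury.deserialize(fury.serialize(value))
```

From a shell, the equivalent would be to export ENABLE_FURY_CYTHON_SERIALIZATION=0 before running the tests.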

chaokunyang avatar May 20 '25 11:05 chaokunyang

In pyfury, sometimes it's "class" and sometimes it's "type". Are there any naming conventions? I often get confused about the naming.

urlyy avatar May 31 '25 09:05 urlyy

We plan to transition to "type" gradually; new code should already use "type". It's more common in a cross-language context: languages like Go/Rust only use type to declare a class.

chaokunyang avatar May 31 '25 09:05 chaokunyang