[Roadmap] Full support for fory-cpp
Feature Request
The community is very active: almost every language implementation has PRs in progress, but the cpp module does not seem to have been updated for a long time.
Is your feature request related to a problem? Please describe
So I plan to push the development of the cpp module forward.
Describe the solution you'd like
- I hope to upgrade the build tooling of the cpp module first, starting with Bazel
- I will gradually implement some of the cross-language logic in the cpp module
- I hope that pyfory can use nanobind to call into the cpp library, so that the two modules share the same underlying code instead of going through Cython
- I hope to further improve the execution efficiency of pyfory, for example with SIMD
- I hope to integrate fory into more projects in areas such as inference and distributed computing
If anyone in the community has ideas or would like to offer support, feel free to join the discussion!
Some things that come to mind and need to be done:
- [ ] Make the arrow integration optional; not all users need it
- [ ] Decouple the fory cpp reflection system from the arrow type system. @PragmaTwice added the first version of the cpp reflection implementation; I'm not sure whether he still has time for this.
- [ ] Add cmake build support, since most cpp users build with cmake
- [ ] Add a type resolver for the object graph in c++
- [ ] Add a reference resolver for the object graph in c++
- [ ] Build a macro/template-based compile-time codegen system for fory-cpp. Note that fory cpp uses c++ 17, and we won't use c++ 20. Building the compile-time reflection system may take some effort.
- [ ] Support serialization for basic types
- [ ] Support serialization for nested collection/map/set types using the compile-time codegen module
- [ ] Support serialization for structs and nested structs using the compile-time codegen module
- [ ] Support type meta encoding for schema evolution
- [ ] Support schema evolution serialization based on type meta; we need to generate two versions of the deserialization code at compile time
- [ ] Support circular and shared references, e.g. via shared_ptr, weak_ptr, unique_ptr
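To make the reference-resolver and circular-reference items more concrete, here is a minimal sketch of reference tracking over `shared_ptr`. It is only an illustration under assumptions: `Node`, `RefTracker`, `write_node`, and the textual `new#`/`ref#` markers are hypothetical names invented for this example and do not reflect fory's actual wire format or API.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical node type: `next` may point back to an earlier node.
struct Node {
  std::string label;
  std::shared_ptr<Node> next;
};

// Hypothetical tracker: assigns a small id to each distinct object address.
class RefTracker {
 public:
  // Returns {id, first_seen}. first_seen is false when the address was
  // already written, so the caller can emit a back-reference instead.
  std::pair<std::uint32_t, bool> track(const void* p) {
    auto next_id = static_cast<std::uint32_t>(ids_.size());
    auto [it, inserted] = ids_.emplace(p, next_id);
    return {it->second, inserted};
  }

 private:
  std::unordered_map<const void*, std::uint32_t> ids_;
};

// Writes a readable trace instead of bytes, to keep the sketch short.
inline void write_node(const std::shared_ptr<Node>& n, RefTracker& refs,
                       std::string& out) {
  if (!n) {
    out += "null ";
    return;
  }
  auto [id, first_seen] = refs.track(n.get());
  if (!first_seen) {
    // Already written: emit only the id, which also stops the recursion
    // on cycles.
    out += "ref#" + std::to_string(id) + " ";
    return;
  }
  out += "new#" + std::to_string(id) + "(" + n->label + ") ";
  write_node(n->next, refs, out);
}

// Usage sketch: a two-node cycle a -> b -> a.
inline std::string demo_cycle() {
  auto a = std::make_shared<Node>(Node{"a", nullptr});
  auto b = std::make_shared<Node>(Node{"b", a});
  a->next = b;

  RefTracker refs;
  std::string out;
  write_node(a, refs, out);
  assert(!out.empty());

  a->next.reset();  // break the cycle so both nodes are freed
  return out;
}
```

A real resolver would also need the reading side (rebuilding the pointers from ids) and a policy for `weak_ptr` and `unique_ptr`, which this sketch leaves out.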
Yes, these are all very good ideas!
- All of these need to be implemented gradually. However, I have another idea: pyfory could share its core logic with the cpp module and call it through a binding layer such as nanobind. This would replace the old Cython layer and let us update the API at the same time.
- I think very few people would call the C++ serializer directly.
- However, pyfory calls into the cpp module at the bottom layer. I'm not sure whether that compatibility is sufficient?
- pyfory also has more room to grow, for example by being integrated into Ray or used in scientific computing.
- Also, what design do you have in mind for the fory cpp reflection system?
- Will fory follow Arrow's example in the future and implement shared memory?
@pandalee99 C++ doesn't have a runtime reflection system, so for serialization in C++ you must generate code at compile time. That is the job of the reflection system. C++17 has no built-in reflection support (C++20 does not add it either; static reflection is only planned for a later standard), and we need to stay on C++17 to reach a broader set of users. To implement reflection in C++17, you need to combine macros and template metaprogramming. @PragmaTwice did some work on this in #1144. You can take a look at it if you are interested.
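As a rough illustration of combining macros with template metaprogramming in C++17, the sketch below registers a struct's fields once and then visits them generically. It is not the implementation from #1144: `DEMO_REFLECT`, `DEMO_FIELD`, `TypeInfo`, `FieldInfo`, `for_each_field`, and `TextWriter` are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <tuple>
#include <type_traits>

// Hypothetical descriptor: a field name plus a pointer-to-member.
template <typename T, typename M>
struct FieldInfo {
  const char* name;
  M T::*ptr;
};

template <typename T, typename M>
constexpr FieldInfo<T, M> make_field(const char* name, M T::*ptr) {
  return {name, ptr};
}

// Primary template is left undefined: a type is reflectable only after
// it has been registered with DEMO_REFLECT.
template <typename T>
struct TypeInfo;

// Hypothetical macros: stringize the field name and capture its member pointer.
#define DEMO_FIELD(Type, f) make_field(#f, &Type::f)
#define DEMO_REFLECT(Type, ...)                                             \
  template <>                                                               \
  struct TypeInfo<Type> {                                                   \
    static constexpr auto fields() { return std::make_tuple(__VA_ARGS__); } \
  }

// Visits every registered field in declaration order, calling fn(name, value).
template <typename T, typename F>
void for_each_field(T& obj, F&& fn) {
  std::apply(
      [&](const auto&... field) { (fn(field.name, obj.*(field.ptr)), ...); },
      TypeInfo<std::remove_cv_t<T>>::fields());
}

// A user type and its one-time registration.
struct Person {
  std::int32_t id;
  std::string name;
};
DEMO_REFLECT(Person, DEMO_FIELD(Person, id), DEMO_FIELD(Person, name));

// Toy "serializer": one overload per supported field type, chosen at
// compile time by overload resolution.
struct TextWriter {
  std::string out;
  void operator()(const char* name, const std::int32_t& v) {
    out += std::string(name) + "=" + std::to_string(v) + ";";
  }
  void operator()(const char* name, const std::string& v) {
    out += std::string(name) + "=" + v + ";";
  }
};

template <typename T>
std::string dump(T& obj) {
  TextWriter writer;
  for_each_field(obj, writer);
  return writer.out;
}

// Usage sketch.
inline std::string demo_dump() {
  Person p{7, "Ada"};
  std::string text = dump(p);
  assert(!text.empty());
  return text;
}
```

The same field list could drive a binary writer and reader instead of `TextWriter`; the point is only that the per-type code is produced by the compiler from a single registration.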
For other questions:
- Shared memory is not our target; it's not the goal of a serialization framework. It's more a feature of an object store. And arrow doesn't support shared memory itself, it just works on buffers. It was plasma that supported shared memory, but plasma has been deprecated. Alternatively, users allocate a buffer in shared memory and write arrow data into it; in that case the shared memory is handled by the user rather than by arrow. Another point is that the object graph format can't be used for random access, because we have compression in the format. Even if we had some kind of shared memory, it wouldn't provide any gains.
- The target of pyfory is to provide a drop-in replacement for pickle and cloudpickle. All Python projects can benefit from this library.
- pyfory already calls into cpp in the current implementation: `_serialization.pyx` already uses some c++ utils and libraries.
- As for nanobind, it should be faster and give more chances to share a codebase between the c++ and python implementations. But it would be great if we could build a POC and run some benchmarks first.
finished in #2908