Experiment: Allow sharing C utility code
> [!NOTE]
> This PR is purely experimental. It contains a basic set of tests, which pass. There is one failing test for compiling the shared library with the Limited API (this is known and ignored for now).
This PR implements offloading of C utility code to a shared library. The amount of space saved depends on how much utility code the compiled module uses. In my experiment, compiling ParseTreeTransforms.py saves ~50 kB with the default setuptools build and ~7 kB when debug symbols are removed from the resulting .so file (`strip -S ParseTreeTransforms.cpython-311-x86_64-linux-gnu.so`).
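A minimal sketch of the idea in plain C, under a simplified model: one translation unit plays the role of the shared utility module and exports its functions by name, and a user module looks a function up instead of compiling its own copy. The table and every name in it are hypothetical. Cython's existing cross-module sharing of C functions goes through `PyCapsule` objects stored in a module's `__pyx_capi__` attribute rather than a C array, and the mechanism in this PR may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic function pointer type; C allows converting between function
 * pointer types as long as the call goes through the original type. */
typedef void (*generic_fn)(void);

/* Hypothetical signature of one exported utility function. */
typedef long (*util_add_fn)(long, long);

/* --- Shared utility module: the only compiled copy of the utility code --- */
static long shared_util_add(long a, long b) {
    return a + b;
}

/* Name -> function table, standing in for the real export mechanism. */
typedef struct {
    const char *name;
    generic_fn fn;
} export_entry;

static const export_entry shared_exports[] = {
    {"util_add", (generic_fn)shared_util_add},
};

/* --- User module: looks the function up instead of compiling its own copy --- */
static generic_fn lookup_export(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < sizeof shared_exports / sizeof shared_exports[0]; i++) {
        if (strcmp(shared_exports[i].name, name) == 0)
            return shared_exports[i].fn;
    }
    return NULL;
}
```

Every user module that looks up `util_add` ends up calling the same single copy, which is where the size saving comes from; the price is an indirect call and a lookup at import time.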
Usage
Building works the same way as on the current branch, as described here: https://cython.readthedocs.io/en/latest/src/userguide/source_files_and_compilation.html#shared-utility-module
Potential improvements
- The current implementation focuses on making it possible to offload C utility code to a shared library, so I avoided more complex refactoring of the C utility code itself. Several other functions could be offloaded after some restructuring of the utility code, e.g. `__Pyx_State_RemoveModule` could be moved to the shared utility module after moving it to a separate utility section plus some other small changes: https://github.com/cython/cython/blob/b67b7ce6e6788f52ae75630c1e42abedabee80e0/Cython/Utility/ModuleSetupCode.c#L3001
- Several function declarations are encapsulated within macros, which would need more complicated code generation (but it can theoretically be done), e.g. https://github.com/cython/cython/blob/b67b7ce6e6788f52ae75630c1e42abedabee80e0/Cython/Utility/ObjectHandling.c#L411-L421
Downsides
- Not sure whether the saved space is worth the code complexity. Maybe if the shared module were distributed as a separate library shared across multiple projects as a dependency, it would be worth it, but this opens another can of worms (maintenance of the shared library, etc.).
- Not all shared functions can be offloaded to the shared module. Shared functions must not be called before this line:
  ```c
  /*--- Global type/function init code ---*/
  (void)__Pyx_modinit_global_init_code(__pyx_mstate);
  (void)__Pyx_modinit_variable_export_code(__pyx_mstate);
  (void)__Pyx_modinit_function_export_code(__pyx_mstate);
  if (unlikely((__Pyx_modinit_type_init_code(__pyx_mstate) < 0))) __PYX_ERR(0, 1, __pyx_L1_error)
  if (unlikely((__Pyx_modinit_type_import_code(__pyx_mstate) < 0))) __PYX_ERR(0, 1, __pyx_L1_error)
  (void)__Pyx_modinit_variable_import_code(__pyx_mstate);
  (void)__Pyx_modinit_function_import_code(__pyx_mstate); // <= After calling this function we can start using shared module functions
  /*--- Execution code ---*/
  ```
  Otherwise, compilation succeeds but loading the module ends in a segmentation fault.
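The ordering constraint follows from how imported functions are reached: in the user module they are function pointers that only become valid once the function import step has run. A minimal sketch under that assumption, with hypothetical names; `user_modinit_function_import()` merely stands in for `__Pyx_modinit_function_import_code()`, and the checked wrapper exists only to make the ordering bug observable instead of crashing.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*util_add_fn)(long, long);

/* Single definition living in the (hypothetical) shared utility module. */
static long shared_util_add(long a, long b) { return a + b; }

/* In the user module the utility function is only a pointer; it stays
 * NULL until the function import step has run. */
static util_add_fn user_util_add = NULL;

/* Stand-in for the function import step: fills in the pointers. */
static int user_modinit_function_import(void) {
    user_util_add = shared_util_add;
    return 0;
}

/* Calling user_util_add() directly before the import step would call
 * through a NULL function pointer, i.e. the segfault described above.
 * This wrapper reports the mistake instead. */
static int call_util_add(long a, long b, long *out) {
    assert(out != NULL);
    if (user_util_add == NULL)
        return -1; /* too early: shared functions not imported yet */
    *out = user_util_add(a, b);
    return 0;
}
```

Generated code would not normally pay for such a check on every call, which is why the constraint has to be respected by construction: nothing that runs before the import step may use a shared function.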
Let me know what you think about it. Is the complexity worth the gains?
I haven't yet had a look at the implementation:
> Several function declarations are encapsulated within macros, e.g.
In a lot of cases those are functions that shouldn't be shared. `__Pyx_PyObject_Dict_GetItem` is a good example in that it's really simple (so there's no real space saving available) and is always intended to just be expanded inline. The same applies to a lot of the list/tuple indexing functions.
> `__Pyx_State_RemoveModule`
The module-state handling functions are a good example of a few different complications:
- They only exist when `CYTHON_USE_MODULE_STATE != 0`, so we need to be selective about loading them.
- With `CYTHON_USE_MODULE_STATE == 0`, looking up the module state should be really quick (because it's just reading a global variable), so we don't want to make that case worse, since the lookup is used everywhere.
- They use a C global `static` data store that definitely shouldn't be shared between modules (although some of the functions that operate on it definitely could be).
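A minimal sketch of the split described in the last point, assuming a simplified key to state store; all names are hypothetical and `USE_MODULE_STATE` stands in for `CYTHON_USE_MODULE_STATE`. Functions that only operate on a store passed in as a parameter could live in the shared module, each user module keeps its own `static` store, and the default build keeps the state lookup as a plain global read.

```c
#include <assert.h>
#include <stddef.h>

#ifndef USE_MODULE_STATE /* hypothetical stand-in for CYTHON_USE_MODULE_STATE */
#define USE_MODULE_STATE 0
#endif

#define STORE_CAPACITY 8

typedef struct {
    long counter; /* placeholder contents of a module state */
} module_state;

typedef struct {
    long keys[STORE_CAPACITY];
    module_state *states[STORE_CAPACITY];
    size_t count;
} state_store;

/* Shareable: these only touch the store they are given, so a single copy
 * in the shared module could serve every user module. */
static int store_add(state_store *store, long key, module_state *state) {
    assert(store != NULL && state != NULL);
    if (store->count == STORE_CAPACITY)
        return -1;
    store->keys[store->count] = key;
    store->states[store->count] = state;
    store->count++;
    return 0;
}

static module_state *store_find(const state_store *store, long key) {
    assert(store != NULL);
    for (size_t i = 0; i < store->count; i++) {
        if (store->keys[i] == key)
            return store->states[i];
    }
    return NULL;
}

/* Not shareable: every user module owns a separate static store. */
static state_store module_a_store;
static state_store module_b_store;

#if USE_MODULE_STATE
/* Module-state build: a lookup in this module's own store. */
#define get_module_state(key) store_find(&module_a_store, (key))
#else
/* Default build: just a global read, which must stay this cheap. */
static module_state module_a_global_state;
#define get_module_state(key) ((void)(key), &module_a_global_state)
#endif
```

Because the two stores are separate objects, the same key maps to different states in different modules even though both go through the same `store_find()`.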
> Maybe if the shared module were distributed as a separate library shared across multiple projects as a dependency, it would be worth it, but this opens another can of worms (maintenance of the shared library, etc.)
There's also the handling of combinations of compile-time options: `CYTHON_USE_MODULE_STATE` and the Limited API are probably the two big ones that both have real uses and change the code quite a bit.
I think mostly we'd just need to be selective about what code was allowed to be shared.
Yes, as I said, offloading C utility code is much more "low level" and can easily go wrong. I did not study the C functions in depth, so take my examples with a pinch of salt (at least the tested functions should work :-D). My point was to show that other functions could be offloaded too, but the current implementation cannot handle them due to its limitations. That said, I am not 100% convinced that this PR is a good idea, but I spent a lot of time playing with it, so I decided to share the results...
> I haven't yet had a look at the implementation
Please keep in mind that I tried to inject the offloading logic as cleanly as possible, but the code is not 100% finished: some parts could be done better and the naming is bad.
Edit: compilation also produces some warnings about unused functions. For now I decided not to fix them...
It actually looks simpler than I expected, which suggests it probably is worth doing. It also looks like something that could be expanded gradually (which is always nice, since it avoids rewriting everything at once).
(I think it would be possible to get almost all of CythonFunction.c, Coroutine.c, and AsyncGen.c shared. But they're a bit different from the rest of the C code, so it would be better to do them separately.)
> Several function declarations are encapsulated within macros
That's fine and is done for performance reasons. Macros and inline functions should not be moved to the shared module but always stay in the user code module. (EDIT: but they can obviously also be used in the shared module.)
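A minimal sketch of that split, with hypothetical names: the inline fast path stays in the user module and handles the trivial cases directly, and only the general case goes through a pointer to the single out-of-line copy in the shared module. For brevity the pointer is initialised statically here; in a real module it would be filled in during function import.

```c
#include <assert.h>
#include <stddef.h>

/* --- Shared utility module: the out-of-line general case (single copy) --- */
static long shared_sum_slow(const long *items, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += items[i];
    return total;
}

/* --- User module --- */
/* Pointer to the shared copy. Initialised statically for brevity; a real
 * module would set it during function import. */
static long (*imported_sum_slow)(const long *, size_t) = shared_sum_slow;

/* Inline fast path: compiled into the user module, so the trivial cases
 * need no call through the pointer at all. */
static inline long user_sum(const long *items, size_t n) {
    assert(items != NULL || n == 0);
    if (n == 0)
        return 0;
    if (n == 1)
        return items[0];
    return imported_sum_slow(items, n);
}
```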
> Not sure whether the saved space is worth the code complexity
Let's consider this a feature for larger Cython-based packages with many modules, all built at the same time with the same Cython version and with an easy way to distribute the shared module as part of the distribution. Anything beyond that is really out of scope, at least for a while, and that's perfectly ok.
I agree that we should start with the infrastructure implementation and the obvious/easy utility code sections, and then expand on them in separate refactoring PRs.
I created such a PR, but I missed your comments here, so the PR needs to be reworked...
> There's also the handling of combinations of compile-time options: `CYTHON_USE_MODULE_STATE` and the Limited API are probably the two big ones that both have real uses and change the code quite a bit.
Whether the Limited API is used seems like something to decide for a complete Python package rather than a single module, at least when utility code sharing is used. But flags like `CYTHON_USE_MODULE_STATE` can reasonably be enabled or disabled per module, and they must not conflict between modules that share utility code. I think that's a restriction we can dare to enforce, or at least document mixed flags as undefined behaviour.
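If mixed flags were to be rejected rather than only documented as undefined behaviour, one option would be for each module to record the relevant compile-time options as bit flags and compare them with the shared module's flags at import time. A minimal sketch, assuming hypothetical flag names and a hypothetical `check_shared_flags()` hook; `Py_LIMITED_API` and `CYTHON_USE_MODULE_STATE` are only tested with `#if`, so the sketch compiles without Python headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bits describing how a module was compiled. */
#define FLAG_LIMITED_API      UINT32_C(0x1)
#define FLAG_USE_MODULE_STATE UINT32_C(0x2)
#define FLAG_LOCAL_OPTION     UINT32_C(0x4) /* affects only the user module's own code */

#define FLAGS_KNOWN (FLAG_LIMITED_API | FLAG_USE_MODULE_STATE | FLAG_LOCAL_OPTION)
/* Options that change the shared utility code and therefore must agree. */
#define FLAGS_MUST_MATCH (FLAG_LIMITED_API | FLAG_USE_MODULE_STATE)

/* Flags of the current translation unit, derived from the real macros. */
static uint32_t module_build_flags(void) {
    uint32_t flags = 0;
#if defined(Py_LIMITED_API)
    flags |= FLAG_LIMITED_API;
#endif
#if defined(CYTHON_USE_MODULE_STATE) && CYTHON_USE_MODULE_STATE
    flags |= FLAG_USE_MODULE_STATE;
#endif
    return flags;
}

/* Returns 0 if the user module may use the shared module, -1 on a conflict. */
static int check_shared_flags(uint32_t shared_flags, uint32_t module_flags) {
    assert((shared_flags & ~FLAGS_KNOWN) == 0 && (module_flags & ~FLAGS_KNOWN) == 0);
    uint32_t conflicting = (shared_flags ^ module_flags) & FLAGS_MUST_MATCH;
    return conflicting == 0 ? 0 : -1;
}
```

`FLAG_LOCAL_OPTION` stands for any option that only affects the user module's own code and may therefore differ between modules without being rejected.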