C-API Capsule and Cython Bindings
Description
I think this project is super cool and could benefit users who want to build on msgspec at a lower level, in C or Cython. Having the Encoder/Decoder classes available in Cython and C would be very useful for anyone who wants to implement more obscure protocols at a lower level.
I've been thinking a lot about this, and as a side note it would be amazing to be able to integrate msgspec into custom-written Python packages that can be fully compiled with mypyc. Although they're not quite there yet, there have been mentions in the following issues: #1137, #1098.
A possible approach would be to move most of the content of _core.c into a header file, so that C and Cython extensions can include it, and then provide a utility function that returns the include path for compilation, as numpy does. The regular msgspec package could then simply export the header as a module.
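As a rough illustration of the numpy-style helper, here is a minimal sketch of what a `msgspec.get_include()` could look like. The function name and the `include` directory layout are hypothetical (msgspec ships no such helper today); the only real precedent is `numpy.get_include()`.

```python
import importlib.util
import os


def get_include(package: str = "msgspec") -> str:
    """Hypothetical helper modeled on numpy.get_include().

    Returns "<installed package dir>/include", assuming the C headers
    would be shipped in an "include" folder inside the package.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"cannot locate package {package!r}")
    # spec.origin points at the package's __init__ file; its folder is the package dir.
    return os.path.join(os.path.dirname(spec.origin), "include")
```

A downstream `setup.py` would then pass `include_dirs=[msgspec.get_include()]` to `setuptools.Extension`, mirroring how extensions built against NumPy use `numpy.get_include()`.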
I'm just wondering how much of a breaking change would that be.
@jacopoabramo
I already have a fork where I'm attempting this, in case the idea is accepted. I also have some experience bringing C-API capsules into Cython/CPython and Python, since avestlov and I came up with one for multidict so that the HTTP parser and writer in aiohttp could run more smoothly from Cython.
I'll probably use a setup similar to that project's, so that things stay organized and the _core.c file doesn't get overwhelmed with C-API capsule bindings. I'll probably mark things similarly to how _core.c currently marks them.
I might be able to open a pull request for this soon, depending on my IRL work schedule. If I feel fully motivated to develop a C-API for msgspec, the earliest I can start is tomorrow or sometime next week.
As for testing, the only costly downside will be adding another C module, but it would be tiny, and a simple environment variable could skip compiling it and packing it into the wheels by default. It would still have to live inside msgspec's folder: from past experience, I found no workable way to put a compiled msgspec._testcapi binary elsewhere, since doing so always triggered workflow failures.
Other than that, it shouldn't be too costly to add to msgspec, and bringing it into pytest (optionally) afterwards will be a piece of cake. After that comes the fun part: giving msgspec an __init__.pxd Cython module.
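The opt-in build for the test-only module could look roughly like the following `setup.py` fragment. The environment variable `MSGSPEC_BUILD_TESTCAPI`, the helper `wanted_extensions`, and the source paths are all assumptions for illustration; each returned pair would be turned into a `setuptools.Extension(name, sources)`.

```python
import os
from collections.abc import Mapping

# Hypothetical opt-in flag: msgspec does not define this variable today.
TESTCAPI_ENV = "MSGSPEC_BUILD_TESTCAPI"


def wanted_extensions(
    environ: Mapping[str, str] = os.environ,
) -> list[tuple[str, list[str]]]:
    """Return (module name, C sources) pairs to compile.

    The test-only C-API module is skipped unless explicitly requested,
    so default wheels never include it. Source paths are assumed.
    """
    exts = [("msgspec._core", [os.path.join("msgspec", "_core.c")])]
    if environ.get(TESTCAPI_ENV) == "1":
        # Built inside the msgspec package so the test suite can import it.
        exts.append(("msgspec._testcapi", [os.path.join("msgspec", "_testcapi.c")]))
    return exts
```

Tests that exercise the capsule could then start with `pytest.importorskip("msgspec._testcapi")`, so the suite still passes when the module was not built.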
Also related: https://github.com/cython/cython/issues/6672. I once attempted to get sqlalchemy and msgspec to mix properly as a SQLModel-like library, but the problem had to do with strictness. Hopefully adding the C-API will help eliminate a few of the problems with my past ideas.
Hey there! I'm intrigued by your proposal, and my instinct is to accept every contribution that doesn't regress performance or UX. However, I'm still getting up to speed with this domain and low-level CPython code in general. Could you please explain precisely what this would unlock, with an example, and why it's impossible to do currently?
@ofek It is currently impossible to access msgspec at a lower level or to create custom serialization objects at lower levels. My goal is to expose these types through a C-API capsule to help developers who want faster serialization of different object types, as well as to create more custom encoder/decoder types. I also hope it lets me revive an older idea of mine: an ORM-like library extension for users who want to create SQL tables from msgspec structures. I'm hoping that doing this at the C-API level could prevent performance regressions.
@Vizonex I'm not familiar with the Python C-API either, so I'm flying blind in the hope of learning by osmosis. Why a PyCapsule specifically? Why not just a header file that a C extension can include? Is there an upside?
@jacopoabramo https://docs.python.org/3/c-api/capsule.html#c.PyCapsule I'll try to put it in simple terms: I would be giving msgspec features similar to numpy's, so that other libraries written in C, Cython, or pyo3 (or more) can be compiled against it. The upside is that the functions msgspec uses can be carried over to other modules at a lower level, so things such as argument parsing can be skipped and the functions are called more directly. Types will not appear the same unless we find a way to carry these types along elsewhere, hence needing a Capsule to do it with.
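A capsule is essentially a named `void *` wrapped in a Python object: the providing extension stores a pointer (typically to a struct of function pointers, an "API table") as a module attribute, and a consumer retrieves it with `PyCapsule_GetPointer` (or `PyCapsule_Import`) and then calls through the table without Python-level argument parsing. The sketch below reproduces that round trip in pure Python by calling the real CPython functions `PyCapsule_New` and `PyCapsule_GetPointer` through `ctypes`, so it runs on CPython only. The capsule name `msgspec._core._C_API` and the `CAPITable` layout are hypothetical; in a real extension both sides would be C code sharing the table definition through a header.

```python
import ctypes

# Real CPython C-API functions, bound through ctypes so the demo stays pure Python.
PyCapsule_New = ctypes.pythonapi.PyCapsule_New
PyCapsule_New.restype = ctypes.py_object
PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

PyCapsule_GetPointer = ctypes.pythonapi.PyCapsule_GetPointer
PyCapsule_GetPointer.restype = ctypes.c_void_p
PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

# Hypothetical capsule name. PyCapsule_New stores this pointer without copying,
# so the bytes object must outlive the capsule (a module-level constant does).
CAPSULE_NAME = b"msgspec._core._C_API"

# Hypothetical API table: a version number plus one C function pointer.
SCALE_FN = ctypes.CFUNCTYPE(ctypes.c_long, ctypes.c_long)


class CAPITable(ctypes.Structure):
    _fields_ = [("version", ctypes.c_int), ("scale", SCALE_FN)]


# Provider side (hypothetically, what msgspec._core would do at module init).
_scale_impl = SCALE_FN(lambda x: x * 2)  # keep the C thunk alive
_table = CAPITable(1, _scale_impl)
capsule = PyCapsule_New(ctypes.addressof(_table), CAPSULE_NAME, None)


# Consumer side (what another extension would do after importing the capsule).
def load_api(cap):
    # Raises ValueError if the capsule's name does not match.
    addr = PyCapsule_GetPointer(cap, CAPSULE_NAME)
    return CAPITable.from_address(addr)


api = load_api(capsule)
print(api.version, api.scale(21))  # 1 42
```

Checking a `version` field before using the table is one way to let the provider append entries later without breaking consumers compiled against an older header.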
@Vizonex ok, but a PyCapsule is used to exchange information between compiled extensions when Python is in the way. If you want to allow extensions (regardless of the language) to build on top of msgspec to create customized extensions then you don't need a PyCapsule, right?
In a C extension compiled on top of numpy, it's as simple as
#include <numpy/arrayobject.h> // NumPy's C-API header
Or am I missing something?
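For what it's worth, a header alone only describes the API; the actual function pointers still have to reach the consumer at runtime, and that is the capsule's job. NumPy works this way too: `import_array()` fetches the `_ARRAY_API` capsule from NumPy's multiarray module, and the macros in its headers call through that pointer table. CPython's own `datetime.h` follows the same pattern, with `PyDateTime_IMPORT` loading the `datetime.datetime_CAPI` capsule. The small check below inspects that stdlib capsule through `ctypes` (CPython only) to show that the header-plus-capsule pairing is the established convention.

```python
import ctypes
import datetime

# Real CPython C-API functions, bound through ctypes for inspection.
PyCapsule_GetName = ctypes.pythonapi.PyCapsule_GetName
PyCapsule_GetName.restype = ctypes.c_char_p
PyCapsule_GetName.argtypes = [ctypes.py_object]

PyCapsule_IsValid = ctypes.pythonapi.PyCapsule_IsValid
PyCapsule_IsValid.restype = ctypes.c_int
PyCapsule_IsValid.argtypes = [ctypes.py_object, ctypes.c_char_p]

# The capsule that PyDateTime_IMPORT (from datetime.h) loads at runtime.
capsule = datetime.datetime_CAPI
name = PyCapsule_GetName(capsule)

print(type(capsule).__name__)  # PyCapsule
print(name)  # b'datetime.datetime_CAPI'
print(PyCapsule_IsValid(capsule, name))  # 1
```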
Types will not appear the same unless we find a way to carry these types along elsewhere hence needing a Capsule to do it with.
Sorry, I jumped the gun; the answer to my question should be this one. I'll need some time to process it, I guess.
If anybody needs updates, I have opened #961. I might need a reviewer pretty soon, since I want to make sure that what I do is in the maintainers' interest.