pyo3 icon indicating copy to clipboard operation
pyo3 copied to clipboard

convert module initialization to use multi-phase-init

Open davidhewitt opened this issue 8 months ago • 12 comments

Part of #2274, inspired by all of #2245, #4081, and #4162

This converts the module initialization itself to use module slots and multi-phase init APIs. Module state is deliberately not covered here and left for follow ups. I also don't change anything related to subinterpreter checking, that is also left for a follow up.

davidhewitt avatar May 14 '25 09:05 davidhewitt

Note to self: this might need to be opt-out (for a few versions) and be mentioned in the migration guide.

davidhewitt avatar May 30 '25 16:05 davidhewitt

I like this very much!

Is this compatible with how we are currently putting pyclass' type objects in statics? We should figure out a way to store them in module state if possible.

mejrs avatar May 30 '25 17:05 mejrs

I think it's sorta fine to leave the type objects in statics for now given we don't support subinterpreters, but yes:

  • we should seek to use module state if we can anyway as a followup
  • and we'd need to do so for any potential eventual subinterpreter support

davidhewitt avatar May 30 '25 18:05 davidhewitt

I suppose eventually we'll want to allow users to have their own module state as well for their statics :-/

alex avatar May 30 '25 18:05 alex

I think it's sorta fine to leave the type objects in statics for now given we don't support subinterpreters, but yes:

* we should seek to use module state if we can anyway as a followup

* and we'd need to do so for any potential eventual subinterpreter support

This will also put the nail in the coffin for https://github.com/PyO3/pyo3/issues/1444. I'm fine with that personally, but it'll be disappointing for quite a lot of people. Maybe we should start thinking about other ways to address that use case.

mejrs avatar May 30 '25 18:05 mejrs

I agree, I think there should be ways to take inspiration from how pybind11 makes that possible, with the caveat that Rust's lack of stable ABI makes things more tricky.

RE this PR - it looks like PyPy does not support all the APIs for multi phase init of the submodules, I will see if there's a way I can make that work with some conditional code. (🤮)

davidhewitt avatar Jun 11 '25 13:06 davidhewitt

We should also ping the PyPy folks (h/t @mattip) with the APIs that are missing so that we can eventually remove the special cases.

alex avatar Jun 11 '25 13:06 alex

PyPy does not support all the APIs for multi phase init of the submodules

I thought PyPy does support multi phase init, which APIs did I miss?

EDIT: I see a missing PyModule_ExecDef here but that should be PyPyModule_ExecDef. Maybe there needs to be a #[cfg_attr(PyPy, link_name = "PyPyModule_ExecDef")] decorator?

mattip avatar Jun 11 '25 13:06 mattip

Yep PyModule_ExecDef our definition is wrong, but we also need PyModule_FromDefAndSpec2 which I couldn't see in the pypy exports.

davidhewitt avatar Jun 11 '25 15:06 davidhewitt

I implemented the missing PyModule_FromDefAndSpec2 and PyModule_FromDefAndSpec on latest pypy-3.11 HEAD. With that, I will point out that cython does not use these low-level interfaces, rather it uses PyModuleDef_Init. Here is a small example in C that demonstrates a multiphase module with PyModuleDef_Init from the PyPy tests

C code using multiphase initialization
    #include <Python.h>

    
        static PyModuleDef multiphase_def;

        static PyObject* multiphase_create(PyObject *spec, PyModuleDef *def) {
            PyObject *module = PyModule_New("altname");
            PyObject_SetAttrString(module, "create_spec", spec);
            PyObject_SetAttrString(module, "create_def_eq",
                                   PyBool_FromLong(def == &multiphase_def));
            return module;
        }

        static int multiphase_exec(PyObject* module) {
            Py_INCREF(Py_True);
            PyObject_SetAttrString(module, "exec_called", Py_True);
            return 0;
        }

        static PyModuleDef_Slot multiphase_slots[] = {
            {Py_mod_create, multiphase_create},
            {Py_mod_exec, multiphase_exec},
            {0, NULL}
        };

        static PyModuleDef multiphase_def = {
            PyModuleDef_HEAD_INIT,                      /* m_base */
            "multiphase",                               /* m_name */
            "example docstring",                        /* m_doc */
            0,                                          /* m_size */
            NULL,                                       /* m_methods */
            multiphase_slots,                           /* m_slots */
            NULL,                                       /* m_traverse */
            NULL,                                       /* m_clear */
            NULL,                                       /* m_free */
        };
        

    PyMODINIT_FUNC
    PyInit_multiphase(void) {
    
        return PyModuleDef_Init(&multiphase_def);
        
    }
testing code
import multiphase
assert multiphase.create_spec
assert multiphase.create_spec is multiphase.__spec__
assert multiphase.create_def_eq
assert multiphase.exec_called

mattip avatar Jun 13 '25 02:06 mattip

Thanks, that's perfect, I will seek to retest with pypy nightly.

I will point out that cython does not use these low-level interfaces, rather it uses PyModuleDef_Init.

Indeed, we primarily use that here too, but there are other code paths which directly create module objects from the definitions now written as multi-phase (e.g. submodule objects added to a root module during the exec phase)

davidhewitt avatar Jun 13 '25 07:06 davidhewitt

Cool.

I will seek to retest with pypy nightly.

Sorry, I was a bit too late for last night's build, it will only appear tomorrow.

mattip avatar Jun 13 '25 07:06 mattip

Superseded by #5525

davidhewitt avatar Oct 19 '25 08:10 davidhewitt