pybind11 fix: Make non-embedded module clear their locals before deallocation

Description

This fix aims to solve the issue #3776. The problem was that every non-embedded module had their own locals, which could only be accessed through the get_local_internals function of the shared library they were in. So when these modules were deallocated, their locals were not cleared, and if the modules were initialized again, C++ types could not be registered in the locals. I used the PyModuleDef::m_free field to register a cleanup callback which clears the locals on non-embedded module deallocation. I added another function for creating embedded module, so that embedded modules do not clear the locals on deallocation. I also registered a local class in the external_module used for testing. This new line was making the "Restart the interpreter" test case fail.

Mar 12 '22 19:03 StarQTius

Tagging @jbms: is there a chance that you could look at this PR (and my uncertainties under https://github.com/pybind/pybind11/pull/3798#discussion_r829737887)?

Mar 18 '22 07:03 rwgk

I wasn't too clear on exactly when the PyModuleDef.m_free function is called, so I did a bit of experimentation. My conclusion is that this change is NOT safe.

First let's consider just the single interpreter case:

In the most common case, you import some pybind11 modules, and don't do anything funny. In that case, the PyModuleDef.m_free functions of all the modules will be called during interpreter finalization. In that case it is probably fairly safe --- even if the locals are shared by multiple modules, clearing the locals multiple times doesn't really hurt anything, as long as none of the module free functions/destructors invoked by pybind11 attempt to use any pybind11 apis.

However, it appears that the PyModuleDef.m_free function can be called before finalization, if you import the extension, remove it from sys.modules, then reload it. However, note that even though m_free is called, the PyInit_modulename function is not called again. Here is a test case I put together:

mymodule.cc:

#include <pybind11/pybind11.h>
#include <stdio.h>

namespace py = ::pybind11;

struct MyClass {
  std::string get_name() { return "myclass"; }
};

PYBIND11_MODULE(mymodule, m) {
  m.doc() = "pybind11 example plugin";  // optional module docstring

  py::class_<MyClass>(m, "MyClass", py::module_local())
      .def(py::init([] { return MyClass{}; }))
      .def("get_name", &MyClass::get_name);

  fprintf(stderr, "Loading mymodule\n");
  fflush(stderr);
}

test.py:

import sys

import mymodule
cls = mymodule.MyClass()
assert cls.get_name() == "myclass"

del sys.modules['mymodule']
del mymodule
import mymodule
print('Reloaded module')
cls = mymodule.MyClass()
assert cls.get_name() == "myclass"

I also modified the m_free function to print a message.

The result with Python 3.9.9 is:

Loading mymodule
m_free called
Reloaded module
zsh: segmentation fault  python3 test.py

The crash happens in:

pybind11::class_<MyClass>::init_instance(pybind11::detail::instance*, void const*)

Presumably due to attempting to reference a type record that has been already deallocated.

This may be a bug in Python.

I don't really have any experience with starting and finalizing the Python interpreter multiple times in a single process, but from what I have seen when looking closely at the Python finalization process it is not really properly supported by Python --- some memory is sure to be leaked, especially if threads are used. Therefore I wouldn't recommend doing that in production code, and even doing it in tests seems kind of questionable since why test something that doesn't match what happens in production?

It appears there is support for multi-phase initialization intended to better support module reloading, and also sub-interpreters: https://docs.python.org/3/c-api/module.html#multi-phase-initialization

We might consider changing over to using that (which would probably require API changes in pybind11) , though my understanding is that there are still some Python APIs where the module state cannot easily be obtained, which would make it challenging to use this module state mechanism. Additionally, I think it would require major API changes in pybind11 to support that.

I think the objective of this PR, fixing module_local registration when restarting the interpreter, could be solved by instead arranging to clear the local internals only when the interpreter is finalized, or after the interpreter is restarted, rather than using m_free. I'm not sure what the best way to do that would be --- maybe adding an atexit handler would work, although that would add some overhead. Alternatively we could attempt to detect if the interpreter has been restarted, and clear the locals only in that case. I'm not sure how to do that exactly.

More generally, I think it would be best if module_local went away, and instead casters and exception translators could be overridden via explicitly passed-around context objects or something, so that there is no hidden semi-global state dependent on symbol visibility.

Mar 18 '22 15:03 jbms

I think the objective of this PR, fixing module_local registration when restarting the interpreter, could be solved by instead arranging to clear the local internals only when the interpreter is finalized, or after the interpreter is restarted, rather than using m_free. I'm not sure what the best way to do that would be --- maybe adding an atexit handler would work, although that would add some overhead. Alternatively we could attempt to detect if the interpreter has been restarted, and clear the locals only in that case. I'm not sure how to do that exactly.

Good thing Python has an auditing event just for this: https://docs.python.org/3/c-api/init.html#c.Py_FinalizeEx

Mar 18 '22 16:03 Skylion007

Upon looking at it, atexit is a viable alternative, but I do dislike having to manage yet another global static state as would be needed with either solution.

Mar 18 '22 16:03 Skylion007

Therefore I wouldn't recommend doing that in production code, and even doing it in tests seems kind of questionable since why test something that doesn't match what happens in production?

The reason of writing tests in C++ instead of python is because of debugging, gtest + Visual Studio or Xcode is a lot better and faster than debugging with pytest + pdb.

Mar 18 '22 17:03 jasjuang

You can run the Python executable itself in a debugger --- obviously works better if you have the debugging symbols, though.

What I mean, though, is that running Py_Initialize more than once is not something well-tested or well-supported by CPython, and there will be a lot of behavior differences compared to running Python normally. For example, some state may stick around from a previous interpreter run, and therefore a test may pass even if the functionality does not actually work in normal Python usage, or a test may fail even though the functionality does work in normal Python usage.

Of course some tradeoffs have to be made to make testing more efficient, though.

Mar 18 '22 17:03 jbms

It seems problematic if there are a lot of behavior differences between Python and CPython. I think most users (at least me) would implicitly assume that CPython is a C version of Python, therefore their behavior should ideally be identical. If the consensus is they shouldn't be identical, is there a README somewhere that documents the behavior differences between Python and CPython that I can read?

Mar 18 '22 18:03 jasjuang

CPython is the main implementation of Python --- PyPy is another less widely used implementation. I mentioned CPython because I have some familiarity with its implementation, while I have essentially no familiarity with PyPy.

Mar 18 '22 18:03 jbms

General comment: we (mostly CPython) is sending people into traps by giving the impression that restarting the interpreter is safe, well-defined behavior. It simply isn't. It probably never will be. It would be best if people gave up trying. It's just a trap, ultimately, that's burning human time, like here, for no good reason.

If it was just me, I'd put a static bool into pybind11, that made it impossible to initialize the interpreter a second time, terminating the process with a message pointing here:

https://docs.python.org/3/c-api/init.html#c.Py_FinalizeEx

Bugs and caveats: The destruction of modules and objects in modules is done in random order; this may cause destructors (__del__() methods) to fail when they depend on other objects (even functions) or modules. Dynamically loaded extension modules loaded by Python are not unloaded. Small amounts of memory allocated by the Python interpreter may not be freed (if you find a leak, please report it). Memory tied up in circular references between objects is not freed. Some memory allocated by extension modules may not be freed. Some extensions may not work properly if their initialization routine is called more than once; this can happen if an application calls Py_Initialize() and Py_FinalizeEx() more than once.

A couple years ago I did some really simple experiments: Leaks!

https://github.com/rwgk/stuff/blob/71f5300061aaecb0f499aab77343abd4ca360c6c/noddy/embedded_noddy_main_run.c

Mar 18 '22 18:03 rwgk

pybind11 pybind11 copied to clipboard

fix: Make non-embedded module clear their locals before deallocation

Description

pybind11
pybind11 copied to clipboard