
Bus error when importing numpy from a separate thread.

Open RealLast opened this issue 3 years ago • 3 comments

Required prerequisites

  • [X] Make sure you've read the documentation. Your issue may be addressed there.
  • [X] Search the issue tracker and Discussions to verify that this hasn't already been reported. +1 or comment there if it has.
  • [X] Consider asking first in the Gitter chat room or in a Discussion.

Problem description

I have a very simple program in which I want to embed Python code. For this, I use py::initialize_interpreter() to start the interpreter. Immediately afterwards, I want to import numpy. Consider the following code:

#include <iostream>
#include <thread>

#include <pybind11/embed.h>
namespace py = pybind11;

void test()
{
    py::initialize_interpreter(); // if I understand correctly, the GIL is still locked by now, thus we can safely call python.
    py::module_* mod = new py::module_(py::module_::import("numpy"));
    while(true); // block the function
}

This works well if I call this function from the main thread:

int main()
{
    test();
    while(true);
}

However, if I call the same function from a different thread, it does not work. The following yields a bus error when importing numpy:

int main()
{
    std::thread testThread(&test);
    while(true);
}

I am building the application using CMake:

set (CMAKE_CXX_STANDARD 14)
add_subdirectory(pybind11)
add_executable(test main.cpp)
include_directories(${CMAKE_CURRENT_LIST_DIR}/pybind11/include)
target_link_libraries(test PRIVATE pybind11::embed)

I understand that using Python functions from a different thread is not trivial and that you need to manage the GIL properly. However, this is a very simple test case. All I want to do is start and use the Python interpreter in a separate thread. No other thread is using the interpreter at the same time. The GIL should still be held after py::initialize_interpreter() (adding a py::gil_scoped_acquire after py::initialize_interpreter() does not solve the issue). Thus, the only thing I can imagine is that the Python interpreter needs to run in the main thread. But if so, why is this the case? Also, this only happens when importing numpy. It does not occur when importing "sys" or "os", for example.

Thanks and best regards

Reproducible example code

// Consider the following code.
// We add a function, that initializes the interpreter and imports numpy.
// If we call this function from the original thread in main, it works.
// If we call it from a separate thread, it does not work.

#include <cstdio>   // printf
#include <thread>

#include <pybind11/embed.h>
namespace py = pybind11;

void test()
{
    py::initialize_interpreter();

    py::module_* mod = new py::module_(py::module_::import("numpy"));
    printf("done\n");
    while(true);
}

// The following works
int main()
{
    test();
    while(true);
}


// This does not work (bus error).
int main()
{
    std::thread testThread(&test);
    while(true);
}

RealLast avatar Oct 14 '22 09:10 RealLast

I can't reproduce this bug locally using gcc 9.4 and Python 3.10.6 on Ubuntu 20.

Can you tell us more about your environment where you have the bug? What compiler, OS, and Python version are you having the bug with?

EthanSteinberg avatar Oct 22 '22 04:10 EthanSteinberg

Hello,

thank you very much for your response.

I am using Apple clang 13.1.6 as the compiler. The Python version is 3.9.9. The OS is macOS 12.6 Monterey, and I am working on an Apple MacBook 2021 M1 (ARM).

Best regards

RealLast avatar Oct 24 '22 09:10 RealLast

Hello.

(You've probably moved on to other things, but I came across this same problem and want to leave my solution here.)

I am not sure, but this could be due to the small default stack size for threads on macOS. The main thread gets 8 MB; other threads get 512 KB by default. On Linux, I understand that sub-threads are also typically given 8 MB, but this is platform-specific. I have a project that runs the Python interpreter in a std::thread, and it was hitting a bus error when importing some larger modules. It turns out the stack in my Python thread consumes about 1 MB ... I know very little about Python and the stack, so I can't explain this. I'm still wondering whether there is a problem with the Python modules that are being imported. Anyway, I found that increasing the stack size (you have to use the pthread API rather than std::thread) solved the problem.

#include <cstdio>   // printf
#include <pthread.h>

#include <pybind11/embed.h>
namespace py = pybind11;

static void *test(void *)
{
    py::initialize_interpreter();

    py::module_* mod = new py::module_(py::module_::import("numpy"));
    printf("done\n");
    return nullptr;
}

// This *may* work.
int main()
{
    pthread_t _thread;
    pthread_attr_t thread_attr;
    pthread_attr_init(& thread_attr);

    // You can query the default stack size like this
    // (note that the second argument is a pointer).
    size_t stacksize;
    pthread_attr_getstacksize(&thread_attr, &stacksize);

    // On macOS the default stack size is 524288 bytes (512 KB); let's try 10x that.
    pthread_attr_setstacksize(&thread_attr, 524288 * 10);

    pthread_create(&_thread, &thread_attr, test, nullptr);
    pthread_attr_destroy(&thread_attr); // the attribute object is no longer needed
    pthread_join(_thread, nullptr);
}
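
The stack-size explanation can also be checked without Python at all. What follows is only a minimal sketch and is not taken from the thread: the helper names default_thread_stack_size, run_with_stack_size, and touch_one_mib_of_stack are hypothetical, and the 1 MiB buffer simply mirrors the rough stack usage reported above. It relies only on the POSIX calls pthread_attr_getstacksize and pthread_attr_setstacksize.

```cpp
#include <pthread.h>

#include <cstddef>

// Hypothetical helper: the stack size a thread created with default
// attributes would receive, or 0 if the query fails.
std::size_t default_thread_stack_size()
{
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) return 0;
    std::size_t size = 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0) size = 0;
    pthread_attr_destroy(&attr);
    return size;
}

// Hypothetical worker: touches about 1 MiB of its own stack, one write per
// 4 KiB page, then reports success through the bool that `arg` points to.
void *touch_one_mib_of_stack(void *arg)
{
    volatile char buffer[1024 * 1024];
    for (std::size_t i = 0; i < sizeof buffer; i += 4096) buffer[i] = 1;
    *static_cast<bool *>(arg) = (buffer[0] == 1);
    return nullptr;
}

// Hypothetical helper: run fn(arg) on a thread with the requested stack size
// and wait for it. Returns true if the thread was created and joined.
bool run_with_stack_size(std::size_t stack_bytes, void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) return false;
    bool ok = pthread_attr_setstacksize(&attr, stack_bytes) == 0;
    pthread_t thread;
    ok = ok && pthread_create(&thread, &attr, fn, arg) == 0;
    pthread_attr_destroy(&attr);
    return ok && pthread_join(thread, nullptr) == 0;
}
```

Printing default_thread_stack_size() on the affected machine would show whether the 512 KB default applies there. Calling run_with_stack_size(8 * 1024 * 1024, touch_one_mib_of_stack, &flag) with a bool flag should complete normally, whereas a stack smaller than the buffer would be expected to crash the process rather than return false, which is the kind of failure described in this issue.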

tedwaine avatar Mar 06 '25 12:03 tedwaine