[BUG] Fatal Python error: PyMUTEX_LOCK(gil->mutex) failed

Open coreyjadams opened this issue 2 years ago • 21 comments

Hi,

I've got a package that uses pybind11 (it's awesome, by the way), and a few users have reported the following crash. I've been able to reproduce it myself as well. I asked on the Gitter channel and had a good discussion with @quantotto, but ultimately we came up with only speculation.

I've reduced the issue to a minimal reproducer, so hopefully this is possible to debug. It seems to be a somewhat hidden issue that doesn't appear with every version of Python or every compiler.

Issue description

When importing a package built with pybind11, the python libraries fail to load with the following error:

Fatal Python error: PyMUTEX_LOCK(gil->mutex) failed
Python runtime state: unknown

Abort trap: 6

This only seems to appear when using conda's Python on macOS; I haven't reproduced it elsewhere. I suspect that conda's Python is built with clang 10 while the Python package in question is built with clang 11, and that some incompatibility arises.

Reproducible example code

I am sorry I cannot give you a simpler example. I've stripped it down as far as I think I can while still reproducing this.

See the repository here: larcv3-pybind11-example. It uses scikit-build to call CMake and build a package that includes pybind11-generated Python bindings.

Here's a list of instructions to reproduce this:

bash Miniconda3-latest-MacOSX-x86_64.sh #install a fresh conda
source miniconda3/bin/activate # activate it
conda install cmake # install cmake
pip install scikit-build # install scikit-build
git clone https://github.com/coreyjadams/larcv3-pybind11-example.git # clone the example
cd larcv3-pybind11-example/ 
git submodule update --init # clone pybind11 as a submodule
python setup.py build # compile
python setup.py install #install

Then, in a python interpreter you can do:

>>> from larcv import pylarcv
Fatal Python error: PyMUTEX_LOCK(gil->mutex) failed
Python runtime state: unknown

Abort trap: 6

This also appears to be related to this Stack Overflow question: https://stackoverflow.com/questions/66026520/fatal-python-error-pymutex-lock-pyruntime-ceval-gil-mutex-failed

coreyjadams avatar Jul 07 '21 16:07 coreyjadams

Can you try otool -L larcv3.dylib and report back? I had the same issue when linking against the wrong Python Framework library.

melMass avatar Sep 05 '21 01:09 melMass

Has there been any progress on this issue? I've been experiencing the same bug with our software when people try to install it on macOS using Anaconda (i.e. pip install proposal followed by import proposal).

Jean1995 avatar Oct 04 '21 14:10 Jean1995

I have also experienced this, building a pybind11 project on MacOS 11.6 with M1 (arm64) architecture and python built locally with pyenv (so no conda or anaconda). It's completely reproducible, but I need to try to reduce it to a minimal example...

cqc-alec avatar Oct 04 '21 14:10 cqc-alec

I have also experienced this, building a pybind11 project on MacOS 11.6 with M1 (arm64) architecture and python built locally with pyenv (so no conda or anaconda). It's completely reproducible, but I need to try to reduce it to a minimal example...

What does otool -L /path/to/lib return?

I had the issue that, in some instances, the Python executable used for building was hard-linked on Mac!

melMass avatar Oct 04 '21 15:10 melMass

Thanks @melMass for the reminder. Using the library I posted above which is a reproducer, I get this:

This one is the python bindings:

$ otool -L /Users/corey.adams/miniconda3/lib/python3.8/site-packages/pybind11_test_symbols-1.0.0-py3.8-macosx-10.9-x86_64.egg/larcv/pylarcv.cpython-38-darwin.so
/Users/corey.adams/miniconda3/lib/python3.8/site-packages/pybind11_test_symbols-1.0.0-py3.8-macosx-10.9-x86_64.egg/larcv/pylarcv.cpython-38-darwin.so:
	@rpath/pylarcv.cpython-38-darwin.so (compatibility version 0.0.0, current version 0.0.0)
	@rpath/liblarcv3.dylib (compatibility version 0.0.0, current version 0.0.0)
	@rpath/libpython3.8.dylib (compatibility version 3.8.0, current version 3.8.0)
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 800.7.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)
(base)

And, this is the base C++ library that is getting bound to python:

$ otool -L /Users/corey.adams/miniconda3/lib/python3.8/site-packages/pybind11_test_symbols-1.0.0-py3.8-macosx-10.9-x86_64.egg/larcv/lib/liblarcv3.dylib
/Users/corey.adams/miniconda3/lib/python3.8/site-packages/pybind11_test_symbols-1.0.0-py3.8-macosx-10.9-x86_64.egg/larcv/lib/liblarcv3.dylib:
	@rpath/liblarcv3.dylib (compatibility version 0.0.0, current version 0.0.0)
	@rpath/libpython3.8.dylib (compatibility version 3.8.0, current version 3.8.0)
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 800.7.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)
(base)
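
For anyone checking their own build, here is a minimal Python sketch, assuming `otool -L` output shaped like the listings above, that picks out any libpython entry. The function names and the regular expression are illustrative assumptions, not part of pybind11 or otool, and the sample text is abbreviated from the pylarcv listing.

```python
from __future__ import annotations

import re
import subprocess

# Hypothetical pattern: matches names such as libpython3.8.dylib or
# libpython3.10.a, and paths inside a Python.framework bundle.
_LIBPYTHON = re.compile(r"libpython\d+\.\d+\w*\.(?:dylib|a|so)|Python\.framework")


def linked_libraries(otool_output: str) -> list[str]:
    """Return the install names listed in `otool -L` output."""
    libs = []
    for line in otool_output.splitlines():
        line = line.strip()
        # Dependency lines carry a "(compatibility version ...)" suffix;
        # the header line and shell prompt residue do not.
        if "(compatibility version" in line:
            libs.append(line.split(" (compatibility version")[0])
    return libs


def libpython_dependencies(otool_output: str) -> list[str]:
    """Return only the entries that look like a Python runtime library."""
    return [lib for lib in linked_libraries(otool_output) if _LIBPYTHON.search(lib)]


def run_otool(path: str) -> str:
    """Run `otool -L` on a file (macOS only) and return its text output."""
    result = subprocess.run(
        ["otool", "-L", path], capture_output=True, text=True, check=True
    )
    return result.stdout


# Abbreviated copy of the pylarcv listing from this thread.
SAMPLE_OTOOL_OUTPUT = """\
pylarcv.cpython-38-darwin.so:
\t@rpath/pylarcv.cpython-38-darwin.so (compatibility version 0.0.0, current version 0.0.0)
\t@rpath/liblarcv3.dylib (compatibility version 0.0.0, current version 0.0.0)
\t@rpath/libpython3.8.dylib (compatibility version 3.8.0, current version 3.8.0)
\t/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 800.7.0)
\t/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)
"""

if __name__ == "__main__":
    print(libpython_dependencies(SAMPLE_OTOOL_OUTPUT))
```

On a Mac, `libpython_dependencies(run_otool("/path/to/module.so"))` would apply the same check to a real file. An empty result only means no dynamic libpython dependency is recorded; a statically linked libpython.a does not show up in `otool -L`.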

coreyjadams avatar Oct 04 '21 15:10 coreyjadams

In my case:

alec@Mac-mini pytket % otool -L pytket/_tket/circuit.cpython-38-darwin.so
pytket/_tket/circuit.cpython-38-darwin.so:
	@loader_path/libtket.dylib (compatibility version 0.0.0, current version 0.0.0)
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 905.6.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.100.5)

and

alec@Mac-mini pytket % otool -L pytket/_tket/libtket.dylib               
pytket/_tket/libtket.dylib:
	@loader_path/libtket.dylib (compatibility version 0.0.0, current version 0.0.0)
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 905.6.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.100.5)

cqc-alec avatar Oct 04 '21 16:10 cqc-alec

So I have produced a very minimal example to reproduce this issue: https://github.com/cqc-alec/pybind11-3081. The C++ and binder code is utterly trivial. The build commands are in the Makefile, which includes a hard-coded path to the pybind11 2.7.1 headers installed with conan on a Mac (M1). The output of make test is:

/Library/Developer/CommandLineTools/usr/bin/c++ -I. -stdlib=libc++ -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -fPIC -std=c++2a -o A.cpp.o -c A.cpp
/Library/Developer/CommandLineTools/usr/bin/c++ -stdlib=libc++ -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -dynamiclib -Wl,-headerpad_max_install_names -o libA.dylib -install_name @loader_path/libA.dylib A.cpp.o
/Library/Developer/CommandLineTools/usr/bin/c++ -I. -isystem /Users/alec/.conan/data/pybind11/2.7.1/_/_/package/5ab84d6acfe1f23c4fae0ab88f26e3a396351ac9/include -isystem /Users/alec/.pyenv/versions/3.8.11/include/python3.8 -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -fPIC -fvisibility=hidden -std=c++2a -MD -MT binder.cpp.o -MF binder.cpp.o.d -o binder.cpp.o -c binder.cpp
/Library/Developer/CommandLineTools/usr/bin/c++ -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -bundle -Wl,-headerpad_max_install_names -Xlinker -undefined -Xlinker dynamic_lookup -o A.cpython-38-darwin.so binder.cpp.o -L. -lA  /Users/alec/.pyenv/versions/3.8.11/lib/libpython3.8.a
/Library/Developer/CommandLineTools/usr/bin/strip -x A.cpython-38-darwin.so
/Users/alec/.pyenv/versions/3.8.11/bin/python -c "from A import A"
Fatal Python error: PyMUTEX_LOCK(gil->mutex) failed
Python runtime state: unknown

make: *** [test] Abort trap: 6

(In order to construct this example I extracted these commands from a build system that uses conan and cmake under the hood.)

cqc-alec avatar Oct 05 '21 15:10 cqc-alec

I see that the problem is caused by the hard linkage with libpython3.8.a. If I omit that, it works! So this is looking very much not like a problem with pybind11, but perhaps with either conan or cmake.
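
A rough way to spot this kind of linkage in a build log is to scan the link command for a libpython argument. The sketch below is only an illustration: the function name and patterns are assumptions, and the sample command is a shortened form of the link line from the make output above.

```python
from __future__ import annotations

import re
import shlex

# Hypothetical patterns: a path ending in libpythonX.Y.a / .dylib / .so,
# or a -lpythonX.Y style linker flag.
_LIBPYTHON_FILE = re.compile(r"(?:^|/)libpython\d+\.\d+\w*\.(?:a|dylib|so)$")
_LIBPYTHON_FLAG = re.compile(r"^-lpython\d+\.\d+\w*$")


def libpython_link_args(link_command: str) -> list[str]:
    """Return the arguments of a link command that pull in libpython."""
    return [
        arg
        for arg in shlex.split(link_command)
        if _LIBPYTHON_FILE.search(arg) or _LIBPYTHON_FLAG.match(arg)
    ]


# Shortened form of the link line shown in the make output above.
SAMPLE_LINK_COMMAND = (
    "c++ -arch arm64 -bundle -Xlinker -undefined -Xlinker dynamic_lookup "
    "-o A.cpython-38-darwin.so binder.cpp.o -L. -lA "
    "/Users/alec/.pyenv/versions/3.8.11/lib/libpython3.8.a"
)

if __name__ == "__main__":
    print(libpython_link_args(SAMPLE_LINK_COMMAND))
```

If the list is non-empty for an extension module's link step, that is the argument to try dropping, as described above.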

cqc-alec avatar Oct 05 '21 17:10 cqc-alec

Hey @cqc-alec thanks for digging into this too. Interesting result - though, I tried in my reproducer and it's not so trivial to remove the link to python: I have direct calls to pybind11 in my normal code (equivalent to A.hpp and A.cpp) and this has to link to python.

I note that your libraries aren't directly linked to python above either, is that deliberate?

Overall, I'm very confused by this crash.

coreyjadams avatar Oct 05 '21 18:10 coreyjadams

@coreyjadams I guess my usage is different, in that my core C++ code knows nothing about python or pybind11. Whatever the actual cause of this crash, I believe my real problem is https://github.com/conan-io/conan-center-index/issues/6605 : the issue arose when I updated to the latest pybind11 conan package, which has the misfeature that it always links against the full list of targets -- including pybind11:embed which could explain the presence of this linkage.

cqc-alec avatar Oct 06 '21 07:10 cqc-alec

@coreyjadams I guess my usage is different, in that my core C++ code knows nothing about python or pybind11. Whatever the actual cause of this crash, I believe my real problem is conan-io/conan-center-index#6605 : the issue arose when I updated to the latest pybind11 conan package, which has the misfeature that it always links against the full list of targets -- including pybind11:embed which could explain the presence of this linkage.

I had the exact same setup when this error occurred for me. I'll try to explain how I fixed it:

Like you, I'm using conan to package all of my dependencies. I'm also using a Python virtual env (via poetry) to match the Python version I want to target.

Using the classic find_package(pybind11 REQUIRED) and then pybind11_add_module(XX MODULE $SRCS) led to Python being hard-linked, which caused the PyMUTEX_LOCK error.

To solve this (I'm not sure it's the best way, but it works), I added a custom CMake module:

FindPythonPyEnv.cmake
# Find information about the current Python environment.
# by melMass
#
# Finds the following:
#
# - PYTHON_EXECUTABLE
# - PYTHON_INCLUDE_DIR
# - PYTHON_LIBRARY
# - PYTHON_SITE
# - PYTHON_NUMPY_INCLUDE_DIR
#
# - PYTHONLIBS_VERSION_STRING (The full version id. ie "3.7.4")
# - PYTHON_VERSION_MAJOR
# - PYTHON_VERSION_MINOR
# - PYTHON_VERSION_PATCH
#
#

function(debug_message messages)
  # message(STATUS "")
  message(STATUS "🐍 ${messages}")
  message(STATUS "\n")
endfunction()

if (NOT DEFINED PYTHON_EXECUTABLE)
  execute_process(
    COMMAND which python
    OUTPUT_VARIABLE PYTHON_EXECUTABLE OUTPUT_STRIP_TRAILING_WHITESPACE
  )
endif()

execute_process(
  COMMAND ${PYTHON_EXECUTABLE} -c "from __future__ import print_function; from distutils.sysconfig import get_python_inc; print(get_python_inc())"
  OUTPUT_VARIABLE PYTHON_INCLUDE_DIR OUTPUT_STRIP_TRAILING_WHITESPACE ERROR_QUIET
)

if (NOT EXISTS ${PYTHON_INCLUDE_DIR})
  message(FATAL_ERROR "Python include directory not found.")
endif()

execute_process(
  COMMAND ${PYTHON_EXECUTABLE} -c "from __future__ import print_function; import os, numpy.distutils; print(os.pathsep.join(numpy.distutils.misc_util.get_numpy_include_dirs()))"
  OUTPUT_VARIABLE PYTHON_NUMPY_INCLUDE_DIR OUTPUT_STRIP_TRAILING_WHITESPACE ERROR_QUIET
)

execute_process(
  COMMAND ${PYTHON_EXECUTABLE} -c "from __future__ import print_function; import distutils.sysconfig as sysconfig; print('-L' + sysconfig.get_config_var('LIBDIR') + '/' + sysconfig.get_config_var('LDLIBRARY'))"
  OUTPUT_VARIABLE PYTHON_LIBRARY OUTPUT_STRIP_TRAILING_WHITESPACE ERROR_QUIET
)

execute_process(
  COMMAND ${PYTHON_EXECUTABLE} -c "from __future__ import print_function; import platform; print(platform.python_version())"
  OUTPUT_VARIABLE PYTHONLIBS_VERSION_STRING OUTPUT_STRIP_TRAILING_WHITESPACE ERROR_QUIET
)

execute_process(
  COMMAND ${PYTHON_EXECUTABLE} -c "from __future__ import print_function; from distutils.sysconfig import get_python_lib; print(get_python_lib())"
  OUTPUT_VARIABLE PYTHON_SITE OUTPUT_STRIP_TRAILING_WHITESPACE ERROR_QUIET
)

set(PYTHON_VIRTUAL_ENV $ENV{VIRTUAL_ENV})
string(REPLACE "." ";" _VERSION_LIST ${PYTHONLIBS_VERSION_STRING})

list(GET _VERSION_LIST 0 PYTHON_VERSION_MAJOR)
list(GET _VERSION_LIST 1 PYTHON_VERSION_MINOR)
list(GET _VERSION_LIST 2 PYTHON_VERSION_PATCH)



debug_message("Found Python ${PYTHON_VERSION_MAJOR} (${PYTHONLIBS_VERSION_STRING})")
debug_message("PYTHON_EXECUTABLE: ${PYTHON_EXECUTABLE}")
debug_message("PYTHON_INCLUDE_DIR: ${PYTHON_INCLUDE_DIR}")
debug_message("PYTHON_LIBRARY: ${PYTHON_LIBRARY}")
debug_message("PYTHON_NUMPY_INCLUDE_DIR: ${PYTHON_NUMPY_INCLUDE_DIR}")

Assuming you put this CMake module in a cmake folder at the CMAKE_SOURCE_DIR root, this is how you make CMake aware of it:

list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake/")

To solve the issue, you need to include it before calling find_package(pybind11 REQUIRED).

AFAIK this only happens on macOS. I hope it helps.
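
The values the module asks the interpreter for can also be checked directly, which helps confirm that CMake and the active environment agree. This is a minimal sketch that uses the standard `sysconfig` module in place of `distutils` (which is deprecated in recent Python versions); the function name and dictionary keys are illustrative assumptions.

```python
import platform
import sysconfig


def python_build_info() -> dict:
    """Collect roughly the same values the CMake module queries."""
    paths = sysconfig.get_paths()
    return {
        "include_dir": paths["include"],        # cf. PYTHON_INCLUDE_DIR
        "site_packages": paths["purelib"],      # cf. PYTHON_SITE
        # These two config vars may be None on some platforms (e.g. Windows).
        "libdir": sysconfig.get_config_var("LIBDIR"),
        "ldlibrary": sysconfig.get_config_var("LDLIBRARY"),
        "version": platform.python_version(),   # cf. PYTHONLIBS_VERSION_STRING
    }


if __name__ == "__main__":
    for key, value in python_build_info().items():
        print(f"{key}: {value}")
```

Running this with the same interpreter that `which python` resolves to inside the virtual env should print paths under that env; if CMake reports different ones, the two are not looking at the same Python.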

melMass avatar Oct 06 '21 10:10 melMass

To be more complete here is how I run the build:

poetry run sh build_osx.sh path/to/installdir

where build_osx.sh is

mkdir -p "_build"
cd "_build"
conan install .. -s build_type=Release --build=missing 
cmake .. -G "Ninja" -DCMAKE_INSTALL_PREFIX="$1" -DCMAKE_BUILD_TYPE=Release
cmake --build . --config release
cmake --install .

melMass avatar Oct 06 '21 10:10 melMass

Thank you @melMass , I will try this!

cqc-alec avatar Oct 06 '21 10:10 cqc-alec

Same issue here. I switched python3 from conda's to the regular one at '/usr/local/bin/python3', and that fixed it. My laptop is a Mac with an Intel chip.

prncoprs avatar Nov 10 '21 17:11 prncoprs

Same issue on an Intel Mac (10.15.7) with pybind11==2.9.1. I tried adding the linker options -undefined dynamic_lookup, as this article and the solution commit linked from this issue thread suggested, but it didn't work.

Just migrating to python3.9 solved the issue.

HosikChae avatar Mar 15 '22 12:03 HosikChae

The minimal example given by @cqc-alec is fantastic: https://github.com/cqc-alec/pybind11-3081. I turned it into a CMake version and changed the module's name to A_core for clarity. The contents of the new files (binder.cpp, CMakeLists.txt, test.py) are displayed with cat below. Remember to also place the pybind11 folder in the same directory.

~/demo ❯ ls                
A.cpp          CMakeLists.txt pybind11
A.hpp          binder.cpp     test.py
~/demo ❯ cat binder.cpp 
#include <pybind11/pybind11.h>
#include "A.hpp"

PYBIND11_MODULE(A_core, m) {
  pybind11::class_<A>(m, "A", "An A");
}
~/demo ❯ cat CMakeLists.txt
cmake_minimum_required(VERSION 3.4...3.18)
project(pybindtest)
set(CMAKE_BUILD_TYPE Debug)
add_subdirectory(pybind11)
pybind11_add_module(A_core binder.cpp)

add_library (libA A.cpp A.hpp)
TARGET_LINK_LIBRARIES(libA ${PYTHON_LIBRARIES})

target_link_libraries (A_core PRIVATE libA)

SET_TARGET_PROPERTIES(A_core
        PROPERTIES
                SUFFIX ".so"
)                                                                                
~/demo ❯ cat test.py       
from build import A_core                                                         
~/demo ❯ mkdir build       
~/demo ❯ cd build          
~/d/build ❯ cmake ..       
-- The C compiler identification is AppleClang 13.0.0.13000029
-- The CXX compiler identification is AppleClang 13.0.0.13000029
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- pybind11 v2.11.0 dev1
-- Found PythonInterp: /usr/local/anaconda3/envs/pybind/bin/python (found suitable version "3.10.4", minimum required is "3.6") 
-- Found PythonLibs: /usr/local/anaconda3/envs/pybind/lib/libpython3.10.dylib
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- Performing Test HAS_FLTO_THIN
-- Performing Test HAS_FLTO_THIN - Success
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/luod/bio_tools/oxDNA/oxpy_test/minimal_bug/demo/build
~/d/build ❯ make           
[ 25%] Building CXX object CMakeFiles/libA.dir/A.cpp.o
[ 50%] Linking CXX static library liblibA.a
[ 50%] Built target libA
[ 75%] Building CXX object CMakeFiles/A_core.dir/binder.cpp.o
[100%] Linking CXX shared module A_core.so
[100%] Built target A_core
~/d/build ❯ cd ..        
~/demo ❯ python test.py    
[1]    54424 segmentation fault  python test.py

One can debug this in VS Code using vscode-lldb with the following launch.json (remember to change the Python path):
{
    "type": "lldb",
    "request": "launch",
    "name": "LLDB Python test.py",
    "program": "/usr/local/anaconda3/envs/pybind/bin/python", // <Change the python path>
    "args": [
        "${file}"
    ],
    "cwd": "${workspaceFolder}",
    "stopOnEntry": true,
    "env": {
        // "PYTHONPATH": "/Users/luod/bio_tools/oxDNA/build/oxpy_test/python" // set PYTHONPATH if necessary
    }
},

In the screenshot below, I highlighted the call stack, the source code that raises this error, and the source code path.

image

The source code section is this:

https://github.com/pybind/pybind11/blob/8756f16ed842e40406018df901f3219b231e2105/include/pybind11/detail/internals.h#L411-L417

The full traceback towards the above code is:

From the user's App (the minimal example):

// This line in binder.cpp
PYBIND11_MODULE(A_core, m) {

https://github.com/pybind/pybind11/blob/8756f16ed842e40406018df901f3219b231e2105/include/pybind11/detail/common.h#L392

https://github.com/pybind/pybind11/blob/8756f16ed842e40406018df901f3219b231e2105/include/pybind11/detail/common.h#L307

Solution for this minimal example

The magic in this minimal example is that you can safely remove TARGET_LINK_LIBRARIES(libA ${PYTHON_LIBRARIES}) from the CMakeLists.txt and everything works.

Solution for generic user Apps

Not really sure at the moment. Most likely, the user needs to link to ${PYTHON_LIBRARIES} for their custom c++ lib.

RodenLuo avatar Sep 07 '22 06:09 RodenLuo

Interestingly enough, it seems that in my specific case (oxDNA repo issue #31) I can safely remove the linking (see below). I'm not sure why this doesn't cause any linking errors, or whether it applies to other, more generic cases.

Removing the linking

before

TARGET_LINK_LIBRARIES(_oxpy_lib ${PYTHON_LIBRARIES} common)

after

TARGET_LINK_LIBRARIES(_oxpy_lib common)

RodenLuo avatar Sep 07 '22 07:09 RodenLuo

@RodenLuo I also experienced that I can remove some of the linking. I am using pybind11 via conan, which links to everything by default (see https://github.com/conan-io/conan-center-index/issues/6605), and experimented with this modified recipe (based on some suggestions found in the linked conan issue): https://github.com/scipp/scipp/pull/2792/files.

I am not certain yet that this is correct, but it seemed to pass most of the relevant part of our CI/builds.

SimonHeybrock avatar Sep 07 '22 07:09 SimonHeybrock

@RodenLuo I also experienced that I can remove some of the linking. I am using pybind11 via conan, which links to everything by default

The conan targets don't really link to anything, as there are no binaries to link to (pybind11 being a header only library). They only supply the path to the pybind11 headers.

It's pybind's CMake scripts that provide the logic to link in the Python library (or not). It shouldn't be linking in the Python library at all for a module, but it clearly does so.

For example, from Pybind11 CMake helpers, this causes libpython to be linked in:

# pybind11 method:
pybind11_add_module(MyModule1 src1.cpp)

Whereas this does not:

# Python method:
Python_add_library(MyModule2 src2.cpp)
target_link_libraries(MyModule2 pybind11::headers)

I'm not convinced this is anything to do with conan, but I have some more investigation to do. A draft PR for the conan recipe is available at https://github.com/conan-io/conan-center-index/pull/13283

planetmarshall avatar Oct 03 '22 16:10 planetmarshall

I recently experienced this issue too. My issue was caused by linking against different python libraries. E.g. my application code is using find_package(Python3 ...) whilst the library using pybind11 was using find_package(Python ...). These found different Python libraries.

This discussion provides an example of how to force the different Python modules to find the same version: https://discourse.cmake.org/t/feature-request-setting-find-package-versions-via-env-cmake-variables/4661/4

My specific issue was caused by setting Python3_ROOT_DIR but not Python_ROOT_DIR. Setting both resolved my issue.
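
As a small illustration of setting both hints together, here is a Python sketch that builds the two `-D` arguments from one prefix, so they cannot drift apart. The helper name is hypothetical, and defaulting to `sys.prefix` is an assumption about which installation you want CMake to find.

```python
from __future__ import annotations

import sys


def cmake_python_hints(prefix: str = sys.prefix) -> list[str]:
    """Return CMake arguments pointing FindPython and FindPython3 at one prefix."""
    return [
        f"-DPython_ROOT_DIR={prefix}",   # hint read by find_package(Python ...)
        f"-DPython3_ROOT_DIR={prefix}",  # hint read by find_package(Python3 ...)
    ]


if __name__ == "__main__":
    # Example: print a configure command using the running interpreter's prefix.
    print(" ".join(["cmake", "-S", ".", "-B", "build", *cmake_python_hints()]))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run` from a build script.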

0x6e avatar Oct 12 '22 11:10 0x6e