[BUG]: Import c-module from interpreter embedded in dlopen shared object raises undefined symbol (unix-only)

Open MatteoRagni opened this issue 3 years ago • 4 comments

Required prerequisites

  • [X] Make sure you've read the documentation. Your issue may be addressed there.
  • [X] Search the issue tracker and Discussions to verify that this hasn't already been reported. +1 or comment there if it has.
  • [ ] Consider asking first in the Gitter chat room or in a Discussion.

Problem description

TL;DR: The problem shows up in a very specific setup: the interpreter is embedded in a shared library, and that shared library is loaded by an application with dlopen. When the interpreter then tries to import a Python C extension module, some Python symbols are undefined and the process aborts. The workaround is to dlopen the Python library with RTLD_NOW | RTLD_GLOBAL.

I noticed the problem while writing a shared library that exposes a C interface and uses the embedded interpreter. If the functions defined in the shared object import numpy, either directly or through another library that imports it, the process fails with the following message:

irb(main):003:1* module TestMain
=> #<FFI::Function address=0x00007f9d0ba43bb6>
irb(main):008:0> 
irb(main):009:0> TestMain.main
terminate called after throwing an instance of 'pybind11::error_already_set'
  what():  ImportError: 

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.

We have compiled some common reasons and troubleshooting tips at:

    https://numpy.org/devdocs/user/troubleshooting-importerror.html

Please note and check the following:

  * The Python version is: Python3.8 from "/usr/bin/python3"
  * The NumPy version is: "1.20.3"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: /usr/local/lib/python3.8/dist-packages/numpy/core/_multiarray_umath.cpython-38-x86_64-linux-gnu.so: undefined symbol: PyExc_RecursionError


At:
  /usr/local/lib/python3.8/dist-packages/numpy/core/__init__.py(51): <module>
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(848): exec_module
  <frozen importlib._bootstrap>(686): _load_unlocked
  <frozen importlib._bootstrap>(975): _find_and_load_unlocked
  <frozen importlib._bootstrap>(991): _find_and_load
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap>(1050): _handle_fromlist
  /usr/local/lib/python3.8/dist-packages/numpy/__init__.py(145): <module>
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(848): exec_module
  <frozen importlib._bootstrap>(686): _load_unlocked
  <frozen importlib._bootstrap>(975): _find_and_load_unlocked
  <frozen importlib._bootstrap>(991): _find_and_load
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap>(961): _find_and_load_unlocked
  <frozen importlib._bootstrap>(991): _find_and_load

  • pybind11: 2.8.1
  • numpy: tested several versions, from 1.19 to the latest, with the latter also compiled locally.
  • python: tested with both 3.6 and 3.8, on Ubuntu 18.04 and Ubuntu 20.04. The same code works on Windows 10 x64.

With respect to other issues:

  • #3543 : maybe related?
  • #3112 : I'm quite sure the sample I'm attaching is loading the interpreter once

Reproducible example code

This should be a minimal example to reproduce the issue:

// main.cc
#include "pybind11/embed.h"
namespace py = pybind11;

extern "C" {
int main() {
  py::scoped_interpreter guard{};
  auto py_module = py::module::import("numpy");
  auto version   = py_module.attr("__version__");
  py::print(version);
  return 0;
}
}

With the following CMake file it should be possible to build both an executable (issue_main) and a shared object (libissue.so). The library can be tested with the loader target.

cmake_minimum_required(VERSION 3.14)

include(FetchContent)
FetchContent_Declare(
  pybind11
  GIT_REPOSITORY https://github.com/pybind/pybind11
  GIT_TAG v2.8.1)
FetchContent_MakeAvailable(pybind11)

project(
  pybind_issue
  LANGUAGES C CXX
  VERSION 1.0.0)

add_library(issue SHARED main.cc)
set_target_properties(issue PROPERTIES 
  POSITION_INDEPENDENT_CODE ON 
  CXX_STANDARD 11)
target_link_libraries(issue PRIVATE pybind11::embed)

add_executable(issue_main main.cc)
set_target_properties(issue_main PROPERTIES 
  POSITION_INDEPENDENT_CODE ON
  CXX_STANDARD 11)
target_link_libraries(issue_main PRIVATE pybind11::embed)

add_executable(loader load.cc)
target_link_libraries(loader PRIVATE ${CMAKE_DL_LIBS})

where the loader has the following code:

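// load.cc
#include <dlfcn.h>
#include <cstdio>

int main() {
  // RTLD_NOW without RTLD_GLOBAL: the library is loaded with local scope.
  void * lib = dlopen("./libissue.so", RTLD_NOW);
  if (!lib) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }
  // Look up the exported entry point (named "main" in main.cc).
  int(*fnc)(void) = (int(*)(void))dlsym(lib, "main");
  if (!fnc) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }
  fnc();
  dlclose(lib);
  return 0;
}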

Running ./issue_main prints the numpy version as expected, whereas loading the shared object through ./loader produces the stack trace above. The loader can be taken out of the equation by loading the library some other way (e.g. Ruby FFI):

require 'ffi'

module IssueLib
  extend FFI::Library
  ffi_lib './libissue.so'
  attach_function :main, [], :int
end

IssueLib.main()

This reports the same stack trace.

EDIT: Test on Windows

With the following modifications:

// main.cc
#include "pybind11/embed.h"
namespace py = pybind11;

extern "C" {
__declspec(dllexport) int main() {
  py::scoped_interpreter guard{};
  auto py_module = py::module::import("numpy");
  auto version   = py_module.attr("__version__");
  py::print(version);
  return 0;
}
}
// load.cc
#include <windows.h>

int main() {
  HMODULE lib = LoadLibrary("./issue.dll");
  int(*fnc)(void) = (int(*)(void))GetProcAddress(lib, "main");
  fnc();
  FreeLibrary(lib);
  return 0;
}

it works correctly on Windows 10 x64.

MatteoRagni, Dec 16 '21 14:12

Important update: this does not appear to be tied to numpy alone. If I import decimal (a stdlib numeric module) I get a similar error:

#include "pybind11/embed.h"
namespace py = pybind11;

extern "C" {
int main() {
  py::scoped_interpreter guard{};
  auto py_module = py::module::import("decimal");
  auto version   = py_module.attr("__name__");
  py::print(version);
  return 0;
}
}

This gives me:

terminate called after throwing an instance of 'pybind11::error_already_set'
  what():  ImportError: /usr/lib/python3.8/lib-dynload/_contextvars.cpython-38-x86_64-linux-gnu.so: undefined symbol: PyContextVar_Type

At:
  /usr/lib/python3.8/contextvars.py(1): <module>
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(848): exec_module
  <frozen importlib._bootstrap>(686): _load_unlocked
  <frozen importlib._bootstrap>(975): _find_and_load_unlocked
  <frozen importlib._bootstrap>(991): _find_and_load
  /usr/lib/python3.8/_pydecimal.py(440): <module>
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(848): exec_module
  <frozen importlib._bootstrap>(686): _load_unlocked
  <frozen importlib._bootstrap>(975): _find_and_load_unlocked
  <frozen importlib._bootstrap>(991): _find_and_load
  /usr/lib/python3.8/decimal.py(8): <module>
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(848): exec_module
  <frozen importlib._bootstrap>(686): _load_unlocked
  <frozen importlib._bootstrap>(975): _find_and_load_unlocked
  <frozen importlib._bootstrap>(991): _find_and_load

[1]    3095287 abort (core dumped)  ./loader

Rather than undefined behavior, it looks like some symbols cannot be resolved in this scenario... Is that even possible?
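
One way to check whether this really is a symbol visibility problem is to ask the dynamic linker if a given Python symbol can be found in the global scope of the process, which is where an extension module that is not itself linked against libpython has to find it. The following is only a sketch for Linux; `symbol_is_globally_visible` is a hypothetical helper name and is not part of pybind11 or CPython.

```cpp
#include <dlfcn.h>

// Hypothetical helper: returns true if `name` can be resolved through the
// main program, the libraries loaded at startup, or any library that was
// loaded with RTLD_GLOBAL.
bool symbol_is_globally_visible(const char *name) {
  // A null filename yields a handle for the main program; dlsym() on that
  // handle searches the global scope only.
  void *self = dlopen(nullptr, RTLD_NOW);
  if (!self) {
    return false;
  }
  bool found = dlsym(self, name) != nullptr;
  dlclose(self);
  return found;
}
```

Called from the function exported by libissue.so just before the import, `symbol_is_globally_visible("PyExc_RecursionError")` would be expected to return false in the failing setup, since libpython was only pulled in as a local dependency of the dlopened library.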

MatteoRagni, Dec 17 '21 08:12

I've found a solution. Knowing that it was not tied to numpy helped quite a lot in shifting the focus to the real problem: missing symbols. I took the suggestion from this answer, and in particular this point:

Solve a problem. Load the library found in step 1 by dlopen first (use RTLD_GLOBAL there as well).

I've modified the minimal example as follows:

// main.cc
#include "pybind11/embed.h"
#include <dlfcn.h>
namespace py = pybind11;

extern "C" {
void * python;

int create() {
  python = dlopen("/usr/lib/x86_64-linux-gnu/libpython3.8.so", RTLD_NOW | RTLD_GLOBAL);
  return 0;
}

int destroy() {
  dlclose(python);
  return 0;
}

int main() {
  py::scoped_interpreter guard{};
  auto py_module = py::module::import("numpy");
  auto version   = py_module.attr("__version__");
  py::print(version);
  return 0;
}
}
// load.cc
#include <dlfcn.h>

int main() {
  void * lib = dlopen("./libissue.so", RTLD_NOW | RTLD_DEEPBIND);
  int(*fnc)(void) = (int(*)(void))dlsym(lib, "main");
  int(*create)(void) = (int(*)(void))dlsym(lib, "create");
  int(*destroy)(void) = (int(*)(void))dlsym(lib, "destroy");
  create();
  fnc();
  destroy();
  dlclose(lib);
  return 0;
}

(Obviously, in CMake I had to add ${CMAKE_DL_LIBS} to the target link libraries of the issue target.)
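
The create() / destroy() pair could also be folded into a small RAII object, so that the handle is checked and released automatically. This is a sketch only: `GlobalLibrary` is a hypothetical class name, and the path passed to it remains an assumption about where libpython lives on a given machine.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Hypothetical RAII wrapper: loads a shared library into the global symbol
// scope and releases the handle when the object goes out of scope.
class GlobalLibrary {
public:
  explicit GlobalLibrary(const char *path)
      : handle_(dlopen(path, RTLD_NOW | RTLD_GLOBAL)) {
    if (!handle_) {
      // dlerror() describes why the most recent dlopen() call failed.
      std::fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
    }
  }
  ~GlobalLibrary() {
    if (handle_) {
      dlclose(handle_);
    }
  }
  // Copying would lead to a double dlclose(), so it is disabled.
  GlobalLibrary(const GlobalLibrary &) = delete;
  GlobalLibrary &operator=(const GlobalLibrary &) = delete;

  bool loaded() const { return handle_ != nullptr; }

private:
  void *handle_;
};
```

In main.cc, constructing `GlobalLibrary python("/usr/lib/x86_64-linux-gnu/libpython3.8.so");` before `py::scoped_interpreter guard{};` would keep libpython loaded until after the interpreter has been finalized, because local objects are destroyed in reverse order of construction.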

Final thoughts: For me this keeps the issue open: it looks like a problem in how Python is linked / handled in the specific case of an interpreter living in a shared object that is itself dlopened. I think this could be treated as a new feature, implemented as a new target (pybind11::embed_in_dlopen_object) that includes the relevant code for making the symbols available.
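
As a rough idea of what such "relevant code" might look like without a hard-coded path: on glibc, a library that is already resident can be reopened with RTLD_NOLOAD | RTLD_GLOBAL to promote it to the global scope, and dladdr can report which library an address belongs to. The sketch below combines the two. `promote_library_containing` is a hypothetical name, and this has not been tested against the reproducer in this issue.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // dladdr and RTLD_NOLOAD are GNU extensions
#endif
#include <dlfcn.h>

// Hypothetical helper: finds the shared object that contains `addr` and
// reopens it with RTLD_GLOBAL so that its symbols join the global scope.
// Returns a handle (to be passed to dlclose later) or nullptr on failure.
void *promote_library_containing(const void *addr) {
  Dl_info info;
  if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) {
    return nullptr;  // the address does not belong to any loaded object
  }
  // RTLD_NOLOAD: do not load anything new, only update the flags of an
  // object that is already resident in the process.
  return dlopen(info.dli_fname, RTLD_NOW | RTLD_GLOBAL | RTLD_NOLOAD);
}
```

Inside libissue.so, the address of any CPython function could serve as the probe, for example `promote_library_containing(reinterpret_cast<const void *>(&Py_IsInitialized))`, called before the interpreter is started.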

MatteoRagni, Dec 17 '21 09:12

I have a similar issue with pybind11 v2.8.1.

I run Python code from a shared library that is linked to the main executable:

libpyrunner.so:
void pyrunner()
{
    py::initialize_interpreter();
    py::eval_file("file_with_numpy.py");
    py::finalize_interpreter();
}

And it fails with the same error:

ImportError: 

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.

We have compiled some common reasons and troubleshooting tips at:

    https://numpy.org/devdocs/user/troubleshooting-importerror.html

Please note and check the following:

  * The Python version is: Python3.9 from "/usr/bin/python3"
  * The NumPy version is: "1.19.5"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: /usr/lib/python3/dist-packages/numpy/core/_multiarray_umath.cpython-39-x86_64-linux-gnu.so: undefined symbol: PyExc_RecursionError


At:
  /usr/lib/python3/dist-packages/numpy/core/__init__.py(51): <module>
  <frozen importlib._bootstrap>(228): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(790): exec_module
  <frozen importlib._bootstrap>(695): _load_unlocked
  <frozen importlib._bootstrap>(986): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1007): _find_and_load
  <frozen importlib._bootstrap>(228): _call_with_frames_removed
  <frozen importlib._bootstrap>(1066): _handle_fromlist
  /usr/lib/python3/dist-packages/numpy/__init__.py(140): <module>
  <frozen importlib._bootstrap>(228): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(790): exec_module
  <frozen importlib._bootstrap>(695): _load_unlocked
  <frozen importlib._bootstrap>(986): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1007): _find_and_load
  /tmp/file_with_numpy.py(4): <module>

Both the shared library and the main executable are linked with libpython3.9.so.

The error occurs on Debian 11 with python3.9 and numpy installed from the main Debian 11 repository.

The same code runs correctly on Debian 9 and Debian 10 with python3.5 and python3.7 from the main Debian 9/10 repositories.

However, on Debian 9 and Debian 10 the shared library and the main executable are linked with libpythonX.Ym.so, whereas on Debian 11 libpython3.9m.so is absent.

The pybind11 version is v2.8.1 on all setups.

Thanks to @MatteoRagni. This solution works for me too.

drons, Feb 11 '22 07:02

I have come across this same issue (noaa-owp/ngen#655) and have worked out some additional context. This seems to happen when the Python interpreter being embedded has been statically linked to libpython.

This can be determined by looking at the Python sysconfig output

python3 -m sysconfig

and inspecting the CONFIG_ARGS flags and/or other options.

Since the interpreter is statically linked, the symbols from the library are not available to extension modules that are loaded via dlopen, which is why numpy fails to load. Note that if the config has LINKFORSHARED = "-Xlinker -export-dynamic", it should be possible to simply load the executable as a shared library and have the symbols exported globally (using RTLD_NOW | RTLD_GLOBAL).

Having searched through the pybind11 documentation on embedding a Python interpreter, I couldn't find any relevant information about this particular issue. It may be out of scope to have pybind11 itself attempt to load the required symbols (using dlopen), but I would be curious to hear thoughts on that.

At the very least, it would be useful to have some notes in the embedding documentation about checking for a statically linked Python (no .so), and about the implied dependence of extension modules, particularly numpy, on being able to get symbols from libpython.

hellkite500, Sep 26 '23 21:09