pybind11
[BUG]: free(): invalid pointer
Required prerequisites
- [X] Make sure you've read the documentation. Your issue may be addressed there.
- [X] Search the issue tracker and Discussions to verify that this hasn't already been reported. +1 or comment there if it has.
- [ ] Consider asking first in the Gitter chat room or in a Discussion.
Problem description
As in issue https://github.com/pybind/pybind11/issues/1472, we still see this problem in 2.10.0:
free(): invalid pointer
c++ code:
#include "pybind11/embed.h"
#include <iostream>
#include <thread>
#include <chrono>
#include <sstream>
#include <vector>
namespace py = pybind11;
using namespace std::chrono_literals;
class Wrapper
{
public:
    Wrapper()
    {
        py::gil_scoped_acquire acquire;
        _obj = py::module::import("wrapper").attr("Wrapper")();
        _wrapperInit = _obj.attr("wrapperInit");
        _wrapperFini = _obj.attr("wrapperFini");
    }
    ~Wrapper()
    {
        _wrapperInit.release();
        _wrapperFini.release();
    }
    int wrapperInit()
    {
        py::gil_scoped_acquire acquire;
        return _wrapperInit(nullptr).cast<int>();
    }
    void wrapperFini(int x)
    {
        py::gil_scoped_acquire acquire;
        _wrapperFini(x);
    }
private:
    py::object _obj;
    py::object _wrapperInit;
    py::object _wrapperFini;
};

void thread_func(int iteration)
{
    Wrapper w;
    for (int i = 0; i < 1; i++)
    {
        w.wrapperInit();
        std::stringstream msg;
        msg << "iteration: " << iteration << " thread: " << std::this_thread::get_id() << std::endl;
        std::cout << msg.str();
        std::this_thread::sleep_for(100ms);
    }
}

int main() {
    py::scoped_interpreter guard{};
    py::gil_scoped_release release; // add this to release the GIL
    std::vector<std::thread> threads;
    for (int i = 0; i < 1; ++i)
        threads.push_back(std::thread(thread_func, 1));
    for (auto& t : threads)
        t.join();
    return 0;
}
wrapper.py code:
class Wrapper():
    serviceId = "mmocr"
    version = "backup.0"
    '''
    Service initialization
    @param config:
        Configuration required for plugin initialization, dict type
        key: configuration name
        value: configuration value
    @return
        ret: error code. Returns 0 when there is no error
    '''
    def wrapperInit(cls, config: {}) -> int:
        import torch
        print(config)
        print("Initializing ..")
        return 0
    def wrapperFini(cls) -> int:
        return 0
We run this code in an Ubuntu 18.04 Docker container. The image is public.ecr.aws/iflytek-open/opensource/demo/mmocr:v3.1
Reproducible example code
No response
I'm guessing this is https://github.com/pybind/pybind11/issues/4105.
I verified this is not #4105, this code was broken in 2.9 as well.
I couldn't reproduce the free(): invalid pointer crash using the code here, but there is certainly a GIL issue that you can confirm by using PR #4146. The problem in the reproducer code is that the GIL is not being held when the destructor for Wrapper::_obj is running. You can "fix" it by adding _obj.release(); in the Wrapper destructor. "Fix" is in quotation marks because it simply leaks the Python reference; "masking" would be a more fitting word. To not leak:
--- main_using_embed_h.cpp.orig 2022-10-23 21:29:46.559375849 -0700
+++ main_using_embed_h.cpp 2022-10-23 21:56:25.089334464 -0700
@@ -21,7 +21,12 @@
~Wrapper()
{
+ py::gil_scoped_acquire hold_gil;
+ _obj.dec_ref();
+ _obj.release();
+ _wrapperInit.dec_ref();
_wrapperInit.release();
+ _wrapperFini.dec_ref();
_wrapperFini.release();
}
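The reason the patch drops each reference explicitly inside the destructor body comes down to C++ destruction order: a scoped guard declared in the destructor body is destroyed at the closing brace, and only after that are the data members destroyed. The sketch below illustrates this with standard-library code only. ScopedLock, Handle, Naive, Explicit and run_demo are hypothetical stand-ins (for py::gil_scoped_acquire, py::object and the Wrapper class), not pybind11 APIs, and the global flag merely imitates "the GIL is held".

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the GIL: true while some scope "holds" it.
static bool g_lock_held = false;
// For each reference dropped, records whether the lock was held at that moment.
static std::vector<std::string> g_log;

// Hypothetical stand-in for py::gil_scoped_acquire.
struct ScopedLock {
    ScopedLock() { g_lock_held = true; }
    ~ScopedLock() { g_lock_held = false; }
};

// Hypothetical stand-in for py::object: dropping its reference needs the lock.
struct Handle {
    std::string name;
    bool owned = true;
    explicit Handle(std::string n) : name(std::move(n)) {}
    // Plays the role of dec_ref() followed by release() in the patch.
    void drop() {
        if (owned) {
            g_log.push_back(name + (g_lock_held ? ":locked" : ":UNLOCKED"));
            owned = false;
        }
    }
    ~Handle() { drop(); }
};

struct Naive {
    Handle h{"naive"};
    ~Naive() {
        ScopedLock lock;  // destroyed at the closing brace, before h is destroyed
    }
};

struct Explicit {
    Handle h{"explicit"};
    ~Explicit() {
        ScopedLock lock;
        h.drop();  // reference dropped while the lock is still held
    }
};

// Destroys one object of each kind and returns what was recorded.
std::vector<std::string> run_demo() {
    g_log.clear();
    { Naive n; }
    { Explicit e; }
    return g_log;
}
```

Calling run_demo() from a main function yields two entries: the Naive member is dropped after the lock is gone, while the Explicit member is dropped under the lock. This is why acquiring the GIL in ~Wrapper() alone would not be enough without the explicit dec_ref() and release() calls.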
I'm closing this bug because it's pretty likely that the free(): invalid pointer has nothing to do with a bug in pybind11.
Until we merge PR #4146, I recommend you patch it locally and run all your tests.
I am encountering this under the same conditions. This is my setup, which can be replicated:
# dummy.py
import torch
def simple_return():
    return 1
simple.cpp:
#include <iostream>
#include <future>
#include <pybind11/embed.h>
namespace py = pybind11;
std::future<int> callPythonFunctionAsync(py::object &pyFunction) {
    return std::async(std::launch::async, [&](){
        py::gil_scoped_acquire acquire;
        int result = pyFunction().cast<int>();
        return result;
    });
}

int main() {
    py::scoped_interpreter guard{}; // Start the interpreter and keep it alive
    // Import the Python module
    py::module pyModule = py::module::import("dummy");
    py::object pyFunction = pyModule.attr("simple_return");
    // Call the function asynchronously
    std::cout << "Calling Python function asynchronously..." << std::endl;
    py::gil_scoped_release release;
    auto futureResult = callPythonFunctionAsync(pyFunction);
    // Wait for the result and print it
    try {
        int result = futureResult.get();
        std::cout << "Result from Python: " << result << std::endl;
    } catch (const std::exception& e) {
        std::cerr << "Exception caught: " << e.what() << std::endl;
    }
    return 0;
}
with the following cmake
cmake_minimum_required(VERSION 3.10) # Updated minimum required version
project(py_cpp_func)
set(CMAKE_CXX_STANDARD 11) # Setting C++ standard to C++11
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pthread")
# Manually set Python include directories and libraries
set(PYTHON_INCLUDE_DIR /usr/local/include/python3.10)
set(PYTHON_LIBRARY /usr/local/lib/libpython3.10.so)
include_directories(${PYTHON_INCLUDE_DIR})
# Include pybind11
# Include pybind11 from the external directory
add_subdirectory(external/pybind11)
add_executable(py_dummy simple.cpp)
target_link_libraries(py_dummy PRIVATE ${PYTHON_LIBRARIES} pybind11::embed)
configure_file(dummy.py ${CMAKE_BINARY_DIR}/dummy.py COPYONLY)
with the following dockerfile:
FROM ubuntu:18.04
RUN apt-get update && \
apt-get install -y software-properties-common && \
add-apt-repository ppa:ubuntu-toolchain-r/test && \
apt-get update && \
apt-get install -y \
gcc \
g++ \
cmake \
libboost-all-dev \
wget
RUN apt-get remove -y cmake && \
wget https://cmake.org/files/v3.10/cmake-3.10.0-Linux-x86_64.sh && \
chmod +x cmake-3.10.0-Linux-x86_64.sh && \
./cmake-3.10.0-Linux-x86_64.sh --skip-license --prefix=/usr/local
RUN apt-get install -y git
# Clone pybind11 into the external directory
RUN mkdir -p /external && \
git clone --branch v2.11.1 https://github.com/pybind/pybind11.git /external/pybind11
# Install Python 3.10.13
ENV PYTHON_VERSION 3.10.13
# Install necessary packages
RUN apt-get update && \
apt-get install -y software-properties-common wget git \
build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev \
libssl-dev libsqlite3-dev libreadline-dev libffi-dev curl libbz2-dev liblzma-dev
RUN apt-get install -y libgomp1 libgl1-mesa-glx
# Download Python 3.10 source
RUN cd /tmp && \
wget https://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tar.xz && \
tar -xf Python-$PYTHON_VERSION.tar.xz
# Compile Python 3.10
RUN cd /tmp/Python-$PYTHON_VERSION && \
./configure --enable-optimizations --enable-shared && \
make -j 8 && \
make altinstall && \
ldconfig
# Install pip for Python 3.10
RUN cd /tmp && \
wget https://bootstrap.pypa.io/get-pip.py && \
python3.10 get-pip.py && \
rm get-pip.py
# Install OpenCV for C++
RUN DEBIAN_FRONTEND="noninteractive" apt-get install -y libopencv-dev
WORKDIR /usr/src/three-stage-object-detection
# Install Triton Inference Server
COPY three-stage-object-detection /usr/src/three-stage-object-detection/
RUN python3.10 -m pip install -e .
WORKDIR /usr/src/app
COPY CMakeLists.txt /usr/src/app/
COPY dummy.py /usr/src/app/
COPY simple.cpp /usr/src/app/
RUN mkdir external && \
ln -s /external/pybind11 external/pybind11
RUN mkdir build && \
cd build && \
cmake -DCMAKE_BUILD_TYPE=Debug .. && \
make
WORKDIR /usr/src/app/build
# Clean up
RUN apt-get clean && \
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
- Does this run successfully if you remove import torch?
- Do you have a stack trace from the crash?
- I don't think that's it, but I'd make this change:
  -std::future<int> callPythonFunctionAsync(py::object &pyFunction)
  +std::future<int> callPythonFunctionAsync(py::handle pyFunction)
- I don't think any of the maintainers will have the time to reproduce the crash. If this is important to you, I recommend you send a PR that adds a .github/workflows/reproducer.yml job to run in GitHub Actions.
- I really really doubt the root cause is in pybind11.
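One detail to watch if the parameter is changed to pass by value as suggested: the lambda in simple.cpp captures with [&], and a by-value parameter no longer exists once the function returns, so the task should then capture it by value as well. Below is a minimal standard-library sketch of that pattern. Callable, call_async and run_example are hypothetical names, and std::function<int()> stands in for the Python callable. With pybind11 the lambda would additionally acquire the GIL, and since py::handle is non-owning, the caller would still need to keep the underlying py::object alive until the future completes.

```cpp
#include <cassert>
#include <functional>
#include <future>

// Hypothetical stand-in for a cheap, copyable handle to a callable.
using Callable = std::function<int()>;

// fn is taken by value, so the lambda captures it by value too. A [&] capture
// would refer to the parameter fn, which is destroyed when call_async returns,
// possibly before the asynchronous task runs.
std::future<int> call_async(Callable fn) {
    return std::async(std::launch::async, [fn]() { return fn(); });
}

// Launches a trivial task and waits for its result.
int run_example() {
    std::future<int> result = call_async([] { return 1; });
    return result.get();
}
```

The copy made by the capture keeps the callable valid for the lifetime of the task, independent of when call_async returns.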
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1 0x00007ffff6c7f7f1 in __GI_abort () at abort.c:79
#2 0x00007ffff6cc8837 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff6df5a7b "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3 0x00007ffff6ccf8ba in malloc_printerr (str=str@entry=0x7ffff6df3c76 "free(): invalid pointer") at malloc.c:5342
#4 0x00007ffff6cd6dec in _int_free (have_lock=0, p=0x7fff280e49a8, av=0x7ffff702ac40 <main_arena>) at malloc.c:4167
#5 __GI___libc_free (mem=0x7fff280e49b8) at malloc.c:3134
#6 0x000055555542c508 in __gnu_cxx::new_allocator<std::_Fwd_list_node<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::destroy<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (this=0x5555556dae78, __p=0x7fff280e6158) at /usr/include/c++/7/ext/new_allocator.h:140
#7 0x000055555542876b in std::allocator_traits<std::allocator<std::_Fwd_list_node<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::destroy<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__a=..., __p=0x7fff280e6158) at /usr/include/c++/7/bits/alloc_traits.h:487
#8 0x000055555542319d in std::_Fwd_list_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::_M_erase_after (this=0x5555556dae78, __pos=0x5555556dae78, __last=0x0) at /usr/include/c++/7/bits/forward_list.tcc:90
#9 0x000055555541e84a in std::_Fwd_list_base<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::~_Fwd_list_base (this=0x5555556dae78, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/forward_list.h:329
#10 0x000055555541a82c in std::forward_list<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >::~forward_list (this=0x5555556dae78, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/forward_list.h:559
#11 0x000055555540fb3b in pybind11::detail::internals::~internals (this=0x5555556dacd0, __in_chrg=<optimized out>) at /external/pybind11/include/pybind11/detail/internals.h:207
#12 0x0000555555419629 in pybind11::finalize_interpreter () at /external/pybind11/include/pybind11/embed.h:263
#13 0x00005555554196ea in pybind11::scoped_interpreter::~scoped_interpreter (this=0x7fffffffe533, __in_chrg=<optimized out>) at /external/pybind11/include/pybind11/embed.h:308
#14 0x0000555555407d2d in main () at /usr/src/app/simple.cpp:16
I got this backtrace. Also, the program runs if I update the Docker base image to Ubuntu 20.04.
Someone on Gitter helped me get the trace.
If I do not import torch, the code works, so it is definitely something with torch.
I'd work on sending them a PR that reproduces the crash.