Segfaults in 32-bit linux while running master python test suite, maybe allocation related
I'm trying to port #6188 to master. I'm getting segfaults running the python test suite, only on 32-bit linux.
#0 listen_socket_t (this=0xf69f25a0 <std::string::_Rep::_S_empty_rep_storage>) at ../../include/libtorrent/aux_/session_impl.hpp:155
#1 construct<libtorrent::aux::listen_socket_t> (__p=0xf69f25a0 <std::string::_Rep::_S_empty_rep_storage>, this=<optimized out>)
at ../../include/libtorrent/aux_/session_impl.hpp:155
#2 construct<libtorrent::aux::listen_socket_t> (__p=0xf69f25a0 <std::string::_Rep::_S_empty_rep_storage>, __a=...)
at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/alloc_traits.h:512
#3 _Sp_counted_ptr_inplace<> (__a=..., this=0xf5d022c8) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr_base.h:551
#4 __shared_count<libtorrent::aux::listen_socket_t, std::allocator<libtorrent::aux::listen_socket_t> > (__a=...,
__p=@0x8468be0: 0xf7065558 <vtable for libtorrent::aux::session_impl+8>, this=0x8468be4) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr_base.h:682
#5 __shared_ptr<std::allocator<libtorrent::aux::listen_socket_t> > (__tag=..., this=0x8468be0) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr_base.h:1371
#6 shared_ptr<std::allocator<libtorrent::aux::listen_socket_t> > (__tag=..., this=0x8468be0) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr.h:408
#7 allocate_shared<libtorrent::aux::listen_socket_t, std::allocator<libtorrent::aux::listen_socket_t> > (__a=...)
at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr.h:860
#8 make_shared<libtorrent::aux::listen_socket_t> () at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr.h:876
#9 libtorrent::aux::session_impl::setup_listener(libtorrent::aux::listen_endpoint_t const&, boost::system::error_code&) () at ../../src/session_impl.cpp:1540
#10 0xf6c73022 in libtorrent::aux::session_impl::reopen_listen_sockets(bool) () at ../../src/session_impl.cpp:2083
#11 0xf6c7596c in libtorrent::aux::session_impl::init() () at ../../src/session_impl.cpp:710
#12 0xf6c865ff in libtorrent::aux::session_impl::wrap<void (libtorrent::aux::session_impl::*)()> (this=0x8468be0,
f=(void (libtorrent::aux::session_impl::*)(libtorrent::aux::session_impl * const)) 0xf6c75660 <libtorrent::aux::session_impl::init()>) at ../../src/session_impl.cpp:534
#13 0xf6c5a885 in operator() (__closure=<synthetic pointer>) at ../../src/session_impl.cpp:667
#14 invoke<libtorrent::aux::session_impl::start_session()::<lambda()>, libtorrent::aux::session_impl::start_session()::<lambda()> > (context=<synthetic pointer>,
function=<synthetic pointer>) at /boost_1_76_0/boost/asio/detail/handler_invoke_helpers.hpp:51
#15 boost::asio::detail::executor_op<libtorrent::aux::session_impl::start_session()::{lambda()#1}, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, std::allocator<void>*, boost::system::error_code const&, unsigned int) () at /boost_1_76_0/boost/asio/detail/executor_op.hpp:70
#16 0xf6bde53d in complete (bytes_transferred=<optimized out>, ec=..., owner=0x842dc38, this=0x8453c28) at /boost_1_76_0/boost/asio/detail/scheduler_operation.hpp:40
#17 do_run_one (ec=..., this_thread=..., lock=..., this=0x842dc38) at /boost_1_76_0/boost/asio/detail/impl/scheduler.ipp:486
#18 boost::asio::detail::scheduler::run (this=0x842dc38, ec=...) at /boost_1_76_0/boost/asio/detail/impl/scheduler.ipp:204
#19 0xf6c18d6a in run (this=<optimized out>) at /boost_1_76_0/boost/asio/impl/io_context.ipp:63
#20 operator() (__closure=0x84560b4) at ../../src/session.cpp:297
#21 __invoke_impl<void, libtorrent::session::start(libtorrent::session_handle::session_flags_t, libtorrent::session_params&&, boost::asio::io_context*)::<lambda()> > (
__f=<unknown type in /venv/lib/python3.6/site-packages/python_libtorrent-2.0.4-py3.6-linux-i686.egg/libtorrent/__init__.cpython-36m-i386-linux-gnu.so, CU 0x19a2e83, DIE 0x1a3d23c>) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/invoke.h:60
#22 __invoke<libtorrent::session::start(libtorrent::session_handle::session_flags_t, libtorrent::session_params&&, boost::asio::io_context*)::<lambda()> > (
__fn=<unknown type in /venv/lib/python3.6/site-packages/python_libtorrent-2.0.4-py3.6-linux-i686.egg/libtorrent/__init__.cpython-36m-i386-linux-gnu.so, CU 0x19a2e83, DIE 0x1a3d218>) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/invoke.h:95
#23 _M_invoke<0> (this=0x84560b4) at /opt/rh/devtoolset-10/root/usr/include/c++/10/thread:264
#24 operator() (this=0x84560b4) at /opt/rh/devtoolset-10/root/usr/include/c++/10/thread:271
#25 std::thread::_State_impl<std::thread::_Invoker<std::tuple<libtorrent::session::start(libtorrent::flags::bitfield_flag<unsigned char, libtorrent::session_flags_tag, void>, libtorrent::session_params&&, boost::asio::io_context*)::{lambda()#1}> > >::_M_run() () at /opt/rh/devtoolset-10/root/usr/include/c++/10/thread:215
#26 0xf6ed5cbd in execute_native_thread_routine ()
from /venv/lib/python3.6/site-packages/python_libtorrent-2.0.4-py3.6-linux-i686.egg/libtorrent/__init__.cpython-36m-i386-linux-gnu.so
#27 0xf7fb8bbc in start_thread () from /lib/libpthread.so.0
Repro steps:
- docker run -v /path/to/libtorrent:/lt -it quay.io/pypa/manylinux2014_i686
- in docker:
$ curl -O https://boostorg.jfrog.io/artifactory/main/release/1.76.0/source/boost_1_76_0.tar.gz
$ tar xvzpf boost_1_76_0.tar.gz
$ cd /boost_1_76_0
$ ./bootstrap.sh
$ ./b2 headers
$ export BOOST_ROOT=/boost_1_76_0
$ export BOOST_BUILD_PATH=/boost_1_76_0/tools/build
$ export PATH="$BOOST_ROOT:$PATH"
$ yum install -y glibc-static
$ /opt/python/cp36-cp36m/bin/python -m venv /venv
$ source /venv/bin/activate
$ cd /lt
$ git checkout master
$ python setup.py build_ext --b2-args=debug-symbols=on install # installs the module without stripping
$ cd bindings/python
$ python -X dev -m unittest tests/*.py
Notes:
- I've run many builds on macos, windows, and 64-bit linux, and haven't seen any stack traces
- I've seen a few different stack traces. The above is the most common one I've seen. All the ones I've seen happen in constructors, so I assume it's an allocation problem and they're all related
- I've also seen this stack trace in RC_2_0. To reproduce there, you need to:
  - build and install the module from RC_2_0
  - check out master
  - remove the master tests related to #5993, as they are known to crash in RC_2_0
  - run the tests
- I haven't tested RC_1_2 against the master suite yet.
@arvidn since the master tests caught a bug in RC_2_0, I take this as some evidence that it would be nice to backport all the python enhancements from master.
If you can show me what to do there, I can put in the work.
trying to reproduce this, the build step fails with:
error: [Errno 2] No such file or directory: 'b2': 'b2'
I tried yum install boost-build, yum install boost, yum install boost-devel, nothing helped. Which package is boost-build in?
I messed up my repro steps. I downloaded boost from source for my test. I used the same setup as CI (download source, bootstrap.sh, b2 headers).
I think I used 1.76.0
Updated my repro steps
jfrog doesn't seem to like links like that. They go to some length to require a full browser in order to download.
Anyway, I get this error:
ImportError: /lt/bindings/python/libtorrent.so: wrong ELF class: ELFCLASS64
Even after rebuilding with --b2-args=address-model=32, I still get the same error.
Oh, do you have a libtorrent.so shadowing the extension? the install step should copy the artifact somewhere under /venv. Do you get the same result after git clean -fxd or similar?
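One way to check for a shadowing artifact is to ask Python which file it would load for libtorrent and then read that file's ELF class byte. The sketch below is a hypothetical helper, not part of libtorrent (the names `elf_class` and `locate_module` are made up for illustration). It assumes the standard ELF header layout, where byte 4 (`EI_CLASS`) is 1 for 32-bit objects and 2 for 64-bit objects.

```python
import importlib.util
import os

# EI_CLASS values as defined in the ELF header layout.
_ELF_CLASSES = {1: "ELFCLASS32", 2: "ELFCLASS64"}


def elf_class(path):
    """Return the ELF class name of the file at `path`, or None if it is not ELF."""
    with open(path, "rb") as f:
        header = f.read(5)
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    return _ELF_CLASSES.get(header[4], "unknown")


def locate_module(name):
    """Return the file Python would load for `name`, without importing it."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    origin = locate_module("libtorrent")
    print("libtorrent resolves to:", origin)
    if origin and os.path.isfile(origin):
        print("ELF class:", elf_class(origin))
```

The first entry of `sys.path` depends on how Python is started, so the result is most meaningful when the check is invoked the same way as the tests (for example from `bindings/python`). If the reported path is under the source tree rather than under `/venv`, a stale build artifact is probably shadowing the installed extension.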
I can reproduce the segfault now. But I can't find a way to analyze it. If I run gdb (in docker) I get permission denied to create the process. And I can't configure /proc/sys/kernel/core_pattern either, for some reason (it just says "read only filesystem"). So, how can I actually get at the core file?
docker run --privileged will let you run gdb within docker.
I got a core file after some test runs, but it may have been due to running with docker --privileged.
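Separately from the kernel-wide core_pattern, a core file is only written if the process's core size limit allows it. As a minimal sketch, assuming the soft limit is what gets in the way (an assumption, not something established in this thread), the test process can raise its own soft `RLIMIT_CORE` to the hard limit with the standard `resource` module. The helper name `enable_core_dumps` is hypothetical.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-size limit to the hard limit and return the new limits."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to, but not beyond, the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    print("core limit (soft, hard):", soft, hard)
```

Calling this before the tests start would cover child processes too, since resource limits are inherited. It does not change where the kernel writes the core file; that is still governed by core_pattern on the host.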
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Now that #6188 is merged to master, it's easier to make a simple PR to see this segfault.
Note that in #6588 I enable 32-bit builds on manylinux, musllinux and windows. This segfault seems to only happen on manylinux 32-bit.
@arvidn could you reopen this? this issue still occurred last time I tried the cibuildwheel workflow on 32-bit linux.
Bump, I confirmed this still exists at least in master. See #7043
I take it the 64 bit build does not have this problem, right?
Correct, or at least I've never seen this failure on 64-bit builds.