Segfaults in 32-bit linux while running master python test suite, maybe allocation related

Open AllSeeingEyeTolledEweSew opened this issue 4 years ago • 17 comments

I'm trying to port #6188 to master. I'm getting segfaults running the python test suite, only on 32-bit linux.

#0  listen_socket_t (this=0xf69f25a0 <std::string::_Rep::_S_empty_rep_storage>) at ../../include/libtorrent/aux_/session_impl.hpp:155
#1  construct<libtorrent::aux::listen_socket_t> (__p=0xf69f25a0 <std::string::_Rep::_S_empty_rep_storage>, this=<optimized out>)
    at ../../include/libtorrent/aux_/session_impl.hpp:155
#2  construct<libtorrent::aux::listen_socket_t> (__p=0xf69f25a0 <std::string::_Rep::_S_empty_rep_storage>, __a=...)
    at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/alloc_traits.h:512
#3  _Sp_counted_ptr_inplace<> (__a=..., this=0xf5d022c8) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr_base.h:551
#4  __shared_count<libtorrent::aux::listen_socket_t, std::allocator<libtorrent::aux::listen_socket_t> > (__a=..., 
    __p=@0x8468be0: 0xf7065558 <vtable for libtorrent::aux::session_impl+8>, this=0x8468be4) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr_base.h:682
#5  __shared_ptr<std::allocator<libtorrent::aux::listen_socket_t> > (__tag=..., this=0x8468be0) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr_base.h:1371
#6  shared_ptr<std::allocator<libtorrent::aux::listen_socket_t> > (__tag=..., this=0x8468be0) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr.h:408
#7  allocate_shared<libtorrent::aux::listen_socket_t, std::allocator<libtorrent::aux::listen_socket_t> > (__a=...)
    at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr.h:860
#8  make_shared<libtorrent::aux::listen_socket_t> () at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/shared_ptr.h:876
#9  libtorrent::aux::session_impl::setup_listener(libtorrent::aux::listen_endpoint_t const&, boost::system::error_code&) () at ../../src/session_impl.cpp:1540
#10 0xf6c73022 in libtorrent::aux::session_impl::reopen_listen_sockets(bool) () at ../../src/session_impl.cpp:2083
#11 0xf6c7596c in libtorrent::aux::session_impl::init() () at ../../src/session_impl.cpp:710
#12 0xf6c865ff in libtorrent::aux::session_impl::wrap<void (libtorrent::aux::session_impl::*)()> (this=0x8468be0, 
    f=(void (libtorrent::aux::session_impl::*)(libtorrent::aux::session_impl * const)) 0xf6c75660 <libtorrent::aux::session_impl::init()>) at ../../src/session_impl.cpp:534
#13 0xf6c5a885 in operator() (__closure=<synthetic pointer>) at ../../src/session_impl.cpp:667
#14 invoke<libtorrent::aux::session_impl::start_session()::<lambda()>, libtorrent::aux::session_impl::start_session()::<lambda()> > (context=<synthetic pointer>, 
    function=<synthetic pointer>) at /boost_1_76_0/boost/asio/detail/handler_invoke_helpers.hpp:51
#15 boost::asio::detail::executor_op<libtorrent::aux::session_impl::start_session()::{lambda()#1}, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, std::allocator<void>*, boost::system::error_code const&, unsigned int) () at /boost_1_76_0/boost/asio/detail/executor_op.hpp:70
#16 0xf6bde53d in complete (bytes_transferred=<optimized out>, ec=..., owner=0x842dc38, this=0x8453c28) at /boost_1_76_0/boost/asio/detail/scheduler_operation.hpp:40
#17 do_run_one (ec=..., this_thread=..., lock=..., this=0x842dc38) at /boost_1_76_0/boost/asio/detail/impl/scheduler.ipp:486
#18 boost::asio::detail::scheduler::run (this=0x842dc38, ec=...) at /boost_1_76_0/boost/asio/detail/impl/scheduler.ipp:204
#19 0xf6c18d6a in run (this=<optimized out>) at /boost_1_76_0/boost/asio/impl/io_context.ipp:63
#20 operator() (__closure=0x84560b4) at ../../src/session.cpp:297
#21 __invoke_impl<void, libtorrent::session::start(libtorrent::session_handle::session_flags_t, libtorrent::session_params&&, boost::asio::io_context*)::<lambda()> > (
    __f=<unknown type in /venv/lib/python3.6/site-packages/python_libtorrent-2.0.4-py3.6-linux-i686.egg/libtorrent/__init__.cpython-36m-i386-linux-gnu.so, CU 0x19a2e83, DIE 0x1a3d23c>) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/invoke.h:60
#22 __invoke<libtorrent::session::start(libtorrent::session_handle::session_flags_t, libtorrent::session_params&&, boost::asio::io_context*)::<lambda()> > (
    __fn=<unknown type in /venv/lib/python3.6/site-packages/python_libtorrent-2.0.4-py3.6-linux-i686.egg/libtorrent/__init__.cpython-36m-i386-linux-gnu.so, CU 0x19a2e83, DIE 0x1a3d218>) at /opt/rh/devtoolset-10/root/usr/include/c++/10/bits/invoke.h:95
#23 _M_invoke<0> (this=0x84560b4) at /opt/rh/devtoolset-10/root/usr/include/c++/10/thread:264
#24 operator() (this=0x84560b4) at /opt/rh/devtoolset-10/root/usr/include/c++/10/thread:271
#25 std::thread::_State_impl<std::thread::_Invoker<std::tuple<libtorrent::session::start(libtorrent::flags::bitfield_flag<unsigned char, libtorrent::session_flags_tag, void>, libtorrent::session_params&&, boost::asio::io_context*)::{lambda()#1}> > >::_M_run() () at /opt/rh/devtoolset-10/root/usr/include/c++/10/thread:215
#26 0xf6ed5cbd in execute_native_thread_routine ()
   from /venv/lib/python3.6/site-packages/python_libtorrent-2.0.4-py3.6-linux-i686.egg/libtorrent/__init__.cpython-36m-i386-linux-gnu.so
#27 0xf7fb8bbc in start_thread () from /lib/libpthread.so.0

Repro steps:

  • docker run -v /path/to/libtorrent:/lt -it quay.io/pypa/manylinux2014_i686
  • in docker:
$ curl -O https://boostorg.jfrog.io/artifactory/main/release/1.76.0/source/boost_1_76_0.tar.gz
$ tar xvzpf boost_1_76_0.tar.gz
$ cd /boost_1_76_0
$ ./bootstrap.sh
$ ./b2 headers
$ export BOOST_ROOT=/boost_1_76_0
$ export BOOST_BUILD_PATH=/boost_1_76_0/tools/build
$ export PATH="$BOOST_ROOT:$PATH"
$ yum install -y glibc-static
$ /opt/python/cp36-cp36m/bin/python -m venv /venv
$ source /venv/bin/activate
$ cd /lt
$ git checkout master
$ python setup.py build_ext --b2-args=debug-symbols=on install  # installs the module without stripping
$ cd bindings/python
$ python -X dev -m unittest tests/*.py
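
Because a segfault kills the whole unittest process, it can help to run each test file in its own interpreter and note which ones die from a signal. The following is only a minimal sketch and not part of the libtorrent test suite. The helper names `describe_exit` and `run_each` are hypothetical, and it assumes it is run from bindings/python with the test files under tests/.

```python
import glob
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal %d" % -returncode
        return "killed by " + name
    return "failed (exit code %d)" % returncode


def run_each(pattern="tests/*.py"):
    """Run every matching test file in a fresh interpreter and collect statuses."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        # -X faulthandler asks Python to dump a Python traceback on fatal signals.
        proc = subprocess.run(
            [sys.executable, "-X", "faulthandler", "-m", "unittest", path]
        )
        results[path] = describe_exit(proc.returncode)
    return results
```

Calling `run_each()` returns a dict mapping each test file to its status, so a file reported as "killed by SIGSEGV" narrows down where to attach gdb.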

Notes:

  • I've run many builds on macOS, Windows, and 64-bit Linux, and haven't seen any crashes there.
  • I've seen a few different stack traces. The one above is the most common. All of them happen in constructors, so I assume it's an allocation problem and that they're all related.
  • I've also seen this stack trace in RC_2_0. To reproduce there, you need to:
    • build and install the module from RC_2_0
    • check out master
    • remove the master tests related to #5993, as they are known to crash in RC_2_0
    • run the tests
  • I haven't tested RC_1_2 against the master suite yet.

@arvidn since the master tests caught a bug in RC_2_0, I take this as some evidence that it would be nice to backport all the python enhancements from master.

If you can show me what to do there, I can put in the work.

Trying to reproduce this, the build step fails with:

error: [Errno 2] No such file or directory: 'b2': 'b2'

I tried yum install boost-build, yum install boost, and yum install boost-devel, but nothing helped. Which package is boost-build in?

arvidn avatar Sep 04 '21 21:09 arvidn

I messed up my repro steps. I downloaded boost from source for my test. I used the same setup as CI (download source, bootstrap.sh, b2 headers).

I think I used 1.76.0

Updated my repro steps

jfrog doesn't seem to like links like that. They go to some length to require a full browser in order to download.

Anyway, I get this error:

ImportError: /lt/bindings/python/libtorrent.so: wrong ELF class: ELFCLASS64

Even after rebuilding with --b2-args=address-model=32, I still get the same error.

arvidn avatar Sep 05 '21 09:09 arvidn

Oh, do you have a libtorrent.so shadowing the extension? The install step should copy the artifact somewhere under /venv. Do you get the same result after git clean -fxd or similar?
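
The "wrong ELF class" error above means the loader was handed a shared object built for a different word size than the interpreter. One way to check for a stale 64-bit libtorrent.so shadowing the installed extension is to compare the ELF class of the file Python would import with the interpreter's own pointer size. This is a sketch with hypothetical helper names (`elf_class`, `interpreter_bits`, `locate_module`). It relies only on the ELF header layout, where the fifth byte (`EI_CLASS`) is 1 for 32-bit and 2 for 64-bit objects.

```python
import importlib.util
import struct


def elf_class(path):
    """Return 'ELFCLASS32' or 'ELFCLASS64' for an ELF file, or None otherwise."""
    with open(path, "rb") as f:
        header = f.read(5)
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    # EI_CLASS is the fifth byte of the ELF identification: 1 = 32-bit, 2 = 64-bit.
    return {1: "ELFCLASS32", 2: "ELFCLASS64"}.get(header[4])


def interpreter_bits():
    """Pointer size of the running interpreter, in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def locate_module(name="libtorrent"):
    """Path Python would load `name` from, without importing it (None if not found)."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None
```

Run from the directory where the tests are executed: if `elf_class(locate_module())` reports `ELFCLASS64` while `interpreter_bits()` is 32, that would point to a shadowing 64-bit build rather than the extension installed under /venv.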

I can reproduce the segfault now, but I can't find a way to analyze it. If I run gdb (in docker) I get permission denied when creating the process. And I can't configure /proc/sys/kernel/core_pattern either, for some reason (it just says "read only filesystem"). So, how can I actually get at the core file?

arvidn avatar Sep 05 '21 19:09 arvidn

docker run --privileged will let you run gdb within docker.

I got a core file after some test runs, but it may have been due to running with docker run --privileged.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Dec 05 '21 07:12 stale[bot]

Now that #6188 is merged to master, it's easier to make a simple PR to see this segfault.

Note that in #6588 I enable 32-bit builds on manylinux, musllinux and windows. This segfault seems to only happen on manylinux 32-bit.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Mar 12 '22 12:03 stale[bot]

@arvidn could you reopen this? This issue still occurred the last time I tried the cibuildwheel workflow on 32-bit linux.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Aug 12 '22 00:08 stale[bot]

Bump, I confirmed this still exists at least in master. See #7043

I take it the 64-bit build does not have this problem, right?

arvidn avatar Sep 04 '22 23:09 arvidn

Correct, or at least I've never seen this failure on 64-bit builds.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jan 07 '23 14:01 stale[bot]