libtorrent icon indicating copy to clipboard operation
libtorrent copied to clipboard

pthread_rwlock_wrlock() is called with a NULL pointer.

Open marillat opened this issue 2 years ago • 11 comments

libtorrent version (or branch): 2.0.6 platform/architecture: amd64 compiler and compiler version: gcc 11.3.0 Debian bug report https://bugs.debian.org/1013931 From the bug report

The stacktrace below suggests that this is related to cleanup coderegistered using atexit(3): pthread_rwlock_wrlock() is called with a
NULL pointer.
,----
| $ gdb --args server/nbdkit plugins/torrent/.libs/nbdkit-torrent-plugin.so --help
| GNU gdb (Debian 12.1-2) 12.1
| Copyright (C) 2022 Free Software Foundation, Inc.
| [...]
| (gdb) run
| Starting program: /home/bengen/p/deb/nbdkit/server/nbdkit plugins/torrent/.libs/nbdkit-torrent-plugin.so --help
| [Thread debugging using libthread_db enabled]
| Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
| nbdkit [-4|--ipv4-only] [-6|--ipv6-only]
|        [-D|--debug PLUGIN|FILTER|nbdkit.FLAG=N]
| [...]
| torrent=<TORRENT>   (required) Torrent or magnet link.
| file=DISK.iso                  File to serve within torrent.
| cache=DIR                      Set directory to store partial downloads.
| 
| Program received signal SIGSEGV, Segmentation fault.
| __GI___pthread_rwlock_wrlock (rwlock=0x0) at pthread_rwlock_wrlock.c:27
| 27	pthread_rwlock_wrlock.c: No such file or directory.
| (gdb) bt
| #0  __GI___pthread_rwlock_wrlock (rwlock=0x0) at pthread_rwlock_wrlock.c:27
| #1  0x00007ffff6c8a8e9 in CRYPTO_THREAD_write_lock (lock=<optimized out>) at ../crypto/threads_pthread.c:112
| #2  0x00007ffff6ba7a03 in conf_modules_finish_int () at ../crypto/conf/conf_mod.c:524
| #3  0x00007ffff6ba8132 in CONF_modules_unload (all=1) at ../crypto/conf/conf_mod.c:482
| #4  0x00007ffff726af24 in boost::asio::ssl::detail::openssl_init_base::do_init::~do_init (this=0x55556157dc50, __in_chrg=<optimized out>)
|     at /usr/include/boost/asio/ssl/detail/impl/openssl_init.ipp:90
| #5  std::_Sp_counted_ptr<boost::asio::ssl::detail::openssl_init_base::do_init*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (
|     this=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:348
| #6  0x00007ffff726b09a in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x55556157db90)
|     at /usr/include/c++/11/bits/shared_ptr_base.h:168
| #7  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>)
|     at /usr/include/c++/11/bits/shared_ptr_base.h:705
| #8  std::__shared_ptr<boost::asio::ssl::detail::openssl_init_base::do_init, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (
|     this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:1154
| #9  std::shared_ptr<boost::asio::ssl::detail::openssl_init_base::do_init>::~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>)
|     at /usr/include/c++/11/bits/shared_ptr.h:122
| #10 0x00007ffff7bb9f77 in __run_exit_handlers (status=status@entry=0, listp=0x7ffff7d4d738 <__exit_funcs>, 
|     run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
| #11 0x00007ffff7bba11a in __GI_exit (status=status@entry=0) at exit.c:139
| #12 0x000055555555a944 in main (argc=<optimized out>, argv=<optimized out>) at ./server/main.c:678
`----

marillat avatar Jun 28 '22 07:06 marillat

so, this happens when boost.asio is unloading libcrypto. I wonder if this is caused by libtorrent also calling CRYPTO_cleanup_all_ex_data().

I believe both asio's unloading and libtorrent's cleanup are deprecated in newer versions of libcrypto. libtorrent protects the calls with:

#if !defined(OPENSSL_API_COMPAT) || OPENSSL_API_COMPAT < 0x10100000L

So, I'm thinking that maybe it's time to just remove this code in libtorrent, and possibly "leak" libcrypto resources when an old version is used. or perhaps just bump the dependency and say libtorrent requires a OpenSSL 1.1 or later.

arvidn avatar Jun 28 '22 09:06 arvidn

could you test to see if this makes a difference? https://github.com/arvidn/libtorrent/pull/6941

arvidn avatar Jun 28 '22 09:06 arvidn

it looks like it's still happening: https://github.com/arvidn/libtorrent/runs/7090009663?check_suite_focus=true#step:5:16618

arvidn avatar Jun 28 '22 11:06 arvidn

From PR #6941 arvidn wants to merge 1 commit into RC_1_2 from remove-openssl-cleanup

I'm forwarding the bug submitter comment

Perhaps I'm missing something … it seems to me that this patch is
to be appplied onto version 1.2, not 2.0 – and the verrsion nbdkit was
most recently built with is 2.0.6.

marillat avatar Jun 29 '22 12:06 marillat

yeah, I made the patch against 1.2 because the same problem happens there, and I regularly merge 1.2 into 2.0 and master. Either way, it didn't seem to make a different. It's possible it's a new version of OpenSSL and boost.asio that are incompatible. I haven't been able to reproduce it, but maybe with the latest version of openssl I might.

arvidn avatar Jun 29 '22 12:06 arvidn

Hi, I am the person who reported the bug within Debian. (We can continue the conversation without having to rely on @marillat to relay bits and pieces.)

My best uneducateed guess at the moment is that the OpenSSL structure that contains the pthread_rwlock_t might have been zeroed out on destruction, leading to a NULL pointer being passed to pthread_rwlock_wrlock() when the corresponding cleanup function is called a second time.

hillu avatar Jun 29 '22 15:06 hillu

I may have found hints towards what is going on here. Apparently OpenSSL's CONF_modules_unload is run twice (why?) through atexit handlers and then a boost::asio::ssl which also wants to run some cleanup code.

Breakpoint 1, CONF_modules_unload (all=1) at ../crypto/conf/conf_mod.c:482
482	../crypto/conf/conf_mod.c: No such file or directory.
(gdb) bt
#0  CONF_modules_unload (all=1) at ../crypto/conf/conf_mod.c:482
#1  0x00007ffff6ba827e in ossl_config_modules_free () at ../crypto/conf/conf_mod.c:571
#2  0x00007ffff6c7d0b9 in OPENSSL_cleanup () at ../crypto/init.c:414
#3  OPENSSL_cleanup () at ../crypto/init.c:344
#4  0x00007ffff7bb9f77 in __run_exit_handlers (status=status@entry=0, listp=0x7ffff7d4d738 <__exit_funcs>, 
    run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#5  0x00007ffff7bba11a in __GI_exit (status=status@entry=0) at exit.c:139
#6  0x000055555555a944 in main (argc=<optimized out>, argv=<optimized out>) at ./server/main.c:678
(gdb) c
Continuing.

Breakpoint 1, CONF_modules_unload (all=1) at ../crypto/conf/conf_mod.c:489
489	in ../crypto/conf/conf_mod.c
(gdb) bt
#0  CONF_modules_unload (all=1) at ../crypto/conf/conf_mod.c:489
#1  CONF_modules_unload (all=1) at ../crypto/conf/conf_mod.c:477
#2  0x00007ffff6ba827e in ossl_config_modules_free () at ../crypto/conf/conf_mod.c:571
#3  0x00007ffff6c7d0b9 in OPENSSL_cleanup () at ../crypto/init.c:414
#4  OPENSSL_cleanup () at ../crypto/init.c:344
#5  0x00007ffff7bb9f77 in __run_exit_handlers (status=status@entry=0, listp=0x7ffff7d4d738 <__exit_funcs>, 
    run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#6  0x00007ffff7bba11a in __GI_exit (status=status@entry=0) at exit.c:139
#7  0x000055555555a944 in main (argc=<optimized out>, argv=<optimized out>) at ./server/main.c:678
(gdb) c
Continuing.

Breakpoint 1, CONF_modules_unload (all=1) at ../crypto/conf/conf_mod.c:482
482	in ../crypto/conf/conf_mod.c
(gdb) bt
#0  CONF_modules_unload (all=1) at ../crypto/conf/conf_mod.c:482
#1  0x00007ffff726af24 in boost::asio::ssl::detail::openssl_init_base::do_init::~do_init (this=0x55556157dc50, __in_chrg=<optimized out>)
    at /usr/include/boost/asio/ssl/detail/impl/openssl_init.ipp:90
#2  std::_Sp_counted_ptr<boost::asio::ssl::detail::openssl_init_base::do_init*, (__gnu_cxx::_Lock_policy)2>::_M_dispose (
    this=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:348
#3  0x00007ffff726b09a in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x55556157db90)
    at /usr/include/c++/11/bits/shared_ptr_base.h:168
#4  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>)
    at /usr/include/c++/11/bits/shared_ptr_base.h:705
#5  std::__shared_ptr<boost::asio::ssl::detail::openssl_init_base::do_init, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (
    this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/11/bits/shared_ptr_base.h:1154
#6  std::shared_ptr<boost::asio::ssl::detail::openssl_init_base::do_init>::~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>)
    at /usr/include/c++/11/bits/shared_ptr.h:122
#7  0x00007ffff7bb9f77 in __run_exit_handlers (status=status@entry=0, listp=0x7ffff7d4d738 <__exit_funcs>, 
    run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#8  0x00007ffff7bba11a in __GI_exit (status=status@entry=0) at exit.c:139
#9  0x000055555555a944 in main (argc=<optimized out>, argv=<optimized out>) at ./server/main.c:678
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
__GI___pthread_rwlock_wrlock (rwlock=0x0) at pthread_rwlock_wrlock.c:27
27	pthread_rwlock_wrlock.c: No such file or directory.

hillu avatar Jun 29 '22 15:06 hillu

my suspicion is that as openssl deprecated CONF_modules_unload() they made it be called by their own global destructor. However, when they did, they should have made the publicly visible CONF_modules_unload() be a no-op, but perhaps they didn't. Perhaps most users of OpenSSL doesn't bother calling this function on shutdown, but if you do, the new OpenSSL might end up calling it again.

arvidn avatar Jun 29 '22 16:06 arvidn

Patching the preprocessor magic around ::CONF_modules_unload(1); in boost::asio::ssl::detail::openssl_init_base::do_init::~do_init and rebuilding Boost and libtorrent fixed the issue for me. And Boost seem to have fixed this problem for OpenSSL >= 3, too: https://github.com/boostorg/asio/blob/46f49024e13ed8867cd48cabf1498a0e11db593d/include/boost/asio/ssl/detail/impl/openssl_init.ipp#L89

hillu avatar Jul 01 '22 10:07 hillu

I just managed to reproduce this on MacOS as well, after updating OpenSSL.

My understanding is:

  1. OpenSSL simplified their API by no longer requiring calling CONF_modules_unload(), but it seems like while doing so they also made it an error to call this function (even though it was OK in previous versions). I think this is the root of the issue. The OpenSSL API changed slightly.

  2. boost.asio calls CONF_modules_unload() in a global destructor, which happens to be after the now-built-in call happens, which crashes.

I think the best fix would be to patch OpenSSL to make CONF_modules_unload() a no-op, and have it's at-exit handler call an internal function that does the same thing (i.e. just rename it).

The second best fix would be to patch boost.asio, as @hillu described.

I don't think there's anything I can do in libtorrent however.

arvidn avatar Jul 02 '22 09:07 arvidn

one MacOS, I can work around this by installing brew install [email protected] and the build libtorrent with:

b2 openssl-lib=$(brew --prefix [email protected])/lib openssl-include=$(brew --prefix [email protected])/include

(this is becasue [email protected] won't be installed in /opt/homebrew/include and /opt/homebrew/lib).

Apparently on github actions, it's sufficient to just downgrade. https://github.com/arvidn/libtorrent/pull/6945

arvidn avatar Jul 02 '22 10:07 arvidn

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Oct 01 '22 00:10 stale[bot]

This issue has been fixed.

marillat avatar Oct 01 '22 06:10 marillat