
libtorrent.so 0.13.8 crash on Raspberry Pi 4 kernel due to unaligned access

Open MetalKnight opened this issue 2 years ago • 7 comments

libtorrent 0.13.8, distributed with rtorrent 0.9.8. Raspberry Pi 4, kernel version: 6.1.21-v8+ aarch64 GNU/Linux

rtorrent runs on a Raspberry Pi 4 with storage on an ext4 HDD. It was running fine for ages. I updated the Pi 4 to the latest release yesterday, and now rtorrent crashes after downloading some data from a torrent. Restarting the process makes it crash the same way once the file checksum has completed.

Caught SIGBUS, dumping stack:
rtorrent() [0x1fdf0]
/lib/arm-linux-gnueabihf/libc.so.6(__default_rt_sa_restorer+0) [0xf7493910]
/usr/lib/arm-linux-gnueabihf/libtorrent.so.21(+0xd3af4) [0xf792aaf4]
/usr/lib/arm-linux-gnueabihf/libtorrent.so.21(_ZN7torrent9PollEPoll7performEv+0xe0) [0xf7894984]
/usr/lib/arm-linux-gnueabihf/libtorrent.so.21(_ZN7torrent9PollEPoll7do_pollExi+0x78) [0xf7894adc]
/usr/lib/arm-linux-gnueabihf/libtorrent.so.21(_ZN7torrent11thread_base10event_loopEPS0_+0x174) [0xf78d189c]
rtorrent() [0x1e660]
/lib/arm-linux-gnueabihf/libc.so.6(__libc_start_main+0x114) [0xf747b740]

Error: Success
Signal code '1': Invalid address alignment.
Fault address: 0x110946d
The fault address is not part of any chunk.
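
The reported fault address can be checked by hand: 0x110946d ends in the hex digit d, so it is odd and cannot satisfy any alignment larger than one byte. A minimal C++ sketch of that check follows; `is_aligned` and `kFaultAddress` are illustrative names and not part of libtorrent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true when `address` is a multiple of `alignment`.
// `alignment` must be a non-zero power of two (1, 2, 4, 8, ...).
constexpr bool is_aligned(std::uintptr_t address, std::size_t alignment) {
  return (address & (alignment - 1)) == 0;
}

// Fault address copied from the crash report above.
constexpr std::uintptr_t kFaultAddress = 0x110946d;

// The address is odd, so it fails every alignment greater than one byte.
static_assert(is_aligned(kFaultAddress, 1), "every address is 1-byte aligned");
static_assert(!is_aligned(kFaultAddress, 2), "odd address is not 2-byte aligned");
static_assert(!is_aligned(kFaultAddress, 4), "odd address is not 4-byte aligned");
```

This only confirms that the address is misaligned for multi-byte types. It does not identify which load or store in libtorrent touched it.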

Running it under gdb gives the following backtrace, which shows that the fault is indeed in Thread 1, inside libtorrent.so:

(gdb) thread apply all backtrace

Thread 3 (Thread 0xf5dff200 (LWP 1536) "rtorrent scgi"):
#0  0xf7a7a1dc in epoll_wait (epfd=13, events=0x1f6750, maxevents=1024, timeout=600001) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0xf7dce898 in torrent::PollEPoll::poll(int) () from /usr/lib/arm-linux-gnueabihf/libtorrent.so.21
#2  0xf7dceac8 in torrent::PollEPoll::do_poll(long long, int) () from /usr/lib/arm-linux-gnueabihf/libtorrent.so.21
#3  0xf7e0b89c in torrent::thread_base::event_loop(torrent::thread_base*) () from /usr/lib/arm-linux-gnueabihf/libtorrent.so.21
#4  0xf7af8310 in start_thread (arg=0xf5dff200) at pthread_create.c:477
#5  0xf7a79da8 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 2 (Thread 0xf6770200 (LWP 1535) "rtorrent disk"):
#0  0xf7a7a1dc in epoll_wait (epfd=10, events=0x1f01b8, maxevents=1024, timeout=10001) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0xf7dce898 in torrent::PollEPoll::poll(int) () from /usr/lib/arm-linux-gnueabihf/libtorrent.so.21
#2  0xf7dceac8 in torrent::PollEPoll::do_poll(long long, int) () from /usr/lib/arm-linux-gnueabihf/libtorrent.so.21
#3  0xf7e0b89c in torrent::thread_base::event_loop(torrent::thread_base*) () from /usr/lib/arm-linux-gnueabihf/libtorrent.so.21
#4  0xf7af8310 in start_thread (arg=0xf6770200) at pthread_create.c:477
#5  0xf7a79da8 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 1 (Thread 0xf6c2b040 (LWP 1531) "rtorrent main"):
#0  0xf7e64af4 in ?? () from /usr/lib/arm-linux-gnueabihf/libtorrent.so.21
#1  0xf7dce984 in torrent::PollEPoll::perform() () from /usr/lib/arm-linux-gnueabihf/libtorrent.so.21
#2  0xf7dceadc in torrent::PollEPoll::do_poll(long long, int) () from /usr/lib/arm-linux-gnueabihf/libtorrent.so.21
#3  0xf7e0b89c in torrent::thread_base::event_loop(torrent::thread_base*) () from /usr/lib/arm-linux-gnueabihf/libtorrent.so.21
#4  0x0001e660 in ?? ()
#5  0xf79b5740 in __libc_start_main (main=0xfffef6e4, argc=-139534336, argv=0xf79b5740 <__libc_start_main+276>, init=<optimized out>, fini=0x170f98, rtld_fini=0xf7fcd510 <_dl_fini>, stack_end=0xfffef6e4) at libc-start.c:308
#6  0x0001f2c0 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

MetalKnight avatar Jul 05 '23 19:07 MetalKnight

Awaiting instructions on how to provide further debug info. Thanks.

MetalKnight avatar Jul 05 '23 19:07 MetalKnight

kernel alignment options

sudo modprobe configs
zgrep ALIGN /proc/config.gz
# CONFIG_COMPAT_ALIGNMENT_FIXUPS is not set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
CONFIG_CMA_ALIGNMENT=8
# CONFIG_DEBUG_FORCE_FUNCTION_ALIGN_64B is not set

MetalKnight avatar Jul 06 '23 08:07 MetalKnight

some debug thoughts taken from https://github.com/epics-base/pvDataCPP/issues/84

MetalKnight avatar Jul 06 '23 08:07 MetalKnight

I've found a workaround:

  • build rakshasa/libtorrent from source, running configure with the --enable-aligned option
  • rebuild the rtorrent binary manually as well, against the library built in the previous step
  • manually install the build dependencies: libssl-dev, libncurses5-dev, libncursesw5-dev, autoconf-archive, libcurl-dev.

I can revert to the old binary anytime if you still need my help debugging the unaligned access issue.

MetalKnight avatar Jul 06 '23 12:07 MetalKnight

Thanks so much for reporting this information. I would like to add information here. This also happens on x86 - not just ARM. I can confirm that configuring rakshasa/libtorrent with --enable-aligned resolves the problem.

../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: No such file or directory.
(gdb) bt
#0  __memmove_avx_unaligned () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:222
#1  0x00007ffff7dfd269 in torrent::Chunk::to_buffer(void*, unsigned int, unsigned int) () from /lib/libtorrent.so.21
#2  0x00007ffff7e2ad0c in torrent::PeerConnectionBase::up_chunk() () from /lib/libtorrent.so.21
#3  0x00007ffff7e2eb3a in torrent::PeerConnection<(torrent::Download::ConnectionType)1>::event_write() ()
   from /lib/libtorrent.so.21
#4  0x00007ffff7dd0848 in torrent::PollEPoll::perform() () from /lib/libtorrent.so.21
#5  0x00007ffff7df3b62 in torrent::thread_base::event_loop(torrent::thread_base*) () from /lib/libtorrent.so.21
#6  0x000055555558e8c4 in main ()

stickz avatar Jan 03 '24 16:01 stickz

I just saw this error:

 ../rak/socket_address.h:131:72: error: cast from 'sockaddr*' to 'rak::socket_address*' increases required alignment of target type [-Werror=cast-align]
  131 |   static socket_address*       cast_from(sockaddr* sa)        { return reinterpret_cast<socket_address*>(sa); }
      |                                                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../rak/socket_address.h: In static member function 'static const rak::socket_address* rak::socket_address::cast_from(const sockaddr*)':
../rak/socket_address.h:132:72: error: cast from 'const sockaddr*' to 'const rak::socket_address*' increases required alignment of target type [-Werror=cast-align]
  132 |   static const socket_address* cast_from(const sockaddr* sa)  { return reinterpret_cast<const socket_address*>(sa); }
      |                                                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../rak/socket_address.h: In member function 'rak::socket_address rak::socket_address_inet6::normalize_address() const':
../rak/socket_address.h:530:28: error: cast from 'const uint8_t*' {aka 'const unsigned char*'} to 'const uint32_t*' {aka 'const unsigned int*'} increases required alignment of target type [-Werror=cast-align]
  530 |   const uint32_t *addr32 = reinterpret_cast<const uint32_t *>(m_sockaddr.sin6_addr.s6_addr);

could this be related?
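
For context on the third warning: casting a `const uint8_t*` to `const uint32_t*` and dereferencing it is only safe if the bytes happen to be 4-byte aligned. A common alternative is to copy the bytes out with `std::memcpy`, which has no alignment requirement. The sketch below assumes that approach. `load_u32` and `is_v4_mapped` are hypothetical helpers written for illustration and are not libtorrent's code or its fix.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not part of libtorrent: reads a 32-bit value from a
// byte pointer that may not be 4-byte aligned. std::memcpy imposes no
// alignment requirement, unlike dereferencing a reinterpret_cast'ed
// const uint32_t*.
inline std::uint32_t load_u32(const std::uint8_t* bytes) {
  std::uint32_t value;
  std::memcpy(&value, bytes, sizeof(value));
  return value;
}

// Hypothetical example use: test whether a 16-byte IPv6 address has the
// IPv4-mapped form ::ffff:a.b.c.d (ten zero bytes, then 0xff 0xff).
inline bool is_v4_mapped(const std::uint8_t (&addr)[16]) {
  return load_u32(addr) == 0 && load_u32(addr + 4) == 0 &&
         addr[8] == 0 && addr[9] == 0 &&
         addr[10] == 0xff && addr[11] == 0xff;
}
```

Compilers typically turn the fixed-size `memcpy` into a plain load on targets that allow unaligned access, so the safe form usually costs nothing there.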

neheb avatar Jan 13 '25 00:01 neheb

Check if this issue still appears with the latest master, which has been updated to C++17.

rakshasa avatar Jan 17 '25 16:01 rakshasa

Since this is related to a build issue, I am updating the GitHub Action in my repo to pass the "--enable-aligned" flag.

beadon avatar Sep 20 '25 07:09 beadon

Added to the package builder. Probably a good one to add to the build-checking GitHub Actions too?

beadon avatar Sep 20 '25 08:09 beadon

The configure flag was added specifically for these issues, so it should be added for those archs.

rakshasa avatar Sep 20 '25 08:09 rakshasa