Fix a crash after handling SIGINT and a data race when initializing the Hyprland workspace modules

Open • cfillion opened this issue 1 year ago • 1 comment

Bug 1: Crash when receiving signals after `main` returns

The waybar process does not exit instantaneously. Signals may be received after main has started freeing resources.

When a worker thread is blocked in fgets, this time window can last indefinitely (until Hyprland writes to the socket). An easy way to reproduce the crash is to press ^C twice while a Hyprland module is waiting on socket2.
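
One possible way to close this window is to stop handling signals before teardown begins. The following is a minimal sketch under assumptions, not the code from this patch. The names g_stopRequested, onInterrupt, installHandlers and disarmHandlers are hypothetical; only std::signal, std::sig_atomic_t, SIGINT and SIG_IGN come from the standard library.

```cpp
#include <csignal>

// Hypothetical flag written by the handler. sig_atomic_t is the type the
// standard guarantees can be safely written from a signal handler.
volatile std::sig_atomic_t g_stopRequested = 0;

// Hypothetical handler: it only records that an interrupt arrived.
extern "C" void onInterrupt(int) { g_stopRequested = 1; }

// Install the handler while every resource it may depend on is still alive.
void installHandlers() { std::signal(SIGINT, onInterrupt); }

// Call this before freeing anything the handler, or code it triggers, could
// touch. From here on a second ^C is ignored instead of running against
// partially destroyed state.
void disarmHandlers() { std::signal(SIGINT, SIG_IGN); }
```

In this sketch, main would call installHandlers() at startup and disarmHandlers() right after its main loop ends, before any resources are released. Restoring SIG_DFL instead of SIG_IGN is an alternative if a late ^C should terminate the process immediately.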

Thread 1 "waybar" received signal SIGSEGV, Segmentation fault.
spdlog::sinks::sink::should_log (this=0x5f620b542ca5,
    msg_level=spdlog::level::info)
    at /usr/src/debug/spdlog/spdlog-1.14.1/include/spdlog/sinks/sink-inl.h:13
13	  return msg_level >= level_.load(std::memory_order_relaxed);
(gdb) p $_siginfo._sifields._sigfault.si_addr
$1 = (void *) 0x5f620b542cad

Bug 2: Random crashes and/or Hyprland socket connection failure

Workspaces::* and IPC::startIPC may both call getSocketFolder at the same time, so they race on the same global, socketFolder_.

This is likely the root cause of #3663.
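
The race described above can be avoided by guarding the cached value with a mutex. The following is a minimal sketch under assumptions, not the actual Waybar code: the namespace sketch, the variable names, and the "/tmp/hypr" path layout are placeholders chosen for illustration.

```cpp
#include <filesystem>
#include <mutex>

namespace sketch {

// Hypothetical stand-ins for the shared global and its guard.
std::mutex socketFolderMutex;
std::filesystem::path socketFolder;

// Returns the cached folder, computing it on first use. The lock covers both
// the emptiness check and the assignment, so two threads can never observe a
// half-written path. Returning by value avoids handing out a reference to
// shared state that another thread might still be modifying.
std::filesystem::path getSocketFolder(const char* instanceSig) {
  std::lock_guard<std::mutex> lock(socketFolderMutex);
  if (socketFolder.empty()) {
    // Placeholder path layout, for illustration only.
    socketFolder = std::filesystem::path("/tmp/hypr") / instanceSig;
  }
  return socketFolder;
}

}  // namespace sketch
```

Every caller goes through the same lock, so the first thread to arrive fills the cache and later callers receive a copy of the finished value.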

Typical crash A:

[2024-10-16 07:42:09.987] [info] Hyprland IPC starting
malloc(): unaligned tcache chunk detected
[2024-10-16 07:42:09.987] [error] Hyprland IPC: Unable to connect?
Thread 1 "waybar" received signal SIGABRT, Aborted.
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
(omitted for brevity)
#9  0x00007ffff64ae745 in operator new (sz=sz@entry=296) at /usr/src/debug/gcc/gcc/libstdc++-v3/libsupc++/new_op.cc:50
#10 0x00007ffff65ab1f1 in std::filesystem::__cxx11::path::_List::_Impl::copy (this=0x555555a23350) at /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++17/fs_path.cc:249
#11 0x00007ffff65ab3bd in std::filesystem::__cxx11::path::_List::_List (this=0x7fffffff9d30, other=<optimized out>) at /usr/src/debug/gcc/gcc-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:454
#12 0x00005555556f4ab1 in waybar::modules::hyprland::IPC::getSocket1Reply(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#13 0x00005555556f5e3d in waybar::modules::hyprland::IPC::getSocket1JsonReply(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#14 0x000055555571289c in waybar::modules::hyprland::Workspaces::setCurrentMonitorId() ()

Typical crash B:

[2024-10-16 10:01:15.859] [info] Hyprland IPC starting
[2024-10-16 10:01:15.859] [info] Loading persistent workspaces from Hyprland workspace rules
Thread 8 "waybar" received signal SIGSEGV, Segmentation fault.
(gdb) bt
#0  std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_S_copy
    (__d=0x5555558fbca8 "/", __s=0x2973961a26d35726 <error: Cannot access memory at address 0x2973961a26d35726>, __n=1)
    at /usr/src/debug/gcc/gcc-build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.h:433
(omitted for brevity)
#15 waybar::modules::hyprland::IPC::getSocketFolder[abi:cxx11](char const*)
    (instanceSig=0x7fffffffe604 "4520b30d498daca8079365bdb909a8dea38e8d55_1729051218_1982280648") at ../src/modules/hyprland/backend.cpp:41
#16 0x000055555564230f in waybar::modules::hyprland::IPC::startIPC()::{lambda()#1}::operator()() const ()
    at ../src/modules/hyprland/backend.cpp:70
#17 0x00007ffff64e1c34 in std::execute_native_thread_routine (__p=0x5555558119c0) at /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/thread.cc:104
#18 0x00007ffff62a339d in start_thread (arg=<optimized out>) at pthread_create.c:447

Oct 17 '24 10:10 cfillion

On Bug 2: I also believe this was causing issue #3663. After looking at the return value of the getSocketFolder function, I realized that the function was caching the path of the sockets folder in the global variable socketFolder_, which two different threads were racing on.

After commenting out the lines:

  if (!socketFolder_.empty()) {
    return socketFolder_;
  }

the problem seems to have disappeared, but I believe that the mutex is the most appropriate solution. Thanks!

Oct 18 '24 17:10 vpcano
