Bspwm freezes (process stopped, status 'T') irregularly

Open aaronapple opened this issue 5 years ago • 22 comments

I'm running bspwm (0.9.9-1) on Arch Linux (kernel 5.7.6-arch1-1) on a Lenovo ThinkPad X1C (6th gen).

Bspwm will freeze at irregular intervals and I'm not sure why.

When it freezes, I can only interact with the active/focused window using the keyboard. I can move the mouse, but I cannot interact with any windows or the window manager with it. I cannot use sxhkd keybinds to switch desktops or open/close windows. But sxhkd does not seem to be dead -- I have dmenu mapped to a keybind and dmenu gets successfully drawn on the screen (I tried to send bspc commands through dmenu, but that did not work).

After a freeze, if I drop to a different tty, I can see that the bspwm process is stopped with status 'T'. There is also a process from xinit (I think it is 'dbus-launch --sh-syntax --exit-with-session bspwm') that is stopped with status 'T'. Sometimes I can send a SIGCONT signal to the bspwm process and it goes back to working fine, though other times this trick doesn't work and I have to kill the process.
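
Checking the state and sending SIGCONT by hand can be wrapped in a small script. This is a minimal sketch, assuming Linux's `/proc/PID/status` format; `proc_state` and `resume_if_stopped` are hypothetical helper names, and, as described above, SIGCONT does not always bring bspwm back.

```shell
#!/bin/sh
# Sketch (Linux-only). proc_state and resume_if_stopped are hypothetical
# helper names, not part of bspwm or bspc.

# proc_state PID: print the one-letter state from /proc/PID/status,
# e.g. S (sleeping) or T (stopped by a signal).
proc_state() {
    awk '/^State:/ { print $2; exit }' "/proc/$1/status"
}

# resume_if_stopped PID: send SIGCONT only if the process is stopped.
resume_if_stopped() {
    state=$(proc_state "$1") || return 1
    if [ "$state" = "T" ]; then
        echo "process $1 is stopped; sending SIGCONT" >&2
        kill -CONT "$1"
    fi
}

# Usage (pgrep -o picks the oldest matching process):
#   resume_if_stopped "$(pgrep -xo bspwm)"
```

Reading `/proc` directly avoids depending on the exact column layout of `ps` output.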

So far I have not noticed a regular pattern that predicts when bspwm will stop. It doesn't always happen, sometimes I can work all day with no problem. Other times it will freeze every few minutes. This has also happened when doing nothing: I started bspwm and let it sit for an hour without doing anything else, and found it was T'd when I returned.

Can anyone recommend any debugging steps? My plan is to recompile bspwm in debug mode, and then hopefully find some logs that can indicate what might be sending a stop signal to bspwm. Where could I look for this kind of information?

aaronapple, Jun 30 '20

I have the same issue. You've gotten farther than I have. I assumed it was sxhkd freezing.

jbh, Jul 07 '20

In my case the entire system freezes: I can't move my mouse, cursor, or anything; however, I can still hear my song playing on the computer. It doesn't seem to be receiving keyboard input. Not sure what the source of the problem is, but please send a fix.

pradyungn, Jul 09 '20

I was having a similar issue yesterday: everything would work fine for a few minutes after boot. Then it would freeze, locking out any mouse selection except on the window that was focused when it crashed. It started almost immediately after I edited some 'per monitor' settings in bspwmrc. I was trying to remove the extra gap on one of my monitors like this:

bspc config -m DP-2 top_padding $(($PANEL_HEIGHT-$gap))
bspc config -m HDMI-0 top_padding -$gap

It worked as expected but was freezing as you described. Reverting it back to universal top padding across both monitors fixed it:

bspc config top_padding $(($PANEL_HEIGHT-$gap))

Hopefully this helps someone troubleshoot it a little further. I don't know how this would translate to a laptop if you're only using one monitor though.

grembino, Jul 22 '20

Unfortunately, I have no per-monitor configuration. My config is fairly simple. Hope your info helps others, though.

jbh, Jul 22 '20

maybe try commenting out lines 15 and 16? worth a shot.

grembino, Jul 22 '20

I might give that a shot, but having default workspace names will be annoying, and I wouldn't see that as a fix.

jbh, Jul 22 '20

I am experiencing a similar bug both on my pc and laptop.

I have compiled bspwm in debug mode by editing the PKGBUILD like this:

build() {
   cd $pkgname
-  make PREFIX=/usr
+  make PREFIX=/usr debug
}

After I attach to the running bspwm process with gdb it is unable to find debugging symbols. How do I locate debugging symbols so that I can attach a backtrace when the bug happens?

Also, when I quit the gdb session, it presumably sends a SIGCONT signal to bspwm, and bspwm then draws windows/switches workspaces according to the keys I pressed while it was in the 'T' state.

mmskv, Aug 24 '21

@mmskv

After I attach to the running bspwm process with gdb it is unable to find debugging symbols.

are you building with makepkg? check the makepkg.conf manual, specifically the OPTIONS array and its strip and debug options.
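
Following that pointer, one way to keep symbols in the installed binary is to turn off stripping for this one package. A sketch, assuming an Arch-style PKGBUILD laid out like the diff above (the source directory is `$pkgname`); `'!strip'` is a standard makepkg option, and the `build()` body simply mirrors mmskv's change.

```shell
# PKGBUILD excerpt (sketch, not the official package's PKGBUILD).
# '!strip' tells makepkg not to strip symbols from the installed
# binaries, so gdb can resolve bspwm's own frames.
options=('!strip')

build() {
    cd "$pkgname"
    # bspwm's "debug" make target, as in the diff above
    make PREFIX=/usr debug
}
```

After reinstalling, `file /usr/bin/bspwm` should report `with debug_info, not stripped`. makepkg's `debug` option is the other route: combined with `strip`, it moves the symbols into a separate debug package instead of leaving them in the binary.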

but for the main issue: doesn't status T pretty much mean the cause is external to bspwm?

ortango, Aug 25 '21

I have the same issue. My cursor freezes into a text editing cursor, even when not hovering above a text field. Mouse clicks do nothing, focusing by hovering doesn't work. So far, it has always happened while I'm working in Visual Studio Code. I'm not sure if that's the cause of the problem, though. I've tried killing most programs but nothing seems to fix it, so I need to restart bspwm and lose all my (unsaved) work. The keyboard still works, but no bspc commands respond.

I noticed that when it freezes, all bspc processes seem to hang indefinitely, like in this Reddit post: https://old.reddit.com/r/bspwm/comments/gtht6i/bspwmbspc_stops_responding_after_a_while/

If I kill all bspc instances and try any bspc command (in a terminal), it just hangs and does nothing; it won't even time out or return anything. I have to manually kill the process with Ctrl+C.

I think this started happening to me a couple of weeks ago. I haven't noticed the T status for the bspwm process; I will check that next time. I'll compile bspwm in debug mode and update here when it happens again.

EDIT:

It just happened again, and I noticed I don't have the T status for bspwm, or for the frozen bspc instances. All statuses are S.

zjeffer, Oct 21 '21

It froze again today, and I ran gcore on bspwm and two frozen bspc processes (compiled with make debug). I have no idea how to analyze these, but maybe someone else does, here they are:

bspwm-dumps.zip

zjeffer, Oct 26 '21

I used gdb to get a full backtrace of bspwm and one of the bspc processes. Here's what it returned:

bspwm:

#0  0x00007fc016b67907 in write () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x00007fc016af89cd in _IO_file_write@@GLIBC_2.2.5 () from /usr/lib/libc.so.6
No symbol table info available.
#2  0x00007fc016af7d46 in new_do_write () from /usr/lib/libc.so.6
No symbol table info available.
#3  0x00007fc016af9a69 in __GI__IO_do_write () from /usr/lib/libc.so.6
No symbol table info available.
#4  0x00007fc016af7b68 in __GI__IO_file_sync () from /usr/lib/libc.so.6
No symbol table info available.
#5  0x00007fc016aecba6 in fflush () from /usr/lib/libc.so.6
No symbol table info available.
#6  0x000055d5cbfc8766 in print_report (stream=0x55d5ccc88380) at src/subscribe.c:141
No locals.
#7  0x000055d5cbfc8863 in put_status (mask=SBSC_MASK_REPORT) at src/subscribe.c:155
        next = 0x55d5ccc882c0
        sb = 0x55d5ccc8fdc0
        ret = 21973
#8  0x000055d5cbfa7155 in focus_node (m=0x55d5ccc8fc00, d=0x55d5ccc8ef40, n=0x55d5ccc9f0a0) at src/tree.c:657
        guess = true
        desk_changed = false
        has_input_focus = false
#9  0x000055d5cbfa9614 in remove_node (m=0x55d5ccc8fc00, d=0x55d5ccc8ef40, n=0x55d5ccca0100) at src/tree.c:1448
No locals.
#10 0x000055d5cbfb0c23 in unmanage_window (win=29360169) at src/window.c:238
        loc = {monitor = 0x55d5ccc8fc00, desktop = 0x55d5ccc8ef40, node = 0x55d5ccca0100}
#11 0x000055d5cbfadcd1 in unmap_notify (evt=0x55d5ccc9e6f0) at src/events.c:258
        e = 0x55d5ccc9e6f0
#12 0x000055d5cbfad457 in handle_event (evt=0x55d5ccc9e6f0) at src/events.c:51
        resp_type = 18 '\022'
#13 0x000055d5cbf9f729 in main (argc=1, argv=0x7ffd35587d08) at src/bspwm.c:252
        pr = 0x0
        descriptors = {__fds_bits = {8, 0 <repeats 15 times>}}
        socket_path = "/tmp/bspwm_0_0-socket", '\000' <repeats 234 times>
        state_path = '\000' <repeats 255 times>
        run_level = 0
        sock_fd = 4
        cli_fd = 11
        dpy_fd = 3
        max_fd = 4
        n = 8
        sock_address = {sun_family = 1, sun_path = "/tmp/bspwm_0_0-socket", '\000' <repeats 86 times>}
        msg = "node\000-c\000\000iled\000\000g\000\000:^1\000\000ate=fullscreen\000\000\000\000ating\000focus=off\000\000rules/vlc.sh", '\000' <repeats 8121 times>
        event = 0x55d5ccc9e6f0
        end = 0x0
        opt = -1
Detaching from program: /usr/bin/bspwm, process 702709
[Inferior 1 (process 702709) detached]

bspc:

#0  0x00007fc831b6daf7 in poll () from /usr/lib/libc.so.6
No symbol table info available.
#1  0x0000559bf30706e5 in main (argc=0, argv=0x7ffd5f210af8) at src/bspc.c:88
        sock_fd = 3
        sock_address = {sun_family = 1, sun_path = "/tmp/bspwm_0_0-socket", '\000' <repeats 86 times>}
        msg = "node\000-f\000west", '\000' <repeats 8179 times>
        rsp = '\000' <repeats 2632 times>...
        sp = 0x0
        msg_len = 13
        ret = 0
        nb = 0
        fds = {{fd = 3, events = 1, revents = 0}, {fd = 1, events = 16, revents = 0}}
Detaching from program: /usr/bin/bspc, process 1936906
[Inferior 1 (process 1936906) detached]

@ortango does this help at all?

zjeffer, Nov 03 '21

i think you've got a broken fifo issue. if you use the -f option for bspc subscribe and then stop listening to that fifo you will get a backtrace like this.

~~but since 3db5a66f19e0f162414196630ee8b10622b434f4 i'm not sure how easy it is to trigger~~ (actually just having trouble triggering this in general). what version of bspwm are you using?

ortango, Nov 04 '21

I'm using the latest git version: 0.9.10.r33.ge22d0fa-1. This issue started happening sometime near the end of September. It happens almost every day, sometimes multiple times a day.

Is there anything I can do once it's frozen that might give more info about what triggers the issue?

zjeffer, Nov 04 '21

bspwm hasn't had any updates in that period; maybe a config change that you can recall? a panel is a common usage of bspc subscribe -f report.

you can try lsof -p$(pgrep -x bspwm) and check for fifos that might be causing the issue. then check who else has the file open - which very well could be no one (when the issue is triggered).
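
The same check can be done without lsof by walking `/proc`. A minimal sketch (Linux-only); `list_fifos` and `fifo_openers` are hypothetical helper names, and processes owned by other users are only visible when it is run as root.

```shell
#!/bin/sh
# Sketch: /proc-based version of the lsof check (Linux-only).

# list_fifos PID: print "fd path" for every named FIFO PID has open.
list_fifos() {
    for fd in /proc/"$1"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        # Named FIFOs resolve to a filesystem path; anonymous pipes
        # resolve to "pipe:[inode]" and fail the -p test.
        if [ -p "$target" ]; then
            printf '%s %s\n' "${fd##*/}" "$target"
        fi
    done
}

# fifo_openers PATH: print the PIDs of all visible processes holding PATH open.
fifo_openers() {
    for fd in /proc/[0-9]*/fd/*; do
        [ "$(readlink "$fd" 2>/dev/null)" = "$1" ] || continue
        pid=${fd#/proc/}
        echo "${pid%%/*}"
    done | sort -un
}
```

Run `list_fifos "$(pgrep -xo bspwm)"`, then `fifo_openers` on each path it prints; if the second call lists only bspwm's own PID, nobody else is left to read that FIFO.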


just to make sure, if you are running gdb with handle SIGPIPE nostop, do you still get the same gdb backtrace when bspwm locks up?

ortango, Nov 04 '21

a panel is a common usage of bspc subscribe -f report.

I'm using polybar, which has had updates these last couple of months.

Here's the output of lsof -p$(pgrep -x bspwm), when not frozen:

COMMAND PID    USER   FD   TYPE             DEVICE SIZE/OFF     NODE NAME
bspwm   931 zjeffer  cwd    DIR              259,7     4096    71281 /home/zjeffer
bspwm   931 zjeffer  rtd    DIR              259,7     4096        2 /
bspwm   931 zjeffer  txt    REG              259,7   515520 31721691 /usr/bin/bspwm
bspwm   931 zjeffer  mem    REG              259,7    26216    22941 /usr/lib/libXdmcp.so.6.0.0
bspwm   931 zjeffer  mem    REG              259,7    14064    22945 /usr/lib/libXau.so.6.0.0
bspwm   931 zjeffer  mem    REG              259,7  2150424     3470 /usr/lib/libc-2.33.so
bspwm   931 zjeffer  mem    REG              259,7    14064    22994 /usr/lib/libxcb-shape.so.0.0.0
bspwm   931 zjeffer  mem    REG              259,7    14064    23009 /usr/lib/libxcb-xinerama.so.0.0.0
bspwm   931 zjeffer  mem    REG              259,7    67312    22979 /usr/lib/libxcb-randr.so.0.1.0
bspwm   931 zjeffer  mem    REG              259,7    54976    27244 /usr/lib/libxcb-ewmh.so.2.0.0
bspwm   931 zjeffer  mem    REG              259,7    17968    27247 /usr/lib/libxcb-icccm.so.4.0.0
bspwm   931 zjeffer  mem    REG              259,7    13944    27252 /usr/lib/libxcb-keysyms.so.1.0.0
bspwm   931 zjeffer  mem    REG              259,7    26304    27229 /usr/lib/libxcb-util.so.1.0.0
bspwm   931 zjeffer  mem    REG              259,7   165648    23027 /usr/lib/libxcb.so.1.1.0
bspwm   931 zjeffer  mem    REG              259,7  1323472     3483 /usr/lib/libm-2.33.so
bspwm   931 zjeffer  mem    REG              259,7   221480     3459 /usr/lib/ld-2.33.so
bspwm   931 zjeffer    0r   CHR                1,3      0t0        4 /dev/null
bspwm   931 zjeffer    1w   CHR                1,3      0t0        4 /dev/null
bspwm   931 zjeffer    2w   REG              259,7    34390    12481 /home/zjeffer/.config/bspwm/.bspwm.err
bspwm   931 zjeffer    3u  unix 0x000000009a9d4b34      0t0    24710 type=STREAM (CONNECTED)
bspwm   931 zjeffer    4u  unix 0x00000000b4b5d9cc      0t0    24711 /tmp/bspwm_0_0-socket type=STREAM (LISTEN)
bspwm   931 zjeffer    5u  unix 0x00000000cf2194c7      0t0    27498 /tmp/bspwm_0_0-socket type=STREAM (CONNECTED)
bspwm   931 zjeffer    6u  unix 0x00000000a54f4522      0t0    28518 /tmp/bspwm_0_0-socket type=STREAM (CONNECTED)
bspwm   931 zjeffer    7u  unix 0x00000000586379f1      0t0    28519 /tmp/bspwm_0_0-socket type=STREAM (CONNECTED)

just to make sure, if you are running gdb with handle SIGPIPE nostop, do you still get the same gdb backtrace when bspwm locks up?

I'll try that the next time it freezes. Do I just type handle SIGPIPE nostop in gdb and then bt full?

zjeffer, Nov 04 '21

I just noticed that executing bspc subscribe -f report triggers the problem. Is this supposed to happen? :D

When executing that, it returns /run/user/1000/bspwm_fifo.cBJ8aN

Executing the command again makes the command hang and shows no output (because all bspc hang when it's frozen)

zjeffer, Nov 04 '21

I'll try that the next time it freezes. Do I just type handle SIGPIPE nostop in gdb and then bt full?

if you are starting gdb before the lockup, just enter that in the prompt and continue. if you are starting gdb after the lockup has occurred, you shouldn't need to worry about it. i just want to avoid a backtrace from a lockup that gdb itself can cause (at the exact spot your backtrace is from), which is not what we want.

I just noticed that executing bspc subscribe -f report

yep. if there is no reader, the fifo will block (once its buffer is full). there is no chance for remove_subscriber() to run in that case, so bspwm stays stuck until you read from that fifo.
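
The failure mode can be reproduced without bspwm. This is an illustrative sketch, not bspwm's code: a background `sleep` stands in for a subscriber that keeps the FIFO open but never reads it, and `dd` plays the writer stuck in `write()` in the backtrace above. It assumes GNU coreutils `timeout`, which exits with status 124 on timeout; `fifo_writer_blocks` is a hypothetical name.

```shell
#!/bin/sh
# Illustration only: a blocking writer stalls once the FIFO's kernel
# buffer is full and nobody reads from it.
fifo_writer_blocks() {
    dir=$(mktemp -d) || return 2
    fifo="$dir/report_fifo"
    mkfifo "$fifo" || return 2
    # Stand-in for an abandoned subscriber: holds the FIFO open
    # read-write (so the writer's open() succeeds) but never reads.
    sleep 30 3<>"$fifo" &
    holder=$!
    # Try to push 1 MiB through; timeout exits 124 if dd is still
    # blocked in write() after 3 seconds.
    status=0
    timeout 3 dd if=/dev/zero of="$fifo" bs=1024 count=1024 2>/dev/null || status=$?
    kill "$holder"
    rm -r "$dir"
    [ "$status" -eq 124 ]
}

if fifo_writer_blocks; then
    echo "writer blocked: FIFO full and nobody reading it"
else
    echo "writer was not blocked"
fi
```

Writing 1 MiB comfortably exceeds the default Linux pipe buffer (64 KiB), so the result does not depend on the exact buffer size.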

ortango, Nov 04 '21

if you are starting gdb before the lockup, just enter that in the prompt and continue. if you are starting gdb after the lockup has occurred, you shouldn't need to worry about it. i just want to avoid a backtrace from a lockup that gdb itself can cause (at the exact spot your backtrace is from), which is not what we want.

Oh right, I never start gdb until after the lockup happens.

yep. if there is no reader, the fifo will block (once its buffer is full). there is no chance for remove_subscriber() to run in that case, so bspwm stays stuck until you read from that fifo.

Is there a way for me to artificially read from that fifo, so that I can stop bspwm from hanging?

zjeffer, Nov 04 '21

Is there a way for me to artificially read from that fifo, so that I can stop bspwm from hanging?

sure, cat $thatfile will work.
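
The reader can also be started from a script. A minimal sketch; `drain_fifo` is a hypothetical helper, and the `$XDG_RUNTIME_DIR/bspwm_fifo.*` pattern in the usage comment is an assumption based on the `/run/user/1000/bspwm_fifo.cBJ8aN` path above. Draining discards the reports, so it unblocks bspwm rather than fixing whatever stopped reading.

```shell
#!/bin/sh
# drain_fifo PATH: read and discard everything written to PATH in the
# background (same effect as `cat PATH > /dev/null`), then print the
# reader's PID so it can be killed later.
drain_fifo() {
    [ -p "$1" ] || { echo "not a FIFO: $1" >&2; return 1; }
    cat "$1" > /dev/null &
    echo "$!"
}

# Usage (assumed FIFO location, see above):
#   for f in "$XDG_RUNTIME_DIR"/bspwm_fifo.*; do drain_fifo "$f"; done
```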

ortango, Nov 04 '21

I managed to recover from Today's Crash™ by killing lots of programs that were running in the background. I think the one that fixed it was this polybar script. I found that it was still running even after killing polybar.

Next time it crashes (probably in a couple of hours or tomorrow), I'll see if killing it fixes it again.

Here are the outputs of lsof with various programs when it was frozen:

for bspwm

COMMAND PID    USER   FD   TYPE             DEVICE SIZE/OFF     NODE NAME
bspwm   921 zjeffer  cwd    DIR              259,7     4096    71281 /home/zjeffer
bspwm   921 zjeffer  rtd    DIR              259,7     4096        2 /
bspwm   921 zjeffer  txt    REG              259,7   515520 31721691 /usr/bin/bspwm
bspwm   921 zjeffer  mem    REG              259,7    26216    22941 /usr/lib/libXdmcp.so.6.0.0
bspwm   921 zjeffer  mem    REG              259,7    14064    22945 /usr/lib/libXau.so.6.0.0
bspwm   921 zjeffer  mem    REG              259,7  2150424     3470 /usr/lib/libc-2.33.so
bspwm   921 zjeffer  mem    REG              259,7    14064    22994 /usr/lib/libxcb-shape.so.0.0.0
bspwm   921 zjeffer  mem    REG              259,7    14064    23009 /usr/lib/libxcb-xinerama.so.0.0.0
bspwm   921 zjeffer  mem    REG              259,7    67312    22979 /usr/lib/libxcb-randr.so.0.1.0
bspwm   921 zjeffer  mem    REG              259,7    54976    27244 /usr/lib/libxcb-ewmh.so.2.0.0
bspwm   921 zjeffer  mem    REG              259,7    17968    27247 /usr/lib/libxcb-icccm.so.4.0.0
bspwm   921 zjeffer  mem    REG              259,7    13944    27252 /usr/lib/libxcb-keysyms.so.1.0.0
bspwm   921 zjeffer  mem    REG              259,7    26304    27229 /usr/lib/libxcb-util.so.1.0.0
bspwm   921 zjeffer  mem    REG              259,7   165648    23027 /usr/lib/libxcb.so.1.1.0
bspwm   921 zjeffer  mem    REG              259,7  1323472     3483 /usr/lib/libm-2.33.so
bspwm   921 zjeffer  mem    REG              259,7   221480     3459 /usr/lib/ld-2.33.so
bspwm   921 zjeffer    0r   CHR                1,3      0t0        4 /dev/null
bspwm   921 zjeffer    1w   CHR                1,3      0t0        4 /dev/null
bspwm   921 zjeffer    2w   REG              259,7    34994    12481 /home/zjeffer/.config/bspwm/.bspwm.err
bspwm   921 zjeffer    3u  unix 0x00000000efc06079      0t0    23265 type=STREAM (CONNECTED)
bspwm   921 zjeffer    4u  unix 0x000000002bc08b06      0t0    23266 /tmp/bspwm_0_0-socket type=STREAM (LISTEN)
bspwm   921 zjeffer    5u  unix 0x000000000f21a470      0t0    44447 /tmp/bspwm_0_0-socket type=STREAM (CONNECTED)
bspwm   921 zjeffer    6u  unix 0x00000000aff6a9cd      0t0    44448 /tmp/bspwm_0_0-socket type=STREAM (CONNECTED)
bspwm   921 zjeffer    7u  unix 0x000000001298ba52      0t0    91035 /tmp/bspwm_0_0-socket type=STREAM (CONNECTED)
bspwm   921 zjeffer    8u  unix 0x00000000080dbb9b      0t0    91036 /tmp/bspwm_0_0-socket type=STREAM (CONNECTED)

bspc

COMMAND     PID    USER   FD   TYPE             DEVICE  SIZE/OFF     NODE NAME
bspc    1291042 zjeffer  cwd    DIR              259,7      4096    71281 /home/zjeffer
bspc    1291042 zjeffer  rtd    DIR              259,7      4096        2 /
bspc    1291042 zjeffer  txt    REG              259,7     26208 31721693 /usr/bin/bspc
bspc    1291042 zjeffer  mem    REG              259,7     26216    22941 /usr/lib/libXdmcp.so.6.0.0
bspc    1291042 zjeffer  mem    REG              259,7     14064    22945 /usr/lib/libXau.so.6.0.0
bspc    1291042 zjeffer  mem    REG              259,7   2150424     3470 /usr/lib/libc-2.33.so
bspc    1291042 zjeffer  mem    REG              259,7     14064    22994 /usr/lib/libxcb-shape.so.0.0.0
bspc    1291042 zjeffer  mem    REG              259,7     14064    23009 /usr/lib/libxcb-xinerama.so.0.0.0
bspc    1291042 zjeffer  mem    REG              259,7     67312    22979 /usr/lib/libxcb-randr.so.0.1.0
bspc    1291042 zjeffer  mem    REG              259,7     54976    27244 /usr/lib/libxcb-ewmh.so.2.0.0
bspc    1291042 zjeffer  mem    REG              259,7     17968    27247 /usr/lib/libxcb-icccm.so.4.0.0
bspc    1291042 zjeffer  mem    REG              259,7     13944    27252 /usr/lib/libxcb-keysyms.so.1.0.0
bspc    1291042 zjeffer  mem    REG              259,7     26304    27229 /usr/lib/libxcb-util.so.1.0.0
bspc    1291042 zjeffer  mem    REG              259,7    165648    23027 /usr/lib/libxcb.so.1.1.0
bspc    1291042 zjeffer  mem    REG              259,7   1323472     3483 /usr/lib/libm-2.33.so
bspc    1291042 zjeffer  mem    REG              259,7    221480     3459 /usr/lib/ld-2.33.so
bspc    1291042 zjeffer    0r   CHR                1,3       0t0        4 /dev/null
bspc    1291042 zjeffer    1w   REG              259,7 216069828    26743 /home/zjeffer/.config/sxhkd/.sxhkd.err
bspc    1291042 zjeffer    2w   REG              259,7 216069828    26743 /home/zjeffer/.config/sxhkd/.sxhkd.err
bspc    1291042 zjeffer    3u  unix 0x0000000014ef08ab       0t0  2091374 type=STREAM (CONNECTED)

Both of these look pretty normal to me. Next crash, I'll execute lsof on the polybar script and post the output.

zjeffer, Nov 06 '21

i hadn't seen the updated backtrace.

you can also use gdb to print the fifo_path near the print_report() frame (sb->fifo_path, where sb is a local of its caller, put_status()), or inspect that subscriber in general to get a hint of who may be the culprit.
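
A sketch of a batch gdb invocation that does this. It assumes bspwm was built with debug symbols, that the stack looks like the backtrace above (`sb` is a local of `put_status()`, the caller of `print_report()`), and a gdb recent enough to support `frame function NAME`; `fifo_culprit` is a hypothetical name. Attaching may require root, depending on `kernel.yama.ptrace_scope`.

```shell
#!/bin/sh
# fifo_culprit PID: attach gdb to a hung bspwm and print the subscriber
# it is stuck writing to. gdb detaches automatically when -batch ends.
fifo_culprit() {
    if [ -z "$1" ]; then
        echo "usage: fifo_culprit PID" >&2
        return 2
    fi
    gdb -p "$1" -batch \
        -ex 'bt' \
        -ex 'frame function put_status' \
        -ex 'print *sb' \
        -ex 'print sb->fifo_path'
}

# Usage, while bspwm is hung:
#   fifo_culprit "$(pgrep -xo bspwm)"
```

With the path in hand, the lsof check suggested earlier shows which other processes still have that FIFO open.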

i should have mentioned earlier, but did not. your issue is different from this one; it is not being triggered by a stopped bspwm process. (imo, bspwm "freezing" with a stopped status is expected behavior, but that is beside the point.)

ortango, Nov 06 '21

i should have mentioned earlier, but did not. your issue is different from this one; it is not being triggered by a stopped bspwm process. (imo, bspwm "freezing" with a stopped status is expected behavior, but that is beside the point.)

I see, I'll move my findings to #1318, which seems to be more in line with my issue. I thought it was a duplicate.

zjeffer, Nov 06 '21