Waybar freezing and high CPU load when there are lots of window events
Hi,
I just had a QEMU + gdbserver session running with a conditional breakpoint, which led to LOTS of window update events (QEMU switches its window title between "QEMU" and "QEMU [Paused]" every time the breakpoint is hit, and with a conditional breakpoint somewhere in malloc ... that's quite a few updates to process ^^')
After ending the debugging session, I noticed my PC was still loud (i.e. high CPU load) and waybar was not responding to me scrolling on an empty area (to change workspaces). I forgot to check whether the CPU load really came from waybar, but since everything else behaved normally (especially sway), this seems logical to me. I can test this if needed.
I assume waybar was stuck processing window events, and I'm not sure there is much it can do about it... but in any case, reporting this to explain the behavior to someone else might already be worth it. Or maybe you have some ideas on how to handle this more gracefully :)
I'm using sway as my compositor and have modules sway/workspaces, sway/mode, sway/scratchpad and sway/window enabled in my config.
If you need more info, feel free to ask :)
It does sound plausible. Workspaces, scratchpad and window subscribe to window events.
Can you confirm, though? ;) And, while this will depend on the machine, I'm curious what ballpark of events per second we're talking about.
The config for these modules might play a role as well (I'm thinking of rewrite rules or app icon display), so if you could share the config for these modules, it might help.
At first glance, I share your sentiment of "not sure if there is a lot it can do about it". The modules kinda have to process the events in order to stay consistent after initialisation. On the other hand, if it's specific operations that are very expensive, perhaps there's a way to rate limit. Don't want to promise anything, but let's see where this leads.
Does waybar recover eventually after you stop debugging?
I've tried to reproduce this.
My first attempt was an iced-rs GUI that changes its title once per tick, but that didn't go anywhere. I don't know much about iced-rs, but I saw different rates on the internal and external screens, so I think it was just vsynced. That's not a lot of updates; a Wayland client might actually be the wrong choice for this.
My second attempt was looping swaymsg focus commands via swayipc. This produces roughly 1000 window events per second on my machine, and after a while I see this sway error:
[ERROR] [sway/ipc-server.c:945] Client write buffer too big (4194304), disconnecting client
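For anyone trying the same repro without the swayipc crate, the load can presumably also be generated by talking to the socket in `$SWAYSOCK` directly. The sketch below assumes the framing documented in sway-ipc(7) (the magic string `i3-ipc`, then a 32-bit payload length and a 32-bit message type in native byte order, then the payload) with RUN_COMMAND = 0. `focus_loop` and the left/right alternation are hypothetical, not the exact loop used above.

```python
import os
import socket
import struct

# sway-ipc(7) framing: 6-byte magic, then payload length and message type as
# 32-bit integers in native byte order, then the payload itself.
HEADER = struct.Struct("=6sII")
MAGIC = b"i3-ipc"
RUN_COMMAND = 0


def pack_message(msg_type: int, payload: str) -> bytes:
    """Frame a UTF-8 payload as a sway IPC message."""
    body = payload.encode("utf-8")
    return HEADER.pack(MAGIC, len(body), msg_type) + body


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes the connection first."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("sway closed the IPC socket")
        buf += chunk
    return buf


def focus_loop(iterations: int) -> None:
    """Hypothetical stress loop: alternate focus to generate window focus events."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(os.environ["SWAYSOCK"])
        for i in range(iterations):
            command = "focus left" if i % 2 == 0 else "focus right"
            sock.sendall(pack_message(RUN_COMMAND, command))
            # Read (and discard) each reply so this connection's buffer stays small.
            _, length, _ = HEADER.unpack(recv_exact(sock, HEADER.size))
            recv_exact(sock, length)
```

Calling `focus_loop(100_000)` from a terminal inside sway, with at least two tiled windows on the workspace, should produce a steady stream of window focus events; the actual rate will depend on the machine.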
and for Waybar:
[error] Scratchpad: Unable to receive IPC header
[error] Window: Unable to receive IPC header
After that, sway/window doesn't update and sway/workspaces doesn't switch workspaces. I think this might explain the behaviour. Do you see something like this in your sway log output?
P.S.: I also tried looking into QEMU/gdb by debugging the Linux kernel, but I wasn't sure what to use as a breakpoint.
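To put a number on the events-per-second question, and to see where an "Unable to receive IPC header"-style error can come from, here is a minimal subscriber sketch. It assumes SUBSCRIBE = 2 and that window events arrive with message type 0x80000003 (high bit set, per sway-ipc(7)); it is not waybar's code, and `count_window_events` is a hypothetical helper. Once sway disconnects a client, the next read of the 14-byte header comes up short, which here surfaces as a `ConnectionError`.

```python
import json
import socket
import struct
import time

HEADER = struct.Struct("=6sII")  # magic, payload length, message type (native byte order)
SUBSCRIBE = 2
WINDOW_EVENT = 0x80000003  # event messages have the high bit set; 3 = window


def recv_exact(sock, n):
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            # sway dropped us: a short read like this is presumably what ends up
            # logged as "Unable to receive IPC header" (assumption, not waybar's code).
            raise ConnectionError(f"connection closed after {len(data)} of {n} bytes")
        data += chunk
    return data


def read_message(sock):
    """Return (message type, raw payload) for the next IPC message."""
    magic, length, msg_type = HEADER.unpack(recv_exact(sock, HEADER.size))
    if magic != b"i3-ipc":
        raise ValueError("bad IPC magic")
    return msg_type, recv_exact(sock, length)


def count_window_events(sock, seconds):
    """Subscribe to window events and count how many arrive within `seconds`."""
    payload = json.dumps(["window"]).encode()
    sock.sendall(HEADER.pack(b"i3-ipc", len(payload), SUBSCRIBE) + payload)
    _, reply = read_message(sock)
    if not json.loads(reply).get("success"):
        raise RuntimeError("subscription failed")
    count = 0
    deadline = time.monotonic() + seconds
    while (remaining := deadline - time.monotonic()) > 0:
        sock.settimeout(remaining)
        try:
            msg_type, _ = read_message(sock)
        except socket.timeout:
            break  # measurement window over; if this hit mid-message, don't reuse the socket
        if msg_type == WINDOW_EVENT:
            count += 1
    sock.settimeout(None)
    return count
```

Connecting a Unix socket to `$SWAYSOCK` and calling `count_window_events(sock, 5)` while the stress loop runs gives a rough events-per-second figure (divide by 5).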
Honestly, I have no idea how to fix this. Speculating a bit, the disconnection seems to happen because the modules are not processing the events from the window subscription fast enough, but I don't see how that could be sped up.
N.B. It might be wise to hold off on the next get_tree until the previous one has been processed, so as not to aggravate the situation; the get_tree responses are much bigger, after all. But that is a different fd, and limiting it does not prevent the disconnection.
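The get_tree idea from the N.B. could look roughly like this: events only mark the tree as dirty, and at most one GET_TREE round trip is in flight at a time, so a burst of N events during a fetch costs one extra fetch instead of N. This is a Python sketch under the assumption that a module can trigger a refresh through some `fetch_tree` callable (hypothetical); as noted above, it would not stop the disconnection on the subscription socket.

```python
import threading


class TreeRefresher:
    """Coalesce refresh requests so at most one GET_TREE is in flight.

    fetch_tree is a hypothetical callable standing in for a GET_TREE round trip.
    """

    def __init__(self, fetch_tree):
        self._fetch_tree = fetch_tree
        self._lock = threading.Lock()
        self._in_flight = False
        self._dirty = False

    def request(self):
        """Called for every window event; returns immediately if a fetch is running."""
        with self._lock:
            if self._in_flight:
                self._dirty = True  # remember that one more refresh is needed
                return
            self._in_flight = True
        self._run()

    def _run(self):
        while True:
            self._fetch_tree()  # the lock is not held here, so events can still arrive
            with self._lock:
                if not self._dirty:
                    self._in_flight = False
                    return
                self._dirty = False  # one more fetch covers every event seen meanwhile
```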
The CPU load seems to be caused by rapid event polling after the socket has been disconnected. This can be addressed by adding a longer sleep in the worker's catch branch (e.g. sway/window L37), but that might be a bad idea for failures other than this specific disconnection scenario.
Even then, the modules remain borked and waybar needs reloading.
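Instead of a fixed longer sleep in the catch branch, an exponential backoff would keep retries cheap for one-off errors while not spinning once the socket is gone. A minimal sketch, assuming the worker's body and a reconnect/resubscribe step can be expressed as the hypothetical callables `handle_events` and `reconnect`; whether waybar's modules could actually resubscribe without a reload is an open question here.

```python
import time


def run_worker(handle_events, reconnect, sleep=time.sleep,
               initial_delay=0.1, max_delay=5.0, max_iterations=None):
    """Retry a failing IPC worker with exponential backoff instead of a tight loop.

    handle_events and reconnect are hypothetical callables; handle_events raises
    ConnectionError when the socket is broken.
    """
    delay = initial_delay
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        try:
            handle_events()
            delay = initial_delay  # a healthy iteration resets the backoff
        except ConnectionError:
            sleep(delay)  # wait before retrying instead of polling a dead socket
            delay = min(delay * 2, max_delay)
            try:
                reconnect()
            except ConnectionError:
                pass  # still down; the next iteration backs off further
```

The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting.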
Sadly, I don't have any logs, I haven't looked at the waybar codebase yet, and I'm only on my phone right now.
But how is reading from the sway IPC socket implemented? Read something, process it, read the next? If so, maybe reading everything in an extra thread and queueing up operations would fix that: the buffer would grow in waybar instead of in sway, and waybar wouldn't get disconnected.
This could then also be extended to drop older unprocessed events when a newly read event supersedes them. For example, if you queue a window title change event and then read another one before the first has been processed, you will never need to process the first one anyway, so just drop it.
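The extra-thread idea could be sketched like this: one thread does nothing but read from the subscription socket and push decoded events onto an unbounded queue, while the consumer processes them at its own pace. `read_event` is a hypothetical blocking callable that returns `None` once the connection closes. The trade-off is that the backlog now grows in waybar's memory instead of in sway's per-client write buffer (the 4194304 bytes in the sway error above).

```python
import queue
import threading


def start_reader(read_event, events: queue.Queue) -> threading.Thread:
    """Drain the subscription socket on a dedicated thread.

    read_event: hypothetical blocking callable returning one decoded event,
    or None once the connection has been closed.
    """
    def run():
        while True:
            event = read_event()
            events.put(event)  # unbounded queue: the backlog lives in our memory
            if event is None:  # forward the end-of-stream marker, then stop
                return

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def process_all(events: queue.Queue, handle) -> None:
    """Consume events at whatever pace `handle` allows, until end of stream."""
    while (event := events.get()) is not None:
        handle(event)
```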
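Dropping superseded events could be done with a queue keyed by what an event overwrites. The sketch assumes the key is (container id, change type), which fits `title` changes like QEMU's but would need more thought for events that do not simply replace earlier ones. `CoalescingQueue` is hypothetical; a newer event replaces a pending one with the same key and moves to the back, so it is still processed after everything queued before it.

```python
import threading
from collections import OrderedDict


class CoalescingQueue:
    """Queue where a newer event replaces a still-pending older one with the same key.

    The default key (container id, change type) is an assumption: a second
    "title" event for the same window overwrites the first.
    """

    def __init__(self, key=lambda e: (e["container"]["id"], e["change"])):
        self._key = key
        self._pending = OrderedDict()
        self._cond = threading.Condition()

    def put(self, event):
        with self._cond:
            k = self._key(event)
            self._pending.pop(k, None)  # drop the superseded event, if any...
            self._pending[k] = event    # ...and enqueue the newest one at the back
            self._cond.notify()

    def get(self):
        """Block until an event is pending, then return the oldest one."""
        with self._cond:
            while not self._pending:
                self._cond.wait()
            _, event = self._pending.popitem(last=False)
            return event

    def __len__(self):
        with self._cond:
            return len(self._pending)
```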
Does that make any sense? Or should I look at the code before I have "good ideas"?^^`
Well, I didn't write that IPC code, but I'd say yes, pretty much. There is a worker thread that receives the subscription data and fires a signal containing the response.
A queue could work, I suppose. The modules using the IPC code only ever use the signal to trigger a GET_TREE, so that could also be limited considerably. But I think I need to spend a lot more time on this to understand it better.
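Since the modules only use the signal to trigger a GET_TREE, a time-based throttle is another way to limit it: the first request runs immediately, and further requests within `interval` are folded into one trailing refresh, so the final state is never lost. `Throttle`, `poll()` and the interval value are illustrative assumptions; `poll()` would need to be driven by some periodic timer, which is left out here.

```python
import time


class Throttle:
    """Run `action` at most once per `interval` seconds without dropping the last request.

    The first request runs immediately; requests inside the interval only mark a
    pending refresh, which poll() performs once the interval has passed.
    """

    def __init__(self, action, interval, clock=time.monotonic):
        self._action = action
        self._interval = interval
        self._clock = clock
        self._last = None
        self._pending = False

    def request(self):
        now = self._clock()
        if self._last is None or now - self._last >= self._interval:
            self._last = now
            self._pending = False
            self._action()
        else:
            self._pending = True  # coalesce: one trailing refresh covers the burst

    def poll(self):
        """Call periodically to flush a deferred refresh once the interval has elapsed."""
        if self._pending and self._clock() - self._last >= self._interval:
            self._last = self._clock()
            self._pending = False
            self._action()
```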
Hi @LittleFox94, can you try to reproduce this, but without your own style.css and with the default waybar config?