
[Feature request]: Refresh on events instead of on a timer

Open LordMZTE opened this issue 7 months ago • 8 comments

Please describe your feature request

I use wluma with an IIO sensor on the River compositor. In this scenario, wluma will use several sleep loops:

  • https://github.com/maximbaz/wluma/blob/ea4c8c6ae689655950ccbc1c2c214f5c0e4562ee/src/als/controller.rs#L35
  • https://github.com/maximbaz/wluma/blob/ea4c8c6ae689655950ccbc1c2c214f5c0e4562ee/src/brightness/controller.rs#L75

I think this is a concern for battery usage. I have observed up to 10% CPU usage when moving around my system, whereas it's almost 0 without wluma running. I think there is a lot of room for improvement, and most of it will come down to avoiding sleep loops and generally avoiding computation where possible. Here are some suggestions:

  • I think we should replace the current architecture of multiple worker threads with one event loop-based one. This would ease load on the OS scheduler. This could be implemented low-level with an OS API such as poll or epoll (see the respective man pages), or a Rust event loop library like tokio or smol. This would move the load to one thread, avoiding scheduling and synchronization overhead.
  • Regarding the ALS controller, we should be able to convert at least the IIO backend to be evented. Instead of reading from sysfs periodically, we could monitor it using inotify to get notified of brightness changes (I believe we should get events here; iio-sensor-proxy does it somehow, anyways).
  • Regarding the brightness controller, the loop only seems to do anything meaningful when we either have a new value on the prediction_rx channel or are in an ongoing transition. This would seem easy enough to convert to a system that blocks entirely when neither is the case. If we were to use tokio, we could tokio::spawn a worker for the transition (which isn't a real thread) and otherwise block on the channel to avoid a sleep loop.
  • Regarding periodic screen captures, I think it would be useful to have an opt-in "energy-saver" mode, which could work by not constantly polling the screen but instead only doing so on certain events emitted by the compositor such as:
    • Workspace switches (unfortunately, every compositor has its own protocol here)
    • Various changes to the toplevel, which can be detected on wlroots with wlr-foreign-toplevel-management-unstable-v1 (see the events described there)
    • The process receiving a SIGUSR1 signal for scripting.

I think all of this should be doable entirely without sleep loops. Please let me know if you have any questions, I'm also open to help with implementing this. Cheers!

LordMZTE avatar May 30 '25 07:05 LordMZTE

Hello! Thank you for opening a very interesting discussion!

I'm all up for making performance better, but I'd like us to tackle more impactful things first.

I don't have anything against tokio or smol, for example, but I just don't believe that OS threads have any "noticeable" overhead. We can do it if it makes the code more readable or easier to work with, but not because of expected performance improvements. Happy to be proven wrong, by the way 😁 I just think there's a reason that Rust considered and decided against having built-in green threads (although I vaguely know that there were some design discussions and not everyone agreed with this).

With the brightness controller in particular, what you said is correct, but you missed the fact that we also have to detect when a user changes brightness - this starts the learning phase, where we need to detect the change relatively quickly, so that we can record the ALS and screen contents at the moment of the manual adjustment. While I do think it could still be improved with what you suggest, again I find it hard to believe that the current operation mode of while true { sleep(100ms) } has any meaningful impact on CPU usage.

I'm all in favour of replacing polling with events, whenever it's possible! I'm not sure we will actually improve the performance if we start reading IIO sensor values on inotify events rather than once every 100ms - just because I think they change more frequently than that, so we will get wluma to work more and not less!

It's an interesting idea regarding periodic screen captures. I just want to add that it's only half-true that we are periodically polling the screen contents - at least starting with the screencopy protocol, the compositor actually blocks and doesn't send the frame back until the frame has changed. My best guess for your observation of 10% CPU usage lies exactly here - I think you were moving around, changing the screen contents very often, which in turn made the compositor render a bunch of frames and send them to wluma. If you are working in an editor and typing something occasionally, or reading an article and slowly scrolling through it, the number of times wluma receives new frames is extremely small (unless you have a flashing ad on the webpage you are reading - an absolutely true example where an adblocker saves a ton of battery life! 😅). I see the point here, but I don't know if it's necessarily a good experience if wluma won't react when you switch from a dark to a bright tab in your browser. When nothing happens on the screen, wluma is mostly sleeping and using almost no energy, and when the screen contents are changing we have to spend the energy to react to those changes - otherwise what's the point in using screen capturing? (Screen capturing can be fully disabled, by the way, if you didn't know - that could maybe be used as an "energy-saver" mode.)

Let me know what your thoughts are after reading the above! I want to emphasize that I am not trying to say "no" to any of your ideas, I just want to manage your expectations, so that you don't end up spending a lot of energy rewriting the entire app, only to realize that those optimizations didn't bring any effect or made the app unusable as it doesn't react even on simple browser tab switching.

max-baz avatar May 30 '25 10:05 max-baz

Thanks for your response!

[...] I just don't believe that OS threads have any "noticeable" overhead.

This is actually a very complex topic, but one I think is only of moderate relevance for this discussion. The main point is that it seems to be mostly agreed upon that repeated sleeping in loops is bad, not only for power reasons (though that is definitely a thing[1]), but also because it introduces latency, making program behavior generally less predictable. I think it also causes contention in the kernel's scheduler, but I'm not too sure on that.

The main advantage of using an async runtime is having a nice API around the OS's IO multiplexing APIs (poll, epoll, io_uring, ...), which would help us greatly in coordinating an IO-heavy application like wluma while avoiding threads (which I think would be nice to have). It would also give us the ability to cheaply spawn short-lived tasks (which aren't real threads) for work like the transition we currently do in the brightness controller.

I must also admit that I generally have a gripe with the sleep-loop pattern. I think it's elegant when a program that currently has no useful work to do is completely blocked, and when a program that does get work does it immediately, without any sleeping involved, until the work is done. Neither is currently the case, and I don't think I'm alone with this view.

[...] you missed the fact that we also have to detect when a user changed brightness

Thanks for the hint! I think this should also be easy enough to rewrite using inotify (I can confirm that I get events for manual brightness changes on /sys/class/backlight/amdgpu_bl1/brightness). This is also doable in one loop using one of the aforementioned IO multiplexing APIs or a Rust abstraction (which tokio and smol provide, see tokio::select!). This would also have the benefit of zero delay between the user manually changing brightness and us detecting it.

I'm not sure we will actually improve the performance if we start reading IIO sensor values on inotify events rather than once every 100ms - just because I think they change more frequently than that [...]

It seems that you're correct. On further investigation, it seems that there are no inotify events for in_illuminance_raw at all. I think this is probably caused by a read of that file translating directly to a hardware read operation on the physical sensor, without there even being any sort of event emitted. We'll probably not get away without polling here, but even then I think we should make the interval configurable. While the overhead might be negligible, I'd personally prefer something way higher than 100ms. (I suppose this would be a generalized version of #91)

[...] the compositor actually blocks and doesn't send the frame back until the frame has changed.

This is really interesting! I still believe that there are many cases where it's undesirable to capture the screen every time the frame changes - say when I'm watching a video. (Note: of course you could say that you'd want your screen brightness adjusting to the video regularly - but I actually see this differently. A well-made movie, for example, will already have a well-balanced brightness in and of itself, and I'd like that to be respected.) I'd likely prefer automatic brightness changes to be entirely disabled while watching a video, with the option to manually trigger a recalculation, which is nicely doable by binding a key to pkill -USR1 wluma if we implement my proposal of handling SIGUSR1. Not sure how many users will share my opinion here, but that's why I think this would be best as an opt-in.

Your example of switching a browser tab is actually something that would be handled quite well by my proposal. All browsers I know of would adjust their window title in this case, emitting a title change event we could use as a trigger to recalculate brightness.

A separate note: It seems that wluma also incurs a non-negligible GPU load. On my Radeon 780M iGPU, nvtop reports a constant GPU utilization by wluma of 2%, even when I move the window with nvtop to an external monitor so the laptop screen is entirely still. This however seems to only cause an extra power consumption of 2W, which doesn't seem too bad. Still a little strange. Are we re-processing the frame each loop iteration even when it hasn't changed?


[1]: https://tu-dresden.de/zih/forschung/ressourcen/dateien/projekte/haec/powernightmares.pdf (I don't pretend to understand this; I've pretty much only been able to read "sleeping causes high power consumption because of scheduler shenanigans" here :P)

LordMZTE avatar May 30 '25 14:05 LordMZTE

Note: of course you could say that you'd want your screen brightness adjusting to the video regularly - but I actually see this differently. A well-made movie for example will already have a well-balanced brightness in of itself, and I'd like that to be respected.

Haha, I actually agree with you on this - I even have a "movie mode" button in my taskbar, which basically stops or starts the wluma systemd service on press 😂 I don't mind the SIGUSR1 idea at all, but just to say that you can already simply stop and start the entire process now, so let's move this idea to the end of the "nice ideas" list 🙃

Your example of switching a browser tab is actually something that would be handled quite well by my proposal. All browsers I know of would adjust their window title in this case, emitting a title change event we could use as a trigger to recalculate brightness.

There would be other edge cases - e.g. it's not uncommon for a single long webpage to have "dark" and "bright" sections, so you might be scrolling through a single page and wondering why wluma doesn't help you out here. It's a neat idea, I just can't stop thinking that it goes against the idea of "just launch wluma and forget, your brightness value will always be perfect" due to those edge cases... So I know I wouldn't personally use that event filter, and I wouldn't recommend others activate these filters, exactly because it would move us from a world of "if the brightness value is unpleasant, it's 100% wluma's fault" to a world of "if the brightness value is unpleasant, maybe it's a bug in wluma, or maybe the user did something wrong". Let's continue brainstorming this idea, I just want to say that I'm much more excited about making the "out of the box" experience better / less power hungry 😄

While the overhead might be negligible, I'd personally prefer something way higher than 100ms. (I suppose this would be a generalized version of https://github.com/maximbaz/wluma/pull/91)

Yeah, although let's keep in mind the issue that blocked #91 in the first place - if we detect a user changing the brightness, but the information about screen contents or ambient light is outdated, we risk corrupting the model with invalid data points - e.g. we save an entry saying the user wants 90% brightness in total darkness, while in reality they have just turned on the lights - that's why we can't poll, say, every few seconds. It might be easier to solve with IIO than with the webcam though, if we have a way to quickly take the latest value once we detect a brightness change. This needs testing; for example, on my old laptop, if I polled too infrequently, it felt like the sensor was going to sleep, and the first reading was totally wrong - it just returned some old value while the sensor was "booting up" to take the actual reading.

Are we re-processing the frame each loop iteration even when it hasn't changed?

We are only processing a frame once we receive it from the compositor. But you are right, I just tried now as well, and even when nothing is changing on my screen, I am also continuously receiving new frames from the compositor. I remember for sure that this wasn't the case, maybe something changed 🤔 It's something potentially to look into!

I think it's elegant when a program that currently has no useful work to do is completely blocked, and when a program that does get work does it immediately, without any sleeping involved, until the work is done.

I agree 😁

To conclude on the tokio / smol idea: I don't deny that there might be some minor perf improvements, but my expectations are low; however, I would place a lot of value on this making the code more elegant, smaller, easier, or more readable. The current dance with creating a dynamic number of threads in main.rs is definitely not newcomer-friendly. And similarly, if we make the code even bigger or harder to grasp while chasing a potential reduction in latency, then I think I would vote against it.

How does this sound to you? If you feel passionate to give it a go - I'm sold 👍 Do you think we can get a feeling of how it will look like, to evaluate that it is getting easier and not even more complex, before investing too much time?

Performance-wise, currently my biggest bet for us to gain some battery life is figuring out how to make the compositor not send identical frames back to us, like it used to do in my memory 😅 A lot of the time my screen contents are static, so it's a huge saving not to reprocess the same frames over and over again. By the way, ext-image-capture-source-v1 is close to being out for the general public; I'm also curious to see how wluma works there - although the support is implemented, I haven't actually tested it for real. Would be curious to see whether that protocol skips unchanged frames or not.

max-baz avatar May 30 '25 15:05 max-baz

There would be other edge-cases, e.g. it's not uncommon for a single long webpage to have "dark" and "bright" sections, so you might be scrolling through a single page and wondering why wluma doesn't help me out here.

Very good point. This feature would certainly not be without its downsides, but I think it's a neat idea nonetheless - and certainly not something for everyone.

Yeah, although let's keep in mind the issue that blocked #91 in the first place [...]

That's some strange behavior. I don't think getting an up-to-date value once we've detected a brightness change is much of a problem - but a quirky IIO sensor reporting wrong values on the first few readings certainly is. Perhaps we could look into what iio-sensor-proxy does? Maybe it has some code to handle such edge cases, as I can confirm that it does not poll the sensor regularly unless someone requests the value from the daemon. (I also haven't read much of the discussion there yet.)

To conclude on the tokio / smol idea [...]

The main advantage an event loop library like that would get us is that we could turn code that looks like this:

loop {
    if can_do_a() { do_a(); }
    if can_do_b() { do_b(); }
    // ...

    sleep(100);
}

...into this:

loop {
    // smol almost certainly has something equivalent, haven't looked too far into it.
    tokio::select! {
        a = get_a_async() => do_a(a),
        b = get_b_async() => do_b(b),
    };
}

...which involves no sleeping at all.

I think this can already be applied to the brightness controller (where the transition timer would just be one more branch in that select!, which we can simply disable when we're not in a transition to avoid sleep looping otherwise).

But there is one major problem we have here: wayland-rs doesn't support async (as I've just come to discover), so we're left with a few suboptimal options:

  • We could use tokio/smol, but keep all direct wayland IO on another thread outside the runtime (perhaps using channels to communicate with code running inside the runtime)
  • We could forgo an async runtime entirely and instead use OS-level APIs to implement the IO multiplexing from the above pseudocode manually. By the way, we can also do this with wayland-rs, assuming that crate hands out the file descriptor of the Wayland socket and exposes an API to process only already-buffered events - the underlying C library supports this. I've written code like that in Zig using C-like APIs before, but I'm completely unaware of any way to abstract this into idiomatic Rust (which isn't to say it doesn't exist - if you know of something, please let me know!). This wouldn't be pretty, but it would probably be the fastest and most lightweight.
  • We could spawn a small ad-hoc async runtime on each of the worker threads just for this, which I don't think is all that much extra "bloat", especially with smol.

I also wouldn't use an async runtime purely because of some "magic" (and probably non-existent) performance gains; my main reason for suggesting it was so we can restructure our code to function more efficiently, as in the above case.

How does this sound to you?

I'd love to work on this, especially to get the groundwork laid here, but there are still a few uncertainties :D

Performance-wise, currently my biggest bet for us to gain some battery life is on figuring out how to make compositor not send identical frames back to us [...]

This sounds very tricky to me. If the screen capture wayland protocol (which I have zero clue about, by the way) isn't built for this, it might not be possible to detect changes at all. In this case, I think our only bet will be either a heuristic approach based on other events like I suggested or yet another sleep loop 😆

LordMZTE avatar May 30 '25 16:05 LordMZTE

Perhaps we could look into what iio-sensor-proxy does?

I am totally fine with that 👍 Just to add a disclaimer: unfortunately, I currently don't have hardware with an ALS sensor, so I can't be of much help even with testing this 😞

But there is one major problem we have here: wayland-rs doesn't support async (as I've just come to discover), so we're left with a few suboptimal options:

Which one do you prefer? If you are the one to drive it, I want to make sure you feel empowered to do what you believe is the right thing. My preferences are simple - prioritize reducing code and complexity over gaining the final 0.0001% performance boost 😁 I guess that's a slight preference for option 1 + opening an issue in wayland-rs asking for async support?

This sounds very tricky to me. If the screen capture wayland protocol (which I have zero clue about, by the way) isn't built for this, it might not be possible to detect changes at all. In this case, I think our only bet will be either a heuristic approach based on other events like I suggested or yet another sleep loop 😆

Leave this for me, I want anyway to deep dive into this topic again once the new sway version is released with the new protocol, hopefully very very soon 🤞

max-baz avatar May 30 '25 16:05 max-baz

I did some testing today to see how much extra battery use I get with wluma compared to without it. I tested this at Uni today, attending 2 lectures (90 min each) while using my laptop (Framework 16 w/ Ryzen 9) to take notes in Typst (so very low load). These are the battery levels before and after the lectures:

lecture 1 with wluma:    80% -> 58% (Δ: 22%)
lecture 2 without wluma: 46% -> 27% (Δ: 19%)

This comparison is also somewhat biased in wluma's favor: not only was the battery already more empty for the second test, but that lecture was also so boring that I started refactoring code halfway through, which involved a few compiler invocations. This is obviously not entirely scientific, but I see room for improvement here :P

I also started work on moving wluma over to smol, but it's been pretty complicated so far and I also don't have a lot of time. It'll probably still take a while.

LordMZTE avatar Jun 03 '25 18:06 LordMZTE

Amazing, thanks for the update and for running some tests! No rush on the timeline, but if you feel like you can scope this down so that we can do it in smaller, more manageable chunks, let's do that - for example, using smol for only one controller as a start. I'm more than happy to merge multiple PRs; we don't have to go with one huge one - unless there's of course a reason for it.

max-baz avatar Jun 03 '25 20:06 max-baz

sway 1.11 is released, and I can confirm that now, when I see "Using ext-image-copy-capture-v1 protocol to request frames" in RUST_LOG=trace wluma output, wluma only processes frames (i.e. uses the GPU) when something is changing on the screen - if I move to an empty workspace, wluma basically stays silent, no new logs are produced. This should improve the battery usage!

max-baz avatar Jun 24 '25 20:06 max-baz