Modern Hybrid Graphics in COSMIC

Open WatchMkr opened this issue 1 year ago • 20 comments

Putting this here as it touches lots of repos.

Separate intel/hybrid/nvidia modes are no longer necessary. Apps that are set up to prefer the dGPU request it automatically, and users can request the dGPU for any app by right-clicking its icon and choosing the dGPU. The Power & Battery applet will display when the dGPU is in use and which apps are using it. If users need the longest possible battery life, or their battery is low and needs to last longer, choosing Battery mode from the same applet will disable the dGPU (among other power-saving measures such as reducing CPU frequency, turning off the keyboard backlight, and reducing display brightness).

Power & Battery applet

  • a notification dot next to the battery icon when the dGPU is in use
  • a list in the battery applet that shows dGPU apps
  • copy that says the nvidia/amd discrete GPU is in use and will reduce battery life. Choose battery life mode to disable the dGPU.
  • Link to Power Settings (already on the applet mockups)

Power & Battery applet & Power Settings

Additions to the Extended Battery Life power mode copy inside the Power & Battery panel in Cosmic Settings (only for systems with dGPU): Reduced power usage and performance. Disables discrete graphics and might disable external displays.

Power Settings

  • An option to enable/disable notifications when the discrete gpu becomes active

Launcher/App Library/Dock

  • Right-click context menu with options to start the app on a chosen GPU, with "default" shown next to the app's default (.desktop) GPU choice. Hybrid systems only.
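
A minimal sketch of how the "default" label could be derived, assuming the app's default GPU choice comes from the freedesktop PrefersNonDefaultGPU key in its .desktop file. The function names and menu labels are hypothetical and are not taken from COSMIC's code.

```python
import configparser


def prefers_dgpu(desktop_entry_text: str) -> bool:
    """Return True if the entry asks for the non-default (discrete) GPU."""
    # Desktop entries use '=' only, case-sensitive keys, and '%U'-style
    # field codes in Exec, so interpolation is disabled.
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, delimiters=("=",)
    )
    parser.optionxform = str  # keep key case as written
    parser.read_string(desktop_entry_text)
    if not parser.has_section("Desktop Entry"):
        return False
    value = parser.get("Desktop Entry", "PrefersNonDefaultGPU", fallback="false")
    return value.strip().lower() == "true"


def gpu_menu_labels(desktop_entry_text: str) -> list:
    """Build the two context-menu labels, marking the app's default GPU."""
    default_is_dgpu = prefers_dgpu(desktop_entry_text)
    return [
        "Integrated graphics" + ("" if default_is_dgpu else " (default)"),
        "Discrete graphics" + (" (default)" if default_is_dgpu else ""),
    ]
```

An entry without the key is treated as preferring the integrated GPU, which matches the idea that only apps that opt in request the dGPU.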

WatchMkr avatar Jan 02 '24 18:01 WatchMkr

How will this work if the iGPU literally doesn't have the hardware to output a desired display resolution? Personally, I'm using an 8K display that can only be rendered at 60fps with the dGPU, but I can imagine very similar scenarios for other people with say 4K@240Hz or something crazy that the iGPU can't output.

SUPERCILEX avatar Jan 11 '24 23:01 SUPERCILEX

Hybrid graphics means both integrated and dedicated graphics are available to use. You can try out the hybrid graphics mode option in Pop!_OS today to compare. The dGPU would be used for the display connected to the dGPU.

mmstick avatar Jan 11 '24 23:01 mmstick

The dGPU would be used for the display connected to the dGPU.

How can I set this up? By default, hybrid just straight up doesn't work because it tries to use the iGPU. In general, I'm pretty sure the idea of going to hybrid only graphics will only work if there's some way to default the DE to the dGPU.

SUPERCILEX avatar Jan 12 '24 00:01 SUPERCILEX

This will be a feature of COSMIC's compositor to handle GPU allocation better between displays.

mmstick avatar Jan 12 '24 00:01 mmstick

Sweet, as long as it isn't broken. :)

SUPERCILEX avatar Jan 12 '24 00:01 SUPERCILEX

I'm curious about this change too. I have similar concerns to @SUPERCILEX, since today's hybrid mode doesn't work on my setup. I have my laptop (ThinkPad X1) connected to a USB-C dock which connects to 2 high-resolution monitors. This only works when I enable the dGPU. I really hope this doesn't break with these new changes :)

oyvindaakre avatar Jan 12 '24 13:01 oyvindaakre

Maybe you should make every option toggleable in Battery Saving mode, like:

  • Reduce display brightness, on or off (personally I hate the automatic reduction of brightness)
  • Disable connectivity (if nothing's connected, otherwise don't), on or off
  • Exceptions for dGPU (turn it off except when an application, or a group of applications, will run, for example)

Also, given the possible issues with Nvidia, as always, an option to always use the iGPU even outside Battery Saving mode would be a blessing.

gabriele2000 avatar Jan 13 '24 18:01 gabriele2000

a list in the battery applet that shows dGPU apps

What is the plan to query the list of apps running on an Nvidia dGPU? nvidia-smi is the standard tool used by system monitoring utilities, but nvidia-smi itself will keep the dGPU awake. Something like fuser -v /dev/dri/render* shouldn't wake the dGPU, but the downside is that it won't show processes belonging to other users or running as root, unless the command itself is run as root.

See also: https://gitlab.com/mission-center-devs/mission-center/-/issues/30 https://github.com/Syllo/nvtop/issues/230 https://github.com/NVIDIA/open-gpu-kernel-modules/discussions/582
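
A rough sketch of the fuser-style approach, assuming it is enough to read the symlinks under /proc/<pid>/fd and match them against render nodes. The function name and parameters are hypothetical. Reading these links does not open the device node itself, which is why this approach is expected (not guaranteed) to leave the dGPU asleep. As noted above, processes owned by other users are silently skipped unless this runs as root.

```python
import os


def render_node_users(proc_root="/proc", node_prefix="/dev/dri/renderD"):
    """Map PID -> set of render nodes that the process has open."""
    users = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed in the meantime
            if target.startswith(node_prefix):
                users.setdefault(int(entry), set()).add(target)
    return users
```

Which render node belongs to the dGPU still has to be determined separately; this only answers which processes hold which node open.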

qwertychouskie avatar Jan 17 '24 02:01 qwertychouskie

First of all, we are only concerned with laptops here, so this applet won't be visible on multi-GPU desktop setups.

Secondly, we will likely deploy different strategies for different vendors.

nVidia

  • Use /proc/driver/nvidia/[...]/power to poll the general power state for the notification dot. Fortunately, this does not wake the GPU.
  • If the applet is expanded and the GPU is already in an active state, parse nvidia-smi output. Additionally, don't use its built-in monitoring function; instead, call the utility separately with a polling interval large enough to allow the GPU to enter D3 sleep.
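
The gating described in these two points could look roughly like the sketch below. The class name is hypothetical and the 30-second default interval is a placeholder assumption; the right value depends on how long the hardware takes to fall asleep.

```python
import time


class DetailPoller:
    """Decides when an expensive per-process GPU query may run."""

    def __init__(self, min_interval=30.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between expensive queries
        self.clock = clock                # injectable for testing
        self.last_query = None

    def should_query(self, applet_expanded, gpu_active):
        # Never query a sleeping GPU, and never query when nobody is looking.
        if not (applet_expanded and gpu_active):
            return False
        now = self.clock()
        if self.last_query is not None and now - self.last_query < self.min_interval:
            return False  # too soon; leave the GPU a chance to suspend
        self.last_query = now
        return True
```

The cheap power-state poll would run on its own schedule and feed gpu_active; only when should_query returns True would the applet invoke the expensive tool once and parse its output.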

AMD

I haven't looked too closely at the APIs that utilities like radeontop use, but we are mostly concerned with desktop applications here. So if nothing else works, fuser/lsof/etc. is probably the best we can achieve.

I am not aware of any other vendors shipping hybrid configurations, but the fuser/lsof solution could be used as a fallback in those cases as well.

This is as far as the technical side has been planned out for now, but we are very aware that keeping the GPU active is a non-goal and that the monitoring will therefore have to be a best-effort solution. The goal is not to present accurate stats, but to give users a hint as to why their laptop is drawing more power.

Drakulix avatar Jan 17 '24 10:01 Drakulix

Use /proc/driver/nvidia/[...]/power to poll the general power state for the notification dot. Fortunately, this does not wake the GPU.

You can probably use cat /sys/class/drm/card*/device/power_state or cat /sys/class/drm/card*/device/power/runtime_status instead, as this should work with all GPU vendors AFAIK.
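
A small sketch of the vendor-neutral variant, assuming power/runtime_status under each DRM card's device directory reflects the GPU's runtime power state. The function name and the sys_root parameter are hypothetical. Whether reading this file is harmless on every driver is an assumption that would need to be checked on real hardware.

```python
import glob
import os


def gpu_runtime_status(sys_root="/sys"):
    """Return {card name: runtime PM status string} for each DRM card."""
    statuses = {}
    pattern = os.path.join(
        sys_root, "class", "drm", "card*", "device", "power", "runtime_status"
    )
    for path in sorted(glob.glob(pattern)):
        card = path.split(os.sep)[-4]
        if "-" in card:
            continue  # connector entries such as card0-eDP-1, not GPUs
        try:
            with open(path) as status_file:
                statuses[card] = status_file.read().strip()
        except OSError:
            continue  # attribute unreadable; skip this card
    return statuses
```

A notification dot could then be shown whenever the card identified as the dGPU reports "active" rather than "suspended".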

If the applet is expanded and the GPU is already in an active state, parse nvidia-smi output. Additionally, don't use its built-in monitoring function; instead, call the utility separately with a polling interval large enough to allow the GPU to enter D3 sleep.

Just note that it can take 20+ seconds for an NVIDIA dGPU to enter the sleep state, so you should either set a very long interval to be safe (e.g. 30+ seconds) and hope that you don't encounter hardware where this timeout is even longer, or use alternative methods (lsof/fuser/etc.). Perhaps it would be useful to create a small daemon that runs as root and just returns a true/false value for each GPU in the system indicating whether it is in use.

qwertychouskie avatar Jan 17 '24 21:01 qwertychouskie

Perhaps it would be useful to create a small daemon that runs as root, and just returns a true/false value for each GPU in the system for whether it is in use or not.

We already have system76-power running on our machines anyway, so that could be used for this. However, if a more generic solution emerges, I am happy to support that as well, and either way I am going to make sure this has fallbacks should no daemon be available.

Drakulix avatar Jan 18 '24 11:01 Drakulix

I realize this might be very complicated and out of scope for the time being, but it would be really nice if there were a way to dynamically switch apps between the dGPU and the iGPU. Imagine running an app on the dGPU on a 4K external display, then unplugging and wanting to continue on battery. As far as I understand, apps on Linux generally don't support restarting the OpenGL context gracefully, but one way to do it might be to have some sort of abstraction in the compositor that is exposed to the app, with the dynamic switching managed underneath by the compositor.

efouladi avatar Jan 25 '24 18:01 efouladi

As far as I understand, apps on Linux generally don't support restarting the OpenGL context gracefully, but one way to do it might be to have some sort of abstraction in the compositor that is exposed to the app, with the dynamic switching managed underneath by the compositor.

There is nothing the compositor can really do here. In fact, cosmic-comp already tells applications that the main device has changed. What's required to make this work is for the application to send the compositor new buffers that reside on the new GPU, which requires re-creating the OpenGL or Vulkan contexts.

The abstraction you are talking about would not work, as we would essentially have to re-implement the complete OpenGL/Vulkan API surface. Even if we could manage that and perfectly re-create the OpenGL/Vulkan state on a different GPU, it would still be impossible to do transparently, because different GPUs may support different OpenGL/Vulkan extensions.

It is far more realistic to have applications re-create their contexts. For example, Qt can already handle GPU resets and, with patches, even a compositor restart: https://blog.davidedmundson.co.uk/blog/qt6_wayland_robustness/ Once that is well supported in most toolkits, simply forcing the app to re-create its state becomes an option to facilitate this.

Drakulix avatar Jan 25 '24 19:01 Drakulix

Thanks a lot for the thorough explanation. Do you know if there is any such effort on the gtk side?

efouladi avatar Jan 25 '24 20:01 efouladi

Thanks a lot for the thorough explanation. Do you know if there is any such effort on the gtk side?

Patches for wayland-robustness do exist (prototyped by the Qt devs spearheading this effort, if I remember correctly), but GTK devs have voiced concern that they could cause a lot of inconsistent state within the framework.

I don't know what the current state of affairs is, nor do I fully remember whether they were sympathetic to the proposal in principle. All I am saying is that it might take a good deal more time for other frameworks to adopt this, but it is far from unrealistic.

And I don't think this is necessarily a bad thing. Legacy apps that don't support this (or, as in the case of Wine, simply can never support it) will always exist. Once we have any toolkit supporting this, we could design a protocol to allow the compositor to request a reset, or even have a heuristic in the framework to do this itself, which would then allow other toolkits to support this approach and COSMIC to implement what is necessary to support this procedure. It will always be a best-effort kind of situation anyway.

Drakulix avatar Jan 25 '24 20:01 Drakulix

I get a lot of kernel panics when I use my laptop's integrated AMD card instead of the NVIDIA dGPU, so please consider adding/restoring an NVIDIA-graphics-only mode. Aside from a few screenshots I've been able to catch when it happens at boot instead of in the DE, Pop!_OS is basically the only Linux OS I can use on my laptop.

superusercode avatar Jan 26 '24 01:01 superusercode

I get a lot of kernel panics when I use my laptop's integrated AMD card instead of the NVIDIA dGPU, so please consider adding/restoring an NVIDIA-graphics-only mode. Aside from a few screenshots I've been able to catch when it happens at boot instead of in the DE, Pop!_OS is basically the only Linux OS I can use on my laptop.

What laptop, CPU and GPU do you have? That sounds like a serious bug that needs to be fixed, not hacked around.

qwertychouskie avatar Jan 26 '24 01:01 qwertychouskie

A Lenovo IdeaPad 5 Pro 16ARH7, configured with a Ryzen 5 6600HS (which has an integrated Radeon 660M), an NVIDIA 3050 laptop GPU, and 16 GB of RAM. Attaching one of my most recent kernel panics here: Photo-2

superusercode avatar Jan 26 '24 07:01 superusercode

attaching one of my most recent kernel panics here

I think you could just run sudo dmesg after restarting or something like that; I'll verify further after dinner. Plus, go to Settings (search for it in the app-launcher grid; it's available even if you're on COSMIC DE, like I am), go to Support, and generate the file. Compress it to .zip (because GitHub doesn't support .tar.gz) and attach it here.

gabriele2000 avatar Jan 26 '24 20:01 gabriele2000

This isn't really the appropriate place for hardware support. You can report hardware issues on the Linux kernel Bugzilla.

mmstick avatar Jan 26 '24 20:01 mmstick

I love @WatchMkr's proposal to just run hybrid always. I'm running the latest COSMIC and it works well enough, but even in battery saving mode my dGPU is active. My galp5 has a tiny battery, so my most common concern is getting the longest battery runtime without crippling my CPU performance (as tends to happen in battery saving mode, in part due to my own contribution of a thermal target for that mode).

+-----------------------------------------------------------------------------------------+
Sat May 11 14:02:07 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1650 Ti     Off |   00000000:23:00.0 Off |                  N/A |
| N/A   50C    P8              4W /   30W |      49MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2765      G   cosmic-comp                                     1MiB |
|    0   N/A  N/A      2797    C+G   cosmic-workspaces                              43MiB |
+-----------------------------------------------------------------------------------------+

curiousercreative avatar May 11 '24 18:05 curiousercreative

@curiousercreative That's because cosmic-workspaces currently defaults to the dGPU because that's what wgpu prefers. The applet needs to be spawned with either WGPU_ADAPTER_NAME={{NAME_OF_ADAPTER}}, or WGPU_POWER_PREF=low if on a hybrid graphics system.
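
As an illustration of what spawning the process with those variables might look like, here is a minimal sketch. The helper names are hypothetical, and the adapter name has to be supplied by the caller since it differs per system.

```python
import os
import subprocess


def low_power_env(base_env=None, adapter_name=None):
    """Copy the environment and add a wgpu hint that avoids the dGPU."""
    env = dict(os.environ if base_env is None else base_env)
    if adapter_name:
        # Pin wgpu to a specific adapter by name.
        env["WGPU_ADAPTER_NAME"] = adapter_name
    else:
        # Otherwise just ask wgpu for the low-power adapter.
        env["WGPU_POWER_PREF"] = "low"
    return env


def spawn_on_igpu(argv, adapter_name=None):
    """Start a child process with the low-power GPU hint applied."""
    return subprocess.Popen(argv, env=low_power_env(adapter_name=adapter_name))
```

On a system without hybrid graphics the hint would simply be skipped, and the launcher would pass the environment through unchanged.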

mmstick avatar May 11 '24 19:05 mmstick