
Support GPU hotplug

Open emersion opened this issue 5 years ago • 20 comments

This means keep scanning for GPU nodes with udev and create/destroy subbackends when they show up or go away.

We probably want to tear down everything if the main GPU goes away.


wlroots has migrated to gitlab.freedesktop.org. This issue has been moved to:

https://gitlab.freedesktop.org/wlroots/wlroots/-/issues/1278

emersion avatar Oct 03 '18 17:10 emersion

I would love to see eGPU hotplug support. I will keep sending donations for that in the coming months. (Sorry if this is not the place for such a comment.)

Oliph avatar Apr 04 '19 09:04 Oliph

I acquired a USB DisplayLink connector not too long ago, which for all intents and purposes acts as a separate GPU, just lacking its own rendering capabilities. Once I get around to getting that to work, it would solve this, but it's not something I'm actively working on right now.

ascent12 avatar Apr 04 '19 09:04 ascent12

We did chat with someone at XDC who was working on getting it into GNOME (a Collabora employee, IIRC); we can probably use some of that work.

IIRC a lot of the weirdness is about picking the "best" main GPU.

Also, something I saw at work: some of the Wacom tablets (the one I saw was a businessy signing pad with a display) use DisplayLink, so this has more of a use case than is immediately obvious.

OTOH, the kernel side of DisplayLink isn't exactly the nicest IMO.

Ongy avatar Apr 04 '19 16:04 Ongy

I saw that a couple of months ago on Phoronix, where they talk about that work in Mutter: https://www.phoronix.com/scan.php?page=news_item&px=GNOME-Mutter-GPU-Hotplug

They also link to the merge: https://gitlab.gnome.org/GNOME/mutter/commit/ad7d6e4a37a6258a5de876a85859f7f57415dffa

Oliph avatar Apr 04 '19 18:04 Oliph

@emersion when I hotplug my MST monitors, they show up with different names and so are just empty screens that my configuration does not work for (though they do work and sway uses them). I was gonna file an issue, though I'm unsure whether it belongs to this issue. I have the same issue with i3 so it might be an issue above wlroots and i3, no? Happy to provide logs or other information.

AndreasBackx avatar Apr 15 '19 19:04 AndreasBackx

MST is unrelated to GPU hotplug, so this would be a different issue. If it also happens on i3, it's probably a graphics driver issue.

emersion avatar Apr 15 '19 19:04 emersion

@emersion thank you, I'll file an issue for the amdgpu stuff.

AndreasBackx avatar Apr 15 '19 20:04 AndreasBackx

Recently #1696 was merged as a workaround, but we eventually want to add proper support for hotplugging DRM devices. It's particularly interesting for things like docking stations or other devices that put monitors over USB 3 or Thunderbolt, but it also applies to external GPUs.

Just as a little bit of background

There are really 2 types of DRM devices: display controllers and rendering devices.

  • A display controller is something that can have computer monitors attached, and would go through the DRM KMS API.
  • A rendering device is obviously something we can do rendering with, e.g. via GBM + GLES/Vulkan.

When you're dealing with most desktop hardware, you're dealing with a device that does both, so naively, wlroots was designed treating the two types of devices as the same thing. On some ARM hardware, you'll find you have two different pieces of hardware doing these separately, and docking stations etc. are usually just a display controller without rendering capabilities.
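
The two device types can be made concrete with a small model. This is only an illustrative Python sketch (wlroots is written in C and none of these names come from it): `Role`, `classify`, and the `has_kms`/`can_render` flags are hypothetical stand-ins for whatever the compositor would actually probe.

```python
from enum import Flag, auto


class Role(Flag):
    """What a DRM device can be used for (hypothetical model, not a wlroots type)."""
    NONE = 0
    DISPLAY = auto()  # display controller: monitors attach here, driven via KMS
    RENDER = auto()   # rendering device: usable via GBM + GLES/Vulkan


def classify(has_kms: bool, can_render: bool) -> Role:
    """Combine the two independent capabilities into a set of roles."""
    role = Role.NONE
    if has_kms:
        role |= Role.DISPLAY
    if can_render:
        role |= Role.RENDER
    return role


# The three situations mentioned in the comment above.
desktop_gpu = classify(has_kms=True, can_render=True)       # does both
usb_dock = classify(has_kms=True, can_render=False)         # display controller only
arm_render_block = classify(has_kms=False, can_render=True)  # rendering only

for name, role in [("desktop GPU", desktop_gpu),
                   ("USB dock", usb_dock),
                   ("ARM render block", arm_render_block)]:
    print(name, "display" if Role.DISPLAY in role else "-",
          "render" if Role.RENDER in role else "-")
```

Keeping the two capabilities as separate flags, rather than one "is a GPU" boolean, is the distinction the rest of the proposal relies on.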

The point

Hotplugging display controllers is easy; hotplugging rendering devices (that we're using) is hard. A display controller going away is basically the same situation as a few outputs being disconnected, but a rendering device going away means that we have to tear down all of our rendering state and try to bring it up elsewhere, which is not something I want to try.

Renderer v6 actually helps a lot towards this, as it moves a lot of the rendering code outside of the backend, but there would still be a bit of extra work to get this working properly.

  • 1 DRM backend per display controller (possibly zero)
  • Make sure there is absolutely no rendering code inside of the DRM backend
  • Make wlr_session aware of the difference between display controllers and rendering devices
  • Make wlr_session listen for hotplug events and bring up and tear down DRM backends as necessary
  • Kill the compositor if our chosen rendering device goes away (unless someone is masochistic enough to get this working properly)
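
The list above can be sketched as a small state machine. This is a toy Python model under stated assumptions, not wlroots code: `Device`, `Session`, `on_add`, and `on_remove` are hypothetical names, the node paths are examples, and "killing the compositor" is modeled as flipping an `alive` flag.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    node: str          # example path such as /dev/dri/card1
    is_display: bool   # display controller (KMS)
    is_renderer: bool  # rendering device


@dataclass
class Session:
    renderer_node: str                              # the chosen rendering device
    drm_backends: set = field(default_factory=set)  # one entry per display controller
    alive: bool = True

    def on_add(self, dev: Device) -> None:
        # Only display controllers get a DRM backend; a render-only
        # device contributes zero backends.
        if self.alive and dev.is_display:
            self.drm_backends.add(dev.node)

    def on_remove(self, dev: Device) -> None:
        if not self.alive:
            return
        if dev.node == self.renderer_node:
            # The chosen rendering device went away: give up entirely.
            self.drm_backends.clear()
            self.alive = False
        else:
            # Equivalent to a few outputs being disconnected.
            self.drm_backends.discard(dev.node)


igpu = Device("/dev/dri/card0", is_display=True, is_renderer=True)
dock = Device("/dev/dri/card1", is_display=True, is_renderer=False)

session = Session(renderer_node=igpu.node)
session.on_add(igpu)
session.on_add(dock)
session.on_remove(dock)   # dock unplugged: its backend is dropped, session survives
print(session.alive, sorted(session.drm_backends))
```

Whether the backends live in one set inside a single DRM backend or as separate backend objects is exactly the open design question in the replies that follow; the model does not take a side.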

ascent12 avatar May 15 '19 05:05 ascent12

This overall LGTM.

> 1 DRM backend per display controller (possibly zero)

I wonder if we should really do this. Maybe having a list of DRM nodes in the DRM backend would be better.

emersion avatar May 15 '19 05:05 emersion

> I wonder if we should really do this. Maybe having a list of DRM nodes in the DRM backend would be better.

I'd be fine with either way, personally.

ascent12 avatar May 15 '19 05:05 ascent12

Another use case for this is hybrid graphics laptops.

For example, I have a ThinkPad X1 Extreme with Intel+NVIDIA graphics. The HDMI port is wired to the NVIDIA GPU. Switching it on is required to use external monitors, and switching it off is required for good battery life on the go. It works decently with bbswitch+nouveau, but it would be awesome if you could switch it without having to restart the session.

Dirbaio avatar Aug 26 '19 20:08 Dirbaio

@emersion I wanted to get back to this. The last few months I have been running into the same issue without MST. So it's likely not an MST problem. This is on sway 1.2, which I assume would have this patch that was merged in May as sway 1.2 was released in August?

AndreasBackx avatar Dec 21 '19 17:12 AndreasBackx

This issue is unrelated to your problem. Please open a new one with all the necessary information.

emersion avatar Dec 22 '19 10:12 emersion

I am super keen to get GPU hotplugging working for display controllers (specifically, I'm motivated by the hope of completely powering off my dGPU with bbswitch when no external monitor is connected, all whilst keeping sway running - https://www.reddit.com/r/swaywm/comments/ikaxem/feature_idea_dynamically_unloading_unused_drm/ )

Is this blocked by the work on renderer v6? If so, could I potentially start experimenting with hotplugging by forking emersion's swapchain branch? My experience in this area is limited, but with some guidance I'd love to be able to contribute an MVP patch.

neon64 avatar Sep 06 '20 03:09 neon64

> Is this blocked by the work on renderer v6?

No. Just need to listen to udev signals and set up a new DRM child backend on hotplug.
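
For illustration, the filtering half of that job might look like the Python sketch below. It assumes events arrive as KEY=VALUE properties in the form printed by `udevadm monitor --property`; `parse_properties` and `drm_hotplug_action` are hypothetical helpers, the sample event is made up rather than captured from a real machine, and actually creating the child backend is left out.

```python
import re

# Matches primary DRM nodes such as /dev/dri/card0 (not render nodes).
CARD_NODE = re.compile(r"^/dev/dri/card\d+$")


def parse_properties(text: str) -> dict:
    """Turn KEY=VALUE lines into a dict, ignoring anything else."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:
            props[key] = value
    return props


def drm_hotplug_action(props: dict):
    """Return (action, node) for add/remove events on DRM card nodes, else None."""
    if props.get("SUBSYSTEM") != "drm":
        return None
    action = props.get("ACTION")
    node = props.get("DEVNAME", "")
    # Other actions (for example "change") are deliberately ignored here.
    if action in ("add", "remove") and CARD_NODE.match(node):
        return action, node
    return None


# Illustrative event only.
sample = """\
ACTION=add
SUBSYSTEM=drm
DEVNAME=/dev/dri/card1
DEVTYPE=drm_minor
"""

print(drm_hotplug_action(parse_properties(sample)))  # ('add', '/dev/dri/card1')
```

An "add" result is where a new DRM child backend would be set up, and a "remove" result is where it would be torn down.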

emersion avatar Sep 07 '20 08:09 emersion

I'm excited to be able to report that I've got some form of hotplugging working. It's for "display controllers" only, and only tested so far on my laptop with Intel+nouveau dual-GPU. I don't think my implementation is at all mergeable yet: this is my first time delving into the wlroots codebase, so I don't understand how all the pieces fit together, and I've done many dodgy things in the implementation. But in case anyone is looking for an interim solution, particularly on laptops with hybrid graphics, feel free to try this out and let me know what issues you encounter.

https://github.com/neon64/wlroots/tree/feature/unload_drm and please see usage instructions in the commit message of https://github.com/neon64/wlroots/commit/76268d4db8fb6036634e51ffedeb793cdcf087dd

neon64 avatar Oct 07 '20 14:10 neon64

I'd be glad to give it a spin. I have an Intel+nouveau laptop and as far as I can tell hotplugging external monitors has been working for a while now. Reading @ascent12's description, I think I have an idea of the sorts of things to look for, but if you can spell out the changes I should be expecting, I'd be happy to try them.

J0nnyMak0 avatar Oct 07 '20 22:10 J0nnyMak0

So the specific workflow is described in this commit message: https://github.com/neon64/wlroots/commit/76268d4db8fb6036634e51ffedeb793cdcf087dd

The general idea is that once you disconnect all external monitors, wlroots will stop using the nouveau driver, so you can power off the dGPU completely while keeping the compositor running (previously you had to close all windows and restart). Similarly, if you started sway with only the Intel driver loaded, you can run modprobe nouveau and wlroots will start detecting external monitors (previously you had to restart to redetect them).

Why not keep nouveau loaded all the time? As far as I'm aware, there is no way to prevent the nouveau driver from sucking up an extra ~15W even when idle - completely switching off the card is the most foolproof way. Before this patch, I'd have to restart sway to go from 16-20W to 3-5W idle. After this patch, I can vastly improve battery life without restarting my wm all the time. I guess this is more or less just a workaround for nouveau not doing proper power management, but I wouldn't even know where to start with hacking on nouveau so have come here instead.

This may be also useful for an eGPU setup, but I don't have one to test with unfortunately.

neon64 avatar Oct 08 '20 01:10 neon64

ok, thanks. I gave it a quick try. Here is an issue I found. (I'm booting with nouveau blacklisted.)

First off, if I plug in external monitors first and then run sudo modprobe nouveau, it works great. The scan is completed and my external monitors come to life. Now I can unplug the monitors and run sudo rmmod nouveau, and everything works as expected. I then proceed to switch off the card with bbswitch. All good.

However, if I load the nouveau module without external monitors connected, then there is a problem. Nouveau is loaded as expected, but when I then go to rmmod nouveau, it fails with: rmmod: ERROR: Module nouveau is in use

lsof tells me:

$ lsof /dev/dri/card*
COMMAND PID   USER   FD   TYPE DEVICE SIZE/OFF  NODE NAME
sway    605 jonm  mem    CHR  226,1          28837 /dev/dri/card1
sway    605 jonm    8u   CHR  226,0      0t0  2217 /dev/dri/card0
sway    605 jonm    9u   CHR  226,0      0t0  2217 /dev/dri/card0
sway    605 jonm   10u   CHR  226,0      0t0  2217 /dev/dri/card0
sway    605 jonm   11u   CHR  226,0      0t0  2217 /dev/dri/card0
sway    605 jonm   50u   CHR  226,1      0t0 28837 /dev/dri/card1
sway    605 jonm   52u   CHR  226,1      0t0 28837 /dev/dri/card1
sway    605 jonm   53u   CHR  226,1      0t0 28837 /dev/dri/card1
sway    605 jonm   54u   CHR  226,1      0t0 28837 /dev/dri/card1
sway    605 jonm   55u   CHR  226,1      0t0 28837 /dev/dri/card1
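
A listing like this can be summarized per device node with a few lines of Python. This is only a convenience sketch: `open_handles` is a hypothetical helper, and the input is the output pasted above, copied verbatim.

```python
from collections import Counter

LSOF_OUTPUT = """\
COMMAND PID   USER   FD   TYPE DEVICE SIZE/OFF  NODE NAME
sway    605 jonm  mem    CHR  226,1          28837 /dev/dri/card1
sway    605 jonm    8u   CHR  226,0      0t0  2217 /dev/dri/card0
sway    605 jonm    9u   CHR  226,0      0t0  2217 /dev/dri/card0
sway    605 jonm   10u   CHR  226,0      0t0  2217 /dev/dri/card0
sway    605 jonm   11u   CHR  226,0      0t0  2217 /dev/dri/card0
sway    605 jonm   50u   CHR  226,1      0t0 28837 /dev/dri/card1
sway    605 jonm   52u   CHR  226,1      0t0 28837 /dev/dri/card1
sway    605 jonm   53u   CHR  226,1      0t0 28837 /dev/dri/card1
sway    605 jonm   54u   CHR  226,1      0t0 28837 /dev/dri/card1
sway    605 jonm   55u   CHR  226,1      0t0 28837 /dev/dri/card1
"""


def open_handles(lsof_text: str) -> Counter:
    """Count lsof rows per (command, device node)."""
    counts = Counter()
    for line in lsof_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue
        # The `mem` row has no SIZE/OFF value, so index from both ends
        # instead of assuming a fixed number of columns.
        command, name = fields[0], fields[-1]
        counts[(command, name)] += 1
    return counts


for (command, node), n in sorted(open_handles(LSOF_OUTPUT).items()):
    print(f"{command} holds {n} handle(s) on {node}")
```

For the listing above this reports four rows for /dev/dri/card0 and six for /dev/dri/card1 (five file descriptors plus one memory mapping), all held by sway.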

sway.log

Edit: The problem goes away if I turn off the dGPU before loading the nouveau module.

J0nnyMak0 avatar Oct 08 '20 23:10 J0nnyMak0

@J0nnyMak0 thanks so much for trying this out. I've gone ahead and created a draft PR to discuss this specific implementation, so as not to further pollute this issue thread: https://github.com/swaywm/wlroots/pull/2423

neon64 avatar Oct 10 '20 05:10 neon64