Add an option to reload `nvidia_drm` kernel module on egpu connection

Open alllexx88 opened this issue 8 months ago • 0 comments

This is not strictly an egpu-switcher bug, but I found a workaround that works for me with the current 0.19.0 version of egpu-switcher. It would be great if it could be incorporated as a feature, or at least documented somewhere for people in the same situation.

I'll start with the symptoms. With egpu-switcher installed and configured, and detection retries set to a large enough value (I use 20), my eGPU was detected fine on boot. However, only the external monitor connected to the eGPU worked: the internal laptop display was not detected and could not be used. I originally set up my new eGPU on Kubuntu 23.10 (for more background, see my thread on egpu.io forums).

Next, the probable reason. After moving to Arch Linux, hitting the same issue as on Kubuntu, and experimenting with Wayland and all-ways-egpu, I discovered that no `/dev/dri/card*` device was created for my eGPU. That is why all-ways-egpu (methods 2 and 3) reported that no device was found: it scans the `/dev/dri/card*` devices, looking for the predefined PCIe path(s) that I configured to match my eGPU, and found none.

The problem in my case is that my Intel iGPU is disabled (otherwise, in Optimus mode, I can't use G-Sync in Windows, even with the eGPU). The laptop's dGPU (also Nvidia) triggers loading of the `nvidia_drm` kernel module before the eGPU is detected, so no DRM device (`/dev/dri/card*`) gets created for the eGPU. If I disable the window manager, log in to a TTY, and manually reload `nvidia_drm`, the respective `/dev/dri/card*` device is created. Starting a Plasma X11 session after that gets offloading to the dGPU working, meaning I can use my internal laptop screen too!
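For anyone who wants to check whether they are hitting the same condition, here is a minimal sketch that tests whether any DRM card node is backed by a given PCI device. It assumes the usual sysfs layout, where each `/sys/class/drm/card*/device` symlink resolves to the PCI device directory. `has_drm_card` is a hypothetical helper name and the PCI address in the usage line is only an example, not a value from my setup.

```shell
#!/bin/sh
# Sketch only: has_drm_card is a hypothetical helper, not part of
# egpu-switcher or all-ways-egpu.

# has_drm_card PCI_ADDR [DRM_CLASS_DIR]
# Prints the name of the first DRM card whose "device" symlink resolves to
# PCI_ADDR and returns 0; returns 1 if there is none.
# DRM_CLASS_DIR defaults to /sys/class/drm (assumed standard sysfs layout).
has_drm_card() {
    pci_addr=$1
    drm_dir=${2:-/sys/class/drm}
    for card in "$drm_dir"/card*; do
        # Skip when the glob matched nothing or the entry has no device link.
        [ -e "$card/device" ] || continue
        target=$(readlink -f "$card/device") || continue
        # Compare the last path component with the PCI address.
        if [ "${target##*/}" = "$pci_addr" ]; then
            echo "${card##*/}"
            return 0
        fi
    done
    return 1
}
```

Example usage, with a made-up address: `has_drm_card 0000:05:00.0 || echo "no DRM node for the eGPU"`. The real address can be taken from `lspci -D`.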

The workaround I use now is the following eGPU connection hook:

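#!/bin/sh
# eGPU connection hook: reload nvidia_drm so that a DRM device
# (/dev/dri/card*) gets created for the eGPU.

RETRY_INTERVAL=0.1  # seconds; fractional values need a sleep that supports them (e.g. GNU coreutils)
MAX_RETRIES=30

n=0
# Unloading fails while the module is in use, so retry until the
# reload succeeds or MAX_RETRIES is exhausted.
until modprobe -r nvidia_drm && modprobe nvidia_drm; do
    [ "$n" -lt "$MAX_RETRIES" ] || exit 1
    n=$((n + 1))
    sleep "$RETRY_INTERVAL"
done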

I need the retries because without them the hook often fails with the error `modprobe: FATAL: Module nvidia_drm is in use.` It seems that even before the display manager starts there is some transient use of the `nvidia_drm` module, but it passes soon enough for the script with retries to succeed.
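To see that transient use while debugging, the module's reference count can be read from sysfs. This is a rough sketch, assuming the kernel exposes `/sys/module/<name>/refcnt` (which it should when module unloading is enabled); `module_refcnt` is a hypothetical helper name. A non-zero value would be consistent with the "in use" error above.

```shell
#!/bin/sh
# Sketch only: module_refcnt is a hypothetical helper.

# module_refcnt MODULE [SYS_MODULE_DIR]
# Prints the module's reference count and returns 0; returns 1 if the
# refcnt file is not readable (module not loaded, or refcnt not exposed).
# SYS_MODULE_DIR defaults to /sys/module (assumed standard sysfs layout).
module_refcnt() {
    refcnt_file=${2:-/sys/module}/$1/refcnt
    [ -r "$refcnt_file" ] || return 1
    cat "$refcnt_file"
}
```

Example usage: `module_refcnt nvidia_drm || echo "nvidia_drm not loaded"`, run from the hook just before the `modprobe -r` attempt.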

It would be great to add a configurable option to reload `nvidia_drm`. I can imagine other people having similar issues with Nvidia dGPU + Nvidia eGPU combinations.

Thank you!

alllexx88, Dec 30 '23 16:12