MobilePassThrough

Cannot unbind driver without killing the GUI

Open midi1996 opened this issue 4 years ago • 5 comments

This has been an issue for me for a long time. Whenever the driver is unbound, whether by the script or manually, the unbind does not complete and hangs at 100% CPU usage; at worst the kernel panics and the whole system freezes. The only workarounds I have found are:

  • Method 1:
    • Kill the GUI (stop the display manager or isolate multi-user) from tty3
    • Unbind manually or run the script (inside screen, to keep it running in the background)
    • Start the GUI again
    • Reattach to the screen session
  • Method 2:
    • just blacklisting the driver
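Method 1 can be sketched as a small bash helper. This is only an illustration of the steps above, not what the MobilePassThrough script does internally. It assumes a systemd distro where `display-manager.service` points at the active display manager; `SYSFS_ROOT`, `unbind_pci_device` and `unbind_without_gui` are hypothetical names made up for this sketch.

```shell
#!/usr/bin/env bash
# Sketch of Method 1: stop the GUI, unbind the dGPU, start the GUI again.
# SYSFS_ROOT is a hypothetical override so the helper can be tried against a
# scratch directory; on a real system it stays at /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Detach one PCI device from the given host driver via sysfs.
unbind_pci_device() {
    local addr="$1" driver="$2"
    local unbind_file="$SYSFS_ROOT/bus/pci/drivers/$driver/unbind"
    if [ ! -e "$unbind_file" ]; then
        echo "driver '$driver' is not loaded, nothing to unbind" >&2
        return 1
    fi
    printf '%s\n' "$addr" > "$unbind_file"
}

# Full sequence. Run as root from a TTY or inside screen, because stopping
# the display manager ends the graphical session.
unbind_without_gui() {
    local addr="$1" driver="$2" status
    systemctl stop display-manager.service
    unbind_pci_device "$addr" "$driver"
    status=$?
    # Bring the GUI back even if the unbind failed.
    systemctl start display-manager.service
    return "$status"
}
```

After sourcing the file as root from tty3, something like `unbind_without_gui 0000:01:00.0 nvidia` would run the whole sequence. The PCI address here is only an example and has to be replaced with the address of your own dGPU (see `lspci -D`).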

My goal is to run the VM with the GPU passed through to it, and to get the GPU back on the host when the VM isn't running, which I think is also one of the features of this script. So far I have tried both master and unattended-win-install (which I think is the branch being worked on). I don't like the first method, as it's a pain to close all apps and reopen them each time the VM starts or shuts down.
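For reference, handing the GPU back to the host after the VM exits usually comes down to a few sysfs writes. The sketch below is a generic illustration under that assumption, not the script's actual code; `SYSFS_ROOT` and `rebind_pci_device` are hypothetical names, and the target driver (nvidia or nouveau) must already be loaded.

```shell
#!/usr/bin/env bash
# Sketch: move a PCI device from its current driver (e.g. vfio-pci) back to
# a host driver. SYSFS_ROOT is a hypothetical override for dry runs.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

rebind_pci_device() {
    local addr="$1" new_driver="$2"
    local dev="$SYSFS_ROOT/bus/pci/devices/$addr"
    local bind_file="$SYSFS_ROOT/bus/pci/drivers/$new_driver/bind"

    if [ ! -e "$bind_file" ]; then
        echo "driver '$new_driver' is not loaded" >&2
        return 1
    fi
    # Detach from whatever driver currently owns the device, if any.
    if [ -e "$dev/driver/unbind" ]; then
        printf '%s\n' "$addr" > "$dev/driver/unbind"
    fi
    # Clear a driver_override left over from the passthrough setup, so the
    # host driver is allowed to claim the device again.
    if [ -e "$dev/driver_override" ]; then
        printf '\n' > "$dev/driver_override"
    fi
    # Attach to the requested host driver.
    printf '%s\n' "$addr" > "$bind_file"
}
```

Called as root, e.g. `rebind_pci_device 0000:01:00.0 nvidia` (example address). Note that the first step is the same unbind write that hangs in this issue, so this sketch does not work around the problem by itself.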

Setup:

  • Lenovo Thinkpad P50
  • Intel HD P530 and Nvidia Quadro M2000M
  • OS: Fedora 34 and 35
  • kernel: 5.14.9-300.fc35.x86_64 (F35)
  • nvidia drivers tested: nouveau and the proprietary one.

midi1996 avatar Oct 07 '21 16:10 midi1996

I think this happens to me occasionally as well, although I haven't checked the CPU usage and I have an AMD GPU in that laptop. Unfortunately I have not been able to figure out what is causing this issue yet. It's nice to hear that you found a workaround other than rebooting, though. I guess that lets us rule out that the kernel is at fault.

unattended-win-install is indeed what you should be using. I just haven't merged it into master because Ubuntu is not fully supported in that branch yet.

T-vK avatar Oct 07 '21 20:10 T-vK

It seems that on Ubuntu you can unload both nouveau and nvidia as long as you kill the GUI. On Fedora, however, I get a kernel panic when unloading nouveau, while unloading nvidia works fine. It's still a bummer to have to do this (kill the GUI, then go back to it).

midi1996 avatar Oct 08 '21 01:10 midi1996

It might be worth trying different kernel versions, maybe the latest 5.15-rc4 or an older one. I must say, however, that I haven't tested mbpt on Fedora 35 at all yet, even though it should theoretically work.

T-vK avatar Oct 08 '21 08:10 T-vK

I'm experiencing something similar. I can't get past unbinding the nvidia driver:

> Using a virtual OS drive...
> Warning: Bumblebee is not available or doesn't work properly. Continuing anyway...
> Retrieving and parsing DGPU IDs...
> Loading vfio-pci kernel module...
> Using Looking Glass...
> Calculating required buffer size for 1920x1080 for Looking Glass...
> Looking Glass buffer size set to: 32M
> Not using DGPU vBIOS override...
> Not using DGPU vBIOS override...
> Not using SMB share...
> Using dGPU passthrough...
> Unbinding dGPU from nvidia driver...

It just hangs there. I've tried killing the script at that point, and it leaves this command running, which I can't kill: sudo bash -c echo '0000:01:00.0' > '/sys/bus/pci/drivers/nvidia/unbind'
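Since killing the GUI lets the unbind go through for others in this thread, one plausible (unconfirmed) explanation is that some process still has the GPU's device nodes open. A rough way to check before unbinding is to scan the open file descriptors under /proc. The sketch below assumes the proprietary driver's nodes live under `/dev/nvidia*` (for nouveau, `/dev/dri/` would be the prefix to try); `PROC_ROOT` and `list_device_holders` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: list "PID target" pairs for processes holding files open whose
# path starts with a given prefix. PROC_ROOT is a hypothetical override for
# dry runs; on a real system it stays at /proc.
PROC_ROOT="${PROC_ROOT:-/proc}"

list_device_holders() {
    local prefix="$1" fd target pid
    for fd in "$PROC_ROOT"/[0-9]*/fd/*; do
        # Each fd entry is a symlink to the open file; skip unreadable ones.
        target="$(readlink "$fd" 2>/dev/null)" || continue
        case "$target" in
            "$prefix"*)
                pid="${fd#"$PROC_ROOT"/}"   # strip the proc root
                pid="${pid%%/*}"            # keep only the PID component
                printf '%s %s\n' "$pid" "$target"
                ;;
        esac
    done | sort -u
}
```

Run as root (otherwise only your own processes are visible), e.g. `list_device_holders /dev/nvidia`. If the display server or compositor shows up in the output, that would at least be consistent with the "kill the GUI first" workaround.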

I have a Dell Precision 5760 with an A3000 GPU running Fedora 34.

mauza avatar Nov 05 '21 16:11 mauza

I followed this guide: https://forum.level1techs.com/t/fedora-33-ultimiate-vfio-guide-for-2020-2021-wip/163814. I blacklisted the drivers and set the devices to use only vfio on boot, and I commented out the binding code in the start VM script. I also had to supply a MAC address, so I just put in a random one. After that I was able to start the VM and got past the unbind problem. I don't care if the dGPU is never usable on the host; if I need it for something, I can use it from another Linux VM. Now I'm stuck on networking, but I'll try to figure it out and start a new thread if I need help. Thanks so much for creating this, it is great.
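For anyone wanting to reproduce the "blacklist and bind to vfio at boot" part, one common way to express it is a modprobe.d fragment. The sketch below generates one from the card's vendor and device IDs as reported by sysfs. It is not taken from the linked guide, which may do this differently (for example via kernel arguments); `SYSFS_ROOT`, `pci_vendor_device` and `vfio_modprobe_conf` are hypothetical names, and the blacklist entries are simply the two drivers discussed in this thread.

```shell
#!/usr/bin/env bash
# Sketch: print a modprobe.d fragment that keeps the host GPU drivers from
# auto-loading and lets vfio-pci claim the given PCI devices.
# SYSFS_ROOT is a hypothetical override for dry runs.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print "vendor:device" (hex, without the 0x prefix) for one PCI address.
pci_vendor_device() {
    local dev="$SYSFS_ROOT/bus/pci/devices/$1" vendor device
    vendor="$(cat "$dev/vendor")" || return 1
    device="$(cat "$dev/device")" || return 1
    printf '%s:%s\n' "${vendor#0x}" "${device#0x}"
}

# Print the config for all PCI addresses given as arguments.
vfio_modprobe_conf() {
    local ids="" addr id
    for addr in "$@"; do
        id="$(pci_vendor_device "$addr")" || return 1
        ids="${ids:+$ids,}$id"   # comma-separated list of IDs
    done
    printf 'blacklist nouveau\n'
    printf 'blacklist nvidia\n'
    printf 'options vfio-pci ids=%s\n' "$ids"
}
```

The output could be written out with something like `vfio_modprobe_conf 0000:01:00.0 | sudo tee /etc/modprobe.d/vfio.conf` (the file name is an arbitrary choice). Depending on the distribution, the initramfs may also need to be regenerated before the change takes effect at boot.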

mauza avatar Nov 05 '21 19:11 mauza