Single-GPU-Passthrough
VM doesn't start because the nvidia modules don't want to unload
Title says it all. The only thing virt-manager reports, after a long wait, is "Error starting domain: Cannot recv data: Connection reset by peer". Running the script on its own gives the following output:
+ echo 0
+ echo 0
+ echo efi-framebuffer.0
+ sleep 3
+ modprobe -r nvidia-drm
modprobe: FATAL: Module nvidia_drm is in use.
+ modprobe -r nvidia-modeset
modprobe: FATAL: Module nvidia_modeset is in use.
+ modprobe -r nvidia
modprobe: FATAL: Module nvidia_drm is in use.
modprobe: FATAL: Error running remove command for nvidia
+ modprobe -r ipmi_devintf
+ modprobe -r ipmi_msghandler
modprobe: FATAL: Module ipmi_msghandler is in use.
+ virsh nodedev-detach pci_0000_08_00_0
After that it just hangs. I'll provide any other information you need.
I have the same problem with nvidia drivers 455xx and 450xx, but 440.100 and 435xx work.
I don't know if this will work, but you could try killing the X server and everything else that might be using the GPU, then manually unbinding the GPU from the nvidia driver and binding it to vfio-pci with:
sudo sh -c "echo 0000:00:03.0 > /sys/bus/pci/drivers/nvidia/unbind"
sudo sh -c "echo 0000:00:03.0 > /sys/bus/pci/drivers/vfio-pci/bind"
(you need to use your own PCI device address in the commands, of course)
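For anyone who wants to script this, here is a rough sketch that wraps the same two writes in a function. The function name rebind_to_vfio and the SYSFS_ROOT variable are made up for illustration (SYSFS_ROOT only exists so you can point it at a fake directory tree and try it without touching the real /sys); it has to run as root against the real tree, and I have not verified that it is enough on every setup.

```shell
#!/bin/sh
# Sketch: move one PCI device from the nvidia driver to vfio-pci.
# rebind_to_vfio and SYSFS_ROOT are hypothetical names, not part of any tool.
rebind_to_vfio() {
    dev="$1"                        # PCI address, e.g. 0000:00:03.0
    root="${SYSFS_ROOT:-/sys}"      # defaults to the real sysfs
    drivers="$root/bus/pci/drivers"

    # Unbind only if the device is currently bound to the nvidia driver.
    if [ -e "$drivers/nvidia/$dev" ]; then
        echo "$dev" > "$drivers/nvidia/unbind" || return 1
    fi

    # Ask vfio-pci to claim the device.
    echo "$dev" > "$drivers/vfio-pci/bind"
}
```

Usage would be something like "rebind_to_vfio 0000:00:03.0", with your own address substituted.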
I also created a file named /etc/modprobe.d/vfio.conf containing "softdep nouveau pre: vfio-pci" to make sure that vfio-pci is loaded before the nouveau module (the open-source NVIDIA driver). You could try doing the same thing for the proprietary driver by using "nvidia" instead of "nouveau"; that might help you out, though I might be wrong there.
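If it helps, this is roughly how that file could be generated from a script. write_vfio_softdep is a hypothetical helper name, and the optional second argument is only there so the output can be written somewhere harmless first; writing to /etc/modprobe.d needs root, and whether the nvidia variant actually fixes anything is untested on my side.

```shell
#!/bin/sh
# Sketch: write a one-line modprobe.d file asking for vfio-pci to be
# loaded before the given driver. write_vfio_softdep is a made-up name.
# Usage: write_vfio_softdep DRIVER [FILE]
write_vfio_softdep() {
    driver="$1"                                # nouveau or nvidia
    conf="${2:-/etc/modprobe.d/vfio.conf}"     # path used in this thread
    printf 'softdep %s pre: vfio-pci\n' "$driver" > "$conf"
}
```

For the proprietary driver that would be "write_vfio_softdep nvidia", which overwrites the file with the line "softdep nvidia pre: vfio-pci".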
I was eventually able to solve my issue by adding a "systemctl stop nvidia-persistenced" command before the modprobe -r calls. It seems nvidia-persistenced was stopping the drivers from unloading.
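In case it is useful to someone else, the relevant part of the script ends up looking roughly like this. This is a sketch, not the exact script: the run helper and the DRY_RUN variable are my own additions (with DRY_RUN set, the commands are only printed), and the module order is simply the one from the trace in the first post.

```shell
#!/bin/sh
# DRY_RUN is a hypothetical switch: when set, commands are printed
# instead of executed, so the sequence can be checked without root.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

unload_nvidia() {
    # Stop the persistence daemon first; it was what kept the modules in use here.
    run systemctl stop nvidia-persistenced || return 1

    # Same unload order as in the trace from the original post.
    for mod in nvidia-drm nvidia-modeset nvidia; do
        run modprobe -r "$mod" || return 1
    done
}
```

Running "DRY_RUN=1 unload_nvidia" prints the four commands in order; without DRY_RUN it needs root and actually unloads the modules.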