VM doesn't start because the nvidia modules don't want to unload

Open ultra-azu opened this issue 4 years ago • 3 comments

Title says it all. After a long wait, the only thing virt-manager reports is "Error starting domain: Cannot recv data: Connection reset by peer". Running the script by hand gives the following output:

    + echo 0
    + echo 0
    + echo efi-framebuffer.0
    + sleep 3
    + modprobe -r nvidia-drm
    modprobe: FATAL: Module nvidia_drm is in use.
    + modprobe -r nvidia-modeset
    modprobe: FATAL: Module nvidia_modeset is in use.
    + modprobe -r nvidia
    modprobe: FATAL: Module nvidia_drm is in use.
    modprobe: FATAL: Error running remove command for nvidia
    + modprobe -r ipmi_devintf
    + modprobe -r ipmi_msghandler
    modprobe: FATAL: Module ipmi_msghandler is in use.
    + virsh nodedev-detach pci_0000_08_00_0

After that last command it just hangs. I'll provide any other information you need.

ultra-azu avatar Jun 11 '20 13:06 ultra-azu

I have the same problem with nvidia drivers 455xx and 450xx, but 440.100 and 435xx work.

IsaacVaughn avatar Nov 03 '20 23:11 IsaacVaughn

I don't know if this will work, but you could try killing the X server and everything else that might be using the GPU. After that, try manually unbinding the GPU from the nvidia driver and binding it to vfio-pci:

    sudo sh -c "echo 0000:00:03.0 > /sys/bus/pci/drivers/nvidia/unbind"
    sudo sh -c "echo 0000:00:03.0 > /sys/bus/pci/drivers/vfio-pci/bind"

(Use your own PCI device address in these commands, of course.)

I also created a file named /etc/modprobe.d/vfio.conf containing:

    softdep nouveau pre: vfio-pci

to make sure that vfio-pci is loaded before the nouveau module (the open-source NVIDIA driver). You could try the same thing for the proprietary NVIDIA driver by using "nvidia" instead of "nouveau". That might help you out, though I might be wrong there.
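The manual unbind/bind step can be wrapped in a small shell function, which makes it easier to reuse from a hook script. This is only a sketch: the function name rebind_to_vfio and the SYSFS_ROOT override are hypothetical (the override exists so the logic can be tried against a scratch directory), and on a real host the write to vfio-pci/bind will likely only succeed if vfio-pci has already been told about the device ID, for example through its ids= module option.

```shell
#!/bin/sh
# Sketch: move one PCI device from the nvidia driver to vfio-pci via sysfs.
# SYSFS_ROOT is a hypothetical override for dry runs; leave it unset on a
# real host and run the function as root.
rebind_to_vfio() {
    dev="$1"                                   # e.g. 0000:08:00.0
    drivers="${SYSFS_ROOT:-/sys}/bus/pci/drivers"

    # Accept only a full domain:bus:slot.function address (lowercase hex).
    case "$dev" in
        [0-9a-f][0-9a-f][0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f].[0-7]) ;;
        *) echo "bad PCI address: $dev" >&2; return 2 ;;
    esac

    # Both control files must exist, which means both drivers are loaded.
    [ -e "$drivers/nvidia/unbind" ] || { echo "nvidia driver not loaded" >&2; return 1; }
    [ -e "$drivers/vfio-pci/bind" ] || { echo "vfio-pci driver not loaded" >&2; return 1; }

    # Detach from nvidia first; only attempt the bind if that worked.
    echo "$dev" > "$drivers/nvidia/unbind" || return 1
    echo "$dev" > "$drivers/vfio-pci/bind"
}
```

With the function defined, the call would look like rebind_to_vfio 0000:08:00.0, using whatever address lspci reports for your card. Stopping after a failed unbind avoids trying to bind a device that the nvidia driver still owns.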

creeloper27 avatar Nov 16 '20 23:11 creeloper27

I was eventually able to solve my issue by adding a "systemctl stop nvidia-persistenced" command before the modprobe -r calls. It seems the persistence daemon was keeping the drivers from unloading.
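For anyone adapting their own script, the ordering looks roughly like the sketch below. It assumes the script unloads the same three modules shown in the trace at the top of this issue. The function name release_nvidia and the RUN variable are made up for illustration: setting RUN=echo prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: stop the persistence daemon before unloading the nvidia modules.
# RUN is a hypothetical dry-run hook; with RUN=echo nothing is executed.
release_nvidia() {
    run="${RUN:-}"

    # The daemon keeps the driver open, so it has to go first.
    $run systemctl stop nvidia-persistenced || return 1

    # Unload in the same order as the original script, stopping at the
    # first module that refuses to unload.
    for mod in nvidia-drm nvidia-modeset nvidia; do
        $run modprobe -r "$mod" || return 1
    done
}
```

Running it once with RUN=echo is a cheap way to check the order before letting it touch the real modules.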

IsaacVaughn avatar Jan 09 '21 18:01 IsaacVaughn