qubes-issues
rd.qubes.hide_pci doesn't work anymore after upgrade to Qubes 4.1
Qubes OS release
4.1
Brief summary
Hiding the secondary GPU (AMD RX 580) from dom0 via the GRUB command line no longer works in Qubes 4.1. It worked on the same system with Qubes 4.0.
Steps to reproduce
Set rd.qubes.hide_pci in /etc/default/grub to hide the AMD Radeon RX 580 VGA and audio devices from dom0, and regenerate grub.cfg.
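For reference, the GRUB change could look like the fragment below. This is a sketch, not the exact file from this report: it assumes the append style for GRUB_CMDLINE_LINUX and the usual Fedora-style grub.cfg locations for legacy BIOS and UEFI boot, so check which grub.cfg your system actually boots from.

```sh
# In /etc/default/grub: append the BDFs, keeping the options already there.
GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX rd.qubes.hide_pci=01:00.0,01:00.1"

# Then regenerate grub.cfg (assumed typical paths; use only the one you boot from):
sudo grub2-mkconfig -o /boot/grub2/grub.cfg           # legacy BIOS boot
sudo grub2-mkconfig -o /boot/efi/EFI/qubes/grub.cfg   # UEFI boot
```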
After rebooting, verify with cat /proc/cmdline that the parameter is present and has no typos:
$ cat /proc/cmdline
placeholder root=/dev/mapper/qubes_dom0-root ro rd.luks.uuid=... rd.lvm.lv=qubes_dom0/root rd.lvm.lv=qubes_dom0/swap rd.qubes.hide_pci=01:00.0,01:00.1 xen-pciback.passthrough=1 i915.alpha_support=1 rhgb quiet plymouth.ignore-serial-consoles
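To rule out typos mechanically, a small POSIX sh function can pull the BDFs out of a command line and flag anything that doesn't look like BB:DD.F. The name hide_pci_bdfs is hypothetical, and the sketch only accepts the short bus:device.function form used in this report, not a form with a 0000: domain prefix.

```sh
#!/bin/sh
# Hypothetical helper: print each BDF listed in rd.qubes.hide_pci=, one per
# line; anything not shaped like BB:DD.F is reported on stderr instead.
hide_pci_bdfs() (
    # The body runs in a subshell, so set -f and IFS changes don't leak out.
    set -f                      # no glob expansion while splitting words
    for arg in $1; do           # split the command line on whitespace
        case $arg in
            rd.qubes.hide_pci=*)
                IFS=,           # the value is a comma-separated list
                for bdf in ${arg#rd.qubes.hide_pci=}; do
                    case $bdf in
                        [0-9a-fA-F][0-9a-fA-F]:[0-9a-fA-F][0-9a-fA-F].[0-7])
                            echo "$bdf" ;;
                        *)
                            echo "malformed BDF: $bdf" >&2 ;;
                    esac
                done
                unset IFS       # back to default whitespace splitting
                ;;
        esac
    done
)
```

Usage: hide_pci_bdfs "$(cat /proc/cmdline)". For the command line above it should print 01:00.0 and 01:00.1 on separate lines and nothing on stderr.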
Expected behavior
After reboot, the following two PCI devices should no longer be visible to dom0, and lspci shouldn't enumerate them:
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7)
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590]
Actual behavior
After reboot, lspci in dom0 still enumerates the two PCI devices. Also, the amdgpu kernel module is loaded (shown in lsmod) and bound to the secondary GPU. Although no display is connected to the secondary GPU and it's "idle", I can hear the RX 580's fans.
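Besides whether lspci lists the devices, it can help to check which driver each one is bound to; lspci -k prints a "Kernel driver in use" line per device. The sketch below reads the same information from sysfs. pci_driver_of and the SYSFS_ROOT override are hypothetical, and the function assumes PCI domain 0000 when the BDF has no domain prefix. If hiding worked, the answer would presumably be pciback (the driver name in the dmesg lines further down); in the situation described here it would presumably be amdgpu.

```sh
#!/bin/sh
# Hypothetical helper: print the driver bound to a PCI device, or "none".
# SYSFS_ROOT (default /sys) lets the function be tried against a fake tree.
pci_driver_of() (
    bdf=$1
    case $bdf in
        ????:*) ;;                  # already carries a domain prefix
        *) bdf="0000:$bdf" ;;       # assume PCI domain 0, as in this report
    esac
    link="${SYSFS_ROOT:-/sys}/bus/pci/devices/$bdf/driver"
    if [ -L "$link" ]; then
        # The driver symlink points at .../bus/pci/drivers/<name>.
        basename "$(readlink "$link")"
    else
        echo none
    fi
)
```

Usage: for d in 01:00.0 01:00.1; do echo "$d: $(pci_driver_of "$d")"; done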
If I then try a GPU passthrough of the RX 580 to an HVM domU, the domU tries to initialize the RX 580, the fans stop spinning, and after a delay of about 10 seconds dom0 crashes/freezes, because it has an active amdgpu module still bound to the RX 580's VGA device. AFAIK it's somewhat expected that dom0 crashes if you pass through a PCI device that is still bound to a driver in dom0.
However, if I blacklist the amdgpu module in dom0 via /etc/modprobe.d/, passthrough to the domU works, although the RX 580 PCI devices are still visible to dom0. I suspected that amdgpu might grab the VGA device before dracut runs the 90qubes-pciback/qubes-pciback.sh script, which evaluates the rd.qubes.hide_pci GRUB command-line argument. But this doesn't seem to be the root cause of the hiding failure. Either way, blacklisting amdgpu fixes the symptom (passthrough not working) but doesn't properly hide the devices from dom0.
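The blacklist workaround could look like the commands below. The file name is hypothetical (modprobe reads any *.conf file in /etc/modprobe.d/), and rebuilding the initramfs with dracut -f is an assumption about what's needed for the setting to apply inside the initramfs as well. Note that a blacklist entry only stops automatic loading via module aliases; it doesn't bind the device to pciback by itself.

```sh
# Hypothetical file name; any *.conf under /etc/modprobe.d/ is read.
echo 'blacklist amdgpu' | sudo tee /etc/modprobe.d/blacklist-amdgpu.conf
# Rebuild the initramfs for the running kernel so its copy of the config matches.
sudo dracut -f
```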
Running dmesg -k | grep "01:00.0" -B10 -A5 doesn't show any obvious errors related to pciback hiding:
...
[ 1.222929] pci 0000:01:00.0: [1002:67df] type 00 class 0x030000
[ 1.222972] pci 0000:01:00.0: reg 0x10: [mem 0xe0000000-0xefffffff 64bit pref]
[ 1.222996] pci 0000:01:00.0: reg 0x18: [mem 0xf0000000-0xf01fffff 64bit pref]
[ 1.223010] pci 0000:01:00.0: reg 0x20: [io 0xe000-0xe0ff]
[ 1.223024] pci 0000:01:00.0: reg 0x24: [mem 0xf7e00000-0xf7e3ffff]
[ 1.223038] pci 0000:01:00.0: reg 0x30: [mem 0xf7e40000-0xf7e5ffff pref]
[ 1.223195] pci 0000:01:00.0: supports D1 D2
[ 1.223196] pci 0000:01:00.0: PME# supported from D1 D2 D3hot D3cold
...
[ 1.234427] pci 0000:01:00.0: vgaarb: bridge control possible
[ 1.234428] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 1.234429] vgaarb: loaded
...
[ 1.248117] xen: registering gsi 16 triggering 0 polarity 1
[ 1.248132] xen: --> pirq=16 -> irq=16 (gsi=16)
[ 1.248320] xen: registering gsi 16 triggering 0 polarity 1
[ 1.248323] Already setup the GSI :16
...
[ 1.282552] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0
[ 1.282567] PCI: CLS 64 bytes, default 64
[ 1.282573] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[ 1.282574] software IO TLB: mapped [mem 0x000000013e600000-0x0000000142600000] (64MB)
[ 1.282622] Trying to unpack rootfs image as initramfs...
...
[ 4.901048] pciback 0000:01:00.0: xen_pciback: seizing device
[ 4.901128] pciback 0000:01:00.0: enabling device (0000 -> 0003)
[ 4.901161] xen: registering gsi 16 triggering 0 polarity 1
[ 4.901167] Already setup the GSI :16
...
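Since the grep above only matches 01:00.0, a small filter can check both functions of the card for the "xen_pciback: seizing device" message in one pass. pciback_seized is a hypothetical helper; the message text is taken from the log excerpt above. A "seized" result only says pciback claimed the device at some point, not which driver ends up owning it.

```sh
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and report, for each BDF
# argument, whether a "xen_pciback: seizing device" line mentions it.
pciback_seized() (
    log=$(cat)                  # read the whole log from stdin once
    for bdf in "$@"; do
        case $bdf in ????:*) ;; *) bdf="0000:$bdf" ;; esac
        # Fixed-string match on the message format seen in the excerpt.
        if printf '%s\n' "$log" | grep -qF "pciback $bdf: xen_pciback: seizing device"; then
            echo "$bdf seized"
        else
            echo "$bdf not-seized"
        fi
    done
)
```

Usage: sudo dmesg -k | pciback_seized 01:00.0 01:00.1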
I tried multiple kernel versions in Qubes 4.1 from the kernel and kernel-latest packages, and even the old 5.4 kernel left over from the previous Qubes 4.0 install. None of them makes a difference: hiding the RX 580 from dom0 doesn't work with any of these kernel versions under 4.1, but it worked on 4.0.
It seems I'm not the only one with this issue: https://forum.qubes-os.org/t/gpu-passthrough-again/14019
Well, apparently the pciback hiding runs too late, and there is a race condition with the loading of the GPU driver's kernel module.
If you check /usr/lib/dracut/modules.d/, you'll see that both 90kernel-modules and 90qubes-pciback exist. If the numbering has any relevance, the race condition is no surprise.
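To see how modules sharing the 90 prefix sort, a helper can list the module directories in byte order (LC_ALL=C). dracut_module_order and MODDIR are hypothetical names. This only shows directory order; when qubes-pciback.sh actually runs relative to udev loading amdgpu depends on which hook it is installed into, which this sketch does not check (lsinitrd, shipped with dracut, lists the files, including hooks, in a generated initramfs).

```sh
#!/bin/sh
# Hypothetical helper: list dracut module directories, optionally only those
# starting with a given prefix, in C-locale lexical order.
# MODDIR (default /usr/lib/dracut/modules.d) allows trying it on another tree.
dracut_module_order() (
    dir=${MODDIR:-/usr/lib/dracut/modules.d}
    for path in "$dir"/"${1:-}"*; do
        [ -d "$path" ] && basename "$path"
    done | LC_ALL=C sort
)
```

Usage: dracut_module_order 90. With the two modules named above, 90kernel-modules sorts before 90qubes-pciback because "k" precedes "q".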
Either way, this is pretty bad security-wise, since devices meant for VMs shouldn't get access to dom0.
A bit related: #7886
On my side, rd.qubes.hide_pci works as expected (R4.1 and the development tree). I didn't see a difference from R4.0.
@yojoe can you still reproduce this?
This is happening to me now, after repairing my GRUB setup following a period when I couldn't boot.
Interesting.
For what it's worth, I was able to fix it by unloading nouveau with modprobe.
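Before unloading nouveau, it's worth confirming it is actually loaded. The helper below checks /proc/modules, the file lsmod reads; module_loaded and MODULES_FILE are hypothetical names. Unloading a GPU driver that is driving a display can fail or freeze the session, so this is a sketch of the check, not a recommended procedure.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the named module is listed in /proc/modules.
# MODULES_FILE (default /proc/modules) allows checking a saved copy instead.
module_loaded() {
    # One module per line; the module name is the first field.
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "${MODULES_FILE:-/proc/modules}"
}
```

Usage: module_loaded nouveau && sudo modprobe -r nouveau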
@OwOday what does lspci in dom0 show?
VGA compatible controller: NVIDIA Corporation AD102 [Geforce RTX 4090]