
Parameter 'x-pci-sub-device-id' expects an int64 value or range

Open SamKG opened this issue 5 years ago • 13 comments

After running sudo ./start-vm.sh and exiting once, trying to start the vm again using sudo ./start-vm.sh results in the following error:

qemu-system-x86_64: -device vfio-pci,host=01:00.0,bus=root.1,addr=00.0,x-pci-sub-device-id=0x,x-pci-sub-vendor-id=0x,multifunction=on,romfile=/home/samkg/Documents/MobilePassThrough/vm-files/vbios-roms/vbios.rom: Parameter 'x-pci-sub-device-id' expects an int64 value or range

Doing a sudo reboot seems to fix it, but it is annoying to not be able to start up the vm multiple times in succession.

Is there any known fix for this?

SamKG avatar Mar 12 '19 22:03 SamKG

What is your output of sudo lspci -vvv after you get the error? Do you have Bumblebee installed and if so, what is the output of sudo optirun echo "Hello"?

T-vK avatar Mar 13 '19 00:03 T-vK

Output before error: lspci_1.txt

After error: lspci_2.txt

One odd thing: the LnkSta speed is downgraded from 8 GT/s to 2.5 GT/s. Is this normal?

sudo optirun echo "Hello" prints out Hello as expected

SamKG avatar Mar 13 '19 04:03 SamKG

It says !!! Unknown header type 7f for your Nvidia GPU in both files. Something is wrong there. I'm not sure if the LnkSta is a problem or if it's normal. Maybe I can check on my device when I have some more time.

Can you show me the output of the following:

GPU_PCI_ADDRESS=01:00.0
GPU_IDS=$(optirun lspci -n -s "${GPU_PCI_ADDRESS}" | grep -oP "\w+:\w+" | tail -1)
GPU_VENDOR_ID=$(echo "${GPU_IDS}" | cut -d ":" -f1)
GPU_DEVICE_ID=$(echo "${GPU_IDS}" | cut -d ":" -f2)
GPU_SS_IDS=$(optirun lspci -vnn -d "${GPU_IDS}" | grep "Subsystem:" | grep -oP "\w+:\w+")
GPU_SS_VENDOR_ID=$(echo "${GPU_SS_IDS}" | cut -d ":" -f1)
GPU_SS_DEVICE_ID=$(echo "${GPU_SS_IDS}" | cut -d ":" -f2)

echo "GPU_PCI_ADDRESS: ${GPU_PCI_ADDRESS}"
echo "GPU_IDS: $GPU_IDS"
echo "GPU_VENDOR_ID: $GPU_VENDOR_ID"
echo "GPU_DEVICE_ID: $GPU_DEVICE_ID"
echo "GPU_SS_IDS: $GPU_SS_IDS"
echo "GPU_SS_VENDOR_ID: $GPU_SS_VENDOR_ID"
echo "GPU_SS_DEVICE_ID: $GPU_SS_DEVICE_ID"

and also of:

GPU_PCI_ADDRESS=01:00.0

if sudo which optirun &> /dev/null && sudo optirun echo>/dev/null ; then
    USE_BUMBLEBEE=true
    OPTIRUN_PREFIX="optirun "
else
    USE_BUMBLEBEE=false
    OPTIRUN_PREFIX=""
fi

GPU_IDS=$(sudo ${OPTIRUN_PREFIX}lspci -n -s "${GPU_PCI_ADDRESS}" | grep -oP "\w+:\w+" | tail -1)
GPU_VENDOR_ID=$(echo "${GPU_IDS}" | cut -d ":" -f1)
GPU_DEVICE_ID=$(echo "${GPU_IDS}" | cut -d ":" -f2)
GPU_SS_IDS=$(sudo ${OPTIRUN_PREFIX}lspci -vnn -d "${GPU_IDS}" | grep "Subsystem:" | grep -oP "\w+:\w+")
GPU_SS_VENDOR_ID=$(echo "${GPU_SS_IDS}" | cut -d ":" -f1)
GPU_SS_DEVICE_ID=$(echo "${GPU_SS_IDS}" | cut -d ":" -f2)

echo "GPU_PCI_ADDRESS: ${GPU_PCI_ADDRESS}"
echo "GPU_IDS: $GPU_IDS"
echo "GPU_VENDOR_ID: $GPU_VENDOR_ID"
echo "GPU_DEVICE_ID: $GPU_DEVICE_ID"
echo "GPU_SS_IDS: $GPU_SS_IDS"
echo "GPU_SS_VENDOR_ID: $GPU_SS_VENDOR_ID"
echo "GPU_SS_DEVICE_ID: $GPU_SS_DEVICE_ID"
echo "OPTIRUN_PREFIX: $OPTIRUN_PREFIX"
echo "LSPCI_OUTPUT: $(sudo ${OPTIRUN_PREFIX}lspci -vnn -d ${GPU_IDS})"

T-vK avatar Mar 13 '19 08:03 T-vK

out1.txt out2.txt

SamKG avatar Mar 14 '19 03:03 SamKG

Did you run these before getting the error? Because the output looks perfectly fine. Can you run these after getting the error?

T-vK avatar Mar 14 '19 07:03 T-vK

For first script: out3.txt

For second script: out4.txt

SamKG avatar Mar 18 '19 16:03 SamKG

The problem is that this line is missing:

Subsystem: Lenovo Device [17aa:39f5]

or at least that is the symptom...

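Because of that, the script can't extract the subsystem vendor ID and the subsystem device ID. Both are required for the x-pci-sub-vendor-id and x-pci-sub-device-id options on the vfio-pci device line, which is why they end up as a bare 0x in your error message.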
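To make the failure mode concrete, here is a minimal sketch of a guard that could sit in front of the QEMU call. The function name extract_subsystem_ids is hypothetical and not part of the project's scripts; it only assumes that the Subsystem line has the form shown above, with the IDs in square brackets.

```shell
# Sketch only: extract_subsystem_ids is a hypothetical helper, not part of start-vm.sh.
# Reads `lspci -vnn` output on stdin and prints "vendor:device" from the first
# "Subsystem:" line. Returns non-zero (and prints nothing) if no such line exists.
extract_subsystem_ids() {
    sed -n 's/.*Subsystem:.*\[\([0-9a-fA-F]\{4\}:[0-9a-fA-F]\{4\}\)\].*/\1/p' \
        | head -n 1 \
        | grep .
}

# Possible usage (commented out because it needs the real hardware):
# if ! GPU_SS_IDS=$(lspci -vnn -s 01:00.0 | extract_subsystem_ids); then
#     echo "Subsystem line missing; refusing to build the vfio-pci arguments" >&2
#     exit 1
# fi
```

Failing early like this would at least replace the confusing QEMU error with a message that points at the missing Subsystem line.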

I am not sure why the Subsystem line is missing. Maybe there are deeper issues with your system? Have you checked dmesg for GPU related errors?

I have only tested the script on a fresh installation of Fedora 29 btw. Maybe you made some changes to the system that my scripts can't compensate for yet.

Edit:

As a dirty workaround you could try to set the subsystem IDs manually by replacing

GPU_SS_IDS=$(optirun lspci -vnn -d "${GPU_IDS}" | grep "Subsystem:" | grep -oP "\w+:\w+")

with

GPU_SS_IDS="17aa:39f5"

in this line.
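A slightly less dirty variant of the same workaround, as a sketch: keep the lspci detection, and only fall back to the hard-coded value when detection comes back empty. The helper names is_id_pair and choose_subsystem_ids are made up for illustration, and 17aa:39f5 is only correct for the laptop discussed here.

```shell
# Sketch only: both helpers are hypothetical and not part of the project's scripts.

# True if the argument looks like "vvvv:dddd" (four hex digits each).
is_id_pair() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{4}:[0-9a-fA-F]{4}$'
}

# Print the detected IDs if they are usable, otherwise the fallback.
# Returns non-zero if neither value is a valid ID pair.
choose_subsystem_ids() {
    if is_id_pair "$1"; then
        printf '%s\n' "$1"
    elif is_id_pair "$2"; then
        printf '%s\n' "$2"
    else
        echo "no usable subsystem IDs" >&2
        return 1
    fi
}

# Possible usage (commented out because it needs the real hardware):
# DETECTED=$(optirun lspci -vnn -d "${GPU_IDS}" | grep "Subsystem:" | grep -oP "\w+:\w+")
# GPU_SS_IDS=$(choose_subsystem_ids "${DETECTED}" "17aa:39f5")
```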

T-vK avatar Mar 19 '19 10:03 T-vK

I have just pushed a major update, adding support for Fedora 30 and some other changes. Maybe you can give it another shot now.

T-vK avatar Jul 08 '19 23:07 T-vK

Thanks! Unfortunately, I no longer have this laptop (ran into some issues), and instead have another one without an iGPU.

I don't think it will be possible for me to test this.

SamKG avatar Jul 08 '19 23:07 SamKG

Hello,

I ran into this same issue on my Lenovo device (a P50). What I saw is that if the card is reset, turned off and on, or passed through and then released, the Subsystem line disappears until you reboot. One solution is to check whether the Subsystem line exists when running lspci, and to save those values for that computer in a file (you could match the IDs against the UUID of the device in case the hardware changes in the future).

Edit: turning off/on the card with bbswitch might bring back that value, but not always.
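A rough sketch of that caching idea, assuming the detected value is passed in as a string and the cache location is chosen by the caller. The names is_id_pair and load_or_cache_ss_ids and the cache file path are hypothetical; tying the file to the machine's UUID is left out here.

```shell
# Sketch only: helper names and the cache path are hypothetical.

# True if the argument looks like "vvvv:dddd" (four hex digits each).
is_id_pair() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{4}:[0-9a-fA-F]{4}$'
}

# $1: subsystem IDs as detected right now (may be empty after a reset)
# $2: path of the cache file
# Prints the IDs to use; refreshes the cache whenever detection succeeds.
load_or_cache_ss_ids() {
    local detected="$1" cache_file="$2"
    if is_id_pair "$detected"; then
        printf '%s\n' "$detected" > "$cache_file"
        printf '%s\n' "$detected"
    elif [ -r "$cache_file" ] && is_id_pair "$(cat "$cache_file")"; then
        cat "$cache_file"
    else
        return 1
    fi
}

# Possible usage (commented out because it needs the real hardware):
# DETECTED=$(lspci -vnn -s 01:00.0 | grep "Subsystem:" | grep -oP "\w+:\w+")
# GPU_SS_IDS=$(load_or_cache_ss_ids "${DETECTED}" "./gpu-subsystem-ids.cache") || exit 1
```

The first start after a reboot fills the cache, so later starts can still build the vfio-pci arguments even when lspci no longer shows the Subsystem line.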

midi1996 avatar Oct 07 '21 14:10 midi1996

What GPU does your laptop have?

T-vK avatar Oct 07 '21 20:10 T-vK

A Quadro M2000M. I found that passing the dGPU through and then releasing it makes the subsystem disappear.

GPU_PCI_ADDRESS: 01:00.0
GPU_IDS: 10de:13b0
GPU_VENDOR_ID: 10de
GPU_DEVICE_ID: 13b0
GPU_SS_IDS:
GPU_SS_VENDOR_ID:
GPU_SS_DEVICE_ID:
OPTIRUN_PREFIX:
LSPCI_OUTPUT: 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107GLM [Quadro M2000M] [10de:13b0] (rev a2) (prog-if 00 [VGA controller])
Flags: fast devsel, IRQ 16
Memory at d3000000 (32-bit, non-prefetchable) [size=16M]
Memory at c0000000 (64-bit, prefetchable) [size=256M]
Memory at d0000000 (64-bit, prefetchable) [size=32M]
I/O ports at 4000 [size=128]
Expansion ROM at d4080000 [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting <?>
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] Secondary PCI Express
Kernel modules: nvidiafb, nouveau

This is what it looks like after passing it through and releasing it (on Ubuntu).

However, in my tests with bbswitch, if I turn the card off and then on again, I do get the subsystem IDs back, but I somehow cannot pass the card to the VM again. This also breaks the HDMI audio device, which stays disabled until I reboot or reset the PCIe device, and resetting it loses the subsystem ID again.

So, in my experience, this is what stops the subsystem ID from showing:

  • passing the dGPU through and releasing it (after the VM is off)
  • resetting the PCIe device (/sys/bus/pci/devices/{ID}/reset, or remove then rescan)
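To help narrow this down, one thing that could be compared before and after each of those steps is what the kernel itself reports in sysfs, next to what lspci prints. The sketch below reads the standard subsystem_vendor and subsystem_device attributes of a PCI device. The function name is hypothetical, and whether these attributes still hold the right values after a reset has not been checked in this thread.

```shell
# Sketch only: read_sysfs_subsystem_ids is a hypothetical diagnostic helper.
# $1: full PCI address including the domain, e.g. 0000:01:00.0
# $2: sysfs device root (optional; overridable so the function can be tried
#     against a fake directory tree)
read_sysfs_subsystem_ids() {
    local addr="$1" root="${2:-/sys/bus/pci/devices}"
    local dir="$root/$addr" vendor device
    [ -r "$dir/subsystem_vendor" ] && [ -r "$dir/subsystem_device" ] || return 1
    vendor=$(cat "$dir/subsystem_vendor")
    device=$(cat "$dir/subsystem_device")
    # sysfs reports values such as "0x17aa"; strip the "0x" prefix.
    printf '%s:%s\n' "${vendor#0x}" "${device#0x}"
}

# Possible usage (commented out because it needs the real hardware):
# read_sysfs_subsystem_ids 0000:01:00.0
```

If sysfs and lspci disagree after a reset, that difference would be a useful detail to include in a report.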

As I only have this laptop, I do not know whether other laptops have this issue. So far, two Lenovo laptops in this thread show the same symptoms.

midi1996 avatar Oct 08 '21 01:10 midi1996

Okay, in that case I'm not sure. If it were an AMD GPU, I would have said it might be the reset bug, in which case the vendor-reset project might have helped. But I suppose that doesn't apply to Nvidia GPUs.

This issue is hard for me to debug, as I don't have a laptop with a Quadro. But if we can somehow pinpoint it further, we could go to the relevant mailing list and ask the developers themselves.

T-vK avatar Oct 08 '21 08:10 T-vK