nvidia-xrun not working at all
I've noticed a few things. One of them is that, regardless of what I put in /etc/X11/nvidia-xorg* (I got this from the Arch Wiki documentation, so this might not be the place to mention it), /etc/default/nvidia-xrun is the file that holds the variables the script actually uses.
Anyway, I've been fighting with the nvidia driver for a while. Now that I've finally managed to blacklist it properly, nvidia-xrun is still failing, but at least it's failing gracefully. One thing I noticed is this line in /etc/default/nvidia-xrun:
# Bus ID of the PCI express controller
CONTROLLER_BUS_ID=0000:00:01.0
That bus ID doesn't seem to exist on my box, and I'm not sure what to put instead. Here's the output of lspci:
➜ lspci
00:00.0 Host bridge: Intel Corporation Broadwell-U Host Bridge -OPI (rev 09)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 5500 (rev 09)
00:03.0 Audio device: Intel Corporation Broadwell-U Audio Controller (rev 09)
00:14.0 USB controller: Intel Corporation Wildcat Point-LP USB xHCI Controller (rev 03)
00:16.0 Communication controller: Intel Corporation Wildcat Point-LP MEI Controller #1 (rev 03)
00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (3) I218-LM (rev 03)
00:1b.0 Audio device: Intel Corporation Wildcat Point-LP High Definition Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #6 (rev e3)
00:1c.1 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #3 (rev e3)
00:1c.4 PCI bridge: Intel Corporation Wildcat Point-LP PCI Express Root Port #5 (rev e3)
00:1d.0 USB controller: Intel Corporation Wildcat Point-LP USB EHCI Controller (rev 03)
00:1f.0 ISA bridge: Intel Corporation Wildcat Point-LP LPC Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation Wildcat Point-LP SATA Controller [AHCI Mode] (rev 03)
00:1f.3 SMBus: Intel Corporation Wildcat Point-LP SMBus Controller (rev 03)
00:1f.6 Signal processing controller: Intel Corporation Wildcat Point-LP Thermal Management Controller (rev 03)
02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5227 PCI Express Card Reader (rev 01)
03:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
The nvidia card is not listed because I'm using nvidia-xrun-pm.service, but its address is 04:00.0. I changed that in /etc/default/nvidia-xrun, but the controller bus ID still raises an error: it complains that /sys/bus/pci/devices/0000:00:01.0/power/control doesn't exist.
Below is dmesg after I try to run it. I noticed the following lines in particular:
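For reference, the missing-path complaint can be checked up front. This is only a minimal sketch: it assumes the sysfs layout named in the error message (`/sys/bus/pci/devices/<bus id>/power/control`), and the function name `missing_power_control` is made up for illustration. It is not part of nvidia-xrun.

```python
from pathlib import Path


def missing_power_control(bus_ids, sysfs_root="/sys/bus/pci/devices"):
    """Return the bus IDs that have no power/control file under sysfs_root."""
    root = Path(sysfs_root)
    return [
        bus_id
        for bus_id in bus_ids
        if not (root / bus_id / "power" / "control").is_file()
    ]


if __name__ == "__main__":
    # The default controller ID from the config file and the card address
    # mentioned above; replace them with the values you want to verify.
    for bus_id in missing_power_control(["0000:00:01.0", "0000:04:00.0"]):
        print(f"{bus_id}: no power/control entry in sysfs")
```

Any ID printed here would trigger the same kind of "doesn't exist" complaint. A card that has been removed from the bus will also show up as missing until the bus is rescanned.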
[ 251.484143] ACPI Warning: \_SB.PCI0.PEG.VID._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20190509/nsarguments-59)
[ 251.791305] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 251.791306] [drm] No driver support for vblank timestamp query.
Here's the full output:
=? res=success'
[ 236.945689] audit: type=1130 audit(1563638171.923:158): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=getty@tty3 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 236.984048] audit: type=1131 audit(1563638171.960:159): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=getty@tty2 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 236.986936] audit: type=1130 audit(1563638171.963:160): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=getty@tty2 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 236.989082] audit: type=1131 audit(1563638171.966:161): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=getty@tty2 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 236.991888] audit: type=1130 audit(1563638171.970:162): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=getty@tty2 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 243.092940] audit: type=1006 audit(1563638178.068:163): pid=2239 uid=0 old-auid=4294967295 auid=1000 tty=tty2 old-ses=4294967295 ses=4 res=1
[ 249.919817] pci 0000:04:00.0: [10de:1347] type 00 class 0x030200
[ 249.919862] pci 0000:04:00.0: reg 0x10: [mem 0xf1000000-0xf1ffffff]
[ 249.919886] pci 0000:04:00.0: reg 0x14: [mem 0xc0000000-0xcfffffff 64bit pref]
[ 249.919909] pci 0000:04:00.0: reg 0x1c: [mem 0xd0000000-0xd1ffffff 64bit pref]
[ 249.919926] pci 0000:04:00.0: reg 0x24: [io 0x3000-0x307f]
[ 249.919943] pci 0000:04:00.0: reg 0x30: [mem 0xfff80000-0xffffffff pref]
[ 249.920120] pci 0000:04:00.0: 16.000 Gb/s available PCIe bandwidth, limited by 5 GT/s x4 link at 0000:00:1c.4 (capable of 31.504 Gb/s with 8 GT/s x4 link)
[ 249.920966] pci 0000:04:00.0: BAR 1: assigned [mem 0xc0000000-0xcfffffff 64bit pref]
[ 249.920986] pci 0000:04:00.0: BAR 3: assigned [mem 0xd0000000-0xd1ffffff 64bit pref]
[ 249.921002] pci 0000:04:00.0: BAR 0: assigned [mem 0xf1000000-0xf1ffffff]
[ 249.921012] pci 0000:04:00.0: BAR 6: no space for [mem size 0x00080000 pref]
[ 249.921015] pci 0000:04:00.0: BAR 6: failed to assign [mem size 0x00080000 pref]
[ 249.921020] pci 0000:04:00.0: BAR 5: assigned [io 0x3000-0x307f]
[ 250.982821] IPMI message handler: version 39.2
[ 250.998891] ipmi device interface
[ 251.222618] nvidia: loading out-of-tree module taints kernel.
[ 251.222627] nvidia: module license 'NVIDIA' taints kernel.
[ 251.222628] Disabling lock debugging due to kernel taint
[ 251.229309] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 251.238248] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[ 251.339070] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 430.34 Wed Jun 26 12:19:48 CDT 2019
[ 251.388277] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 235
[ 251.429938] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 430.34 Wed Jun 26 12:15:10 CDT 2019
[ 251.455750] [drm] [nvidia-drm] [GPU ID 0x00000400] Loading driver
[ 251.484143] ACPI Warning: \_SB.PCI0.PEG.VID._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20190509/nsarguments-59)
[ 251.791305] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 251.791306] [drm] No driver support for vblank timestamp query.
[ 251.791309] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:04:00.0 on minor 1
[ 255.829134] [drm] [nvidia-drm] [GPU ID 0x00000400] Unloading driver
[ 255.857069] nvidia-modeset: Unloading
[ 256.063743] nvidia-uvm: Unloaded the UVM driver in 8 mode
[ 256.081762] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[ 292.513621] audit: type=1131 audit(1563638227.486:164): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=getty@tty2 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 292.524410] audit: type=1130 audit(1563638227.496:165): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=getty@tty2 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 292.524420] audit: type=1131 audit(1563638227.496:166): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=getty@tty2 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 292.525516] audit: type=1130 audit(1563638227.499:167): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=getty@tty2 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
This has been explained several times in response to similar issues, so I think the problem here is a lack of documentation (maybe the README should be more explanatory).
However, you should be able to find the controller bus ID in the output of lshw. For example, in my case it looks something like this:
*-pci
description: Host bridge
product: 8th Gen Core Processor Host Bridge/DRAM Registers
vendor: Intel Corporation
physical id: 100
bus info: pci@0000:00:00.0
version: 07
width: 32 bits
clock: 33MHz
configuration: driver=skl_uncore
resources: irq:0
*-pci:0
description: PCI bridge
product: Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16)
vendor: Intel Corporation
physical id: 1
bus info: pci@0000:00:01.0
version: 07
width: 32 bits
clock: 33MHz
capabilities: pci pm msi pciexpress normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:122 ioport:3000(size=4096) memory:ec000000-ed0fffff ioport:c0000000(size=301989888)
*-display UNCLAIMED
description: 3D controller
product: GP107M [GeForce GTX 1050 Ti Mobile]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:01:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress bus_master cap_list
configuration: latency=0
resources: memory:ec000000-ecffffff memory:c0000000-cfffffff memory:d0000000-d1ffffff ioport:3000(size=128) memory:ed000000-ed07ffff
The ID you're looking for is that of the PCI bridge that hosts the card, in my case 0000:00:01.0. Hope this helps.
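The same lookup can be done without reading through lshw, because sysfs nests each PCI device's directory inside the directory of the bridge it sits behind. The sketch below relies on that layout. The function names are hypothetical, and it only works while the card is visible on the bus (not after it has been removed by power management).

```python
import os


def parent_bridge(resolved_device_path):
    """Return the name of the directory that contains the device's sysfs directory."""
    return os.path.basename(os.path.dirname(resolved_device_path.rstrip("/")))


def controller_bus_id(device_bus_id, sysfs_root="/sys/bus/pci/devices"):
    """Resolve the device's sysfs symlink and return the bus ID of its parent."""
    # Entries under /sys/bus/pci/devices are symlinks into /sys/devices/...
    resolved = os.path.realpath(os.path.join(sysfs_root, device_bus_id))
    return parent_bridge(resolved)


if __name__ == "__main__":
    # Illustrative path built from the IDs in the lshw output above.
    example = "/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0"
    print(parent_bridge(example))
```

On a live system, `controller_bus_id("0000:01:00.0")` would be the call to try, with your own card's address in place of mine.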
Okay, thanks. Mine doesn't say "PCIe Controller" anywhere, but I'll try this *-pci:2 device; it looks about right.
Thank you
*-pci
description: Host bridge
product: Broadwell-U Host Bridge -OPI
vendor: Intel Corporation
physical id: 100
bus info: pci@0000:00:00.0
version: 09
width: 32 bits
clock: 33MHz
configuration: driver=bdw_uncore
resources: irq:0
*-display
description: VGA compatible controller
product: HD Graphics 5500
vendor: Intel Corporation
physical id: 2
bus info: pci@0000:00:02.0
version: 09
width: 64 bits
clock: 33MHz
capabilities: vga_controller bus_master cap_list rom
configuration: driver=i915 latency=0
resources: irq:59 memory:f0000000-f0ffffff memory:e0000000-efffffff ioport:4000(size=64) memory:c0000-dffff
......
*-pci:2
description: PCI bridge
product: Wildcat Point-LP PCI Express Root Port #5
vendor: Intel Corporation
physical id: 1c.4
bus info: pci@0000:00:1c.4
version: e3
width: 32 bits
clock: 33MHz
capabilities: pci normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:44 ioport:3000(size=4096) memory:f1000000-f1ffffff ioport:c0000000(size=301989888)
*-display UNCLAIMED
description: 3D controller
product: GM108M [GeForce 940M]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:04:00.0
version: a2
width: 64 bits
clock: 33MHz
capabilities: bus_master cap_list
configuration: latency=0
resources: memory:f1000000-f1ffffff memory:c0000000-cfffffff memory:d0000000-d1ffffff ioport:3000(size=128)
@michelesr that did help, but I'm still not able to run it, and I get the same dmesg errors:
[ 120.810746] pci 0000:04:00.0: [10de:1347] type 00 class 0x030200
[ 120.810775] pci 0000:04:00.0: reg 0x10: [mem 0xf1000000-0xf1ffffff]
[ 120.810789] pci 0000:04:00.0: reg 0x14: [mem 0xc0000000-0xcfffffff 64bit pref]
[ 120.810802] pci 0000:04:00.0: reg 0x1c: [mem 0xd0000000-0xd1ffffff 64bit pref]
[ 120.810811] pci 0000:04:00.0: reg 0x24: [io 0x3000-0x307f]
[ 120.810821] pci 0000:04:00.0: reg 0x30: [mem 0xfff80000-0xffffffff pref]
[ 120.810936] pci 0000:04:00.0: 16.000 Gb/s available PCIe bandwidth, limited by 5 GT/s x4 link at 0000:00:1c.4 (capable of 31.504 Gb/s with 8 GT/s x4 link)
[ 120.811376] pci 0000:04:00.0: BAR 1: assigned [mem 0xc0000000-0xcfffffff 64bit pref]
[ 120.811387] pci 0000:04:00.0: BAR 3: assigned [mem 0xd0000000-0xd1ffffff 64bit pref]
[ 120.811396] pci 0000:04:00.0: BAR 0: assigned [mem 0xf1000000-0xf1ffffff]
[ 120.811401] pci 0000:04:00.0: BAR 6: no space for [mem size 0x00080000 pref]
[ 120.811403] pci 0000:04:00.0: BAR 6: failed to assign [mem size 0x00080000 pref]
[ 120.811405] pci 0000:04:00.0: BAR 5: assigned [io 0x3000-0x307f]
[ 121.877981] IPMI message handler: version 39.2
[ 121.885331] ipmi device interface
[ 122.819521] nvidia: loading out-of-tree module taints kernel.
[ 122.819534] nvidia: module license 'NVIDIA' taints kernel.
[ 122.819535] Disabling lock debugging due to kernel taint
[ 122.826941] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 122.836162] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[ 122.937445] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 430.34 Wed Jun 26 12:19:48 CDT 2019
[ 123.081084] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 235
[ 123.160158] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 430.34 Wed Jun 26 12:15:10 CDT 2019
[ 123.207071] [drm] [nvidia-drm] [GPU ID 0x00000400] Loading driver
[ 123.231503] ACPI Warning: \_SB.PCI0.PEG.VID._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20190509/nsarguments-59)
[ 123.540530] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 123.540532] [drm] No driver support for vblank timestamp query.
[ 123.540535] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:04:00.0 on minor 1
[ 128.383771] [drm] [nvidia-drm] [GPU ID 0x00000400] Unloading driver
[ 128.414612] nvidia-modeset: Unloading
[ 128.724437] nvidia-uvm: Unloaded the UVM driver in 8 mode
[ 128.749476] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[ 185.676286] audit: type=1131 audit(1563819344.571:175): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=getty@tty2 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 185.700457] audit: type=1130 audit(1563819344.594:176): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=getty@tty2 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 185.700513] audit: type=1131 audit(1563819344.594:177): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=getty@tty2 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 185.702775] audit: type=1130 audit(1563819344.597:178): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=getty@tty2 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 189.057391] audit: type=1006 audit(1563819347.951:179): pid=1897 uid=0 old-auid=4294967295 auid=1000 tty=(none) old-ses=4294967295 ses=4 res=1
[ 199.645017] audit: type=1131 audit(1563819358.541:180): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=user@979 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 199.652062] audit: type=1131 audit(1563819358.547:181): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=user-runtime-dir@979 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 204.266283] audit: type=1130 audit(1563819363.161:182): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=rtkit-daemon comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 210.884106] audit: type=1130 audit(1563819369.777:183): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=udisks2 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
I'm no expert on nvidia drivers, but TBH I don't see anything alarming in the kernel log (tainting the kernel is normal, since nvidia is not a module from the original kernel codebase).
What problem are you getting exactly? Can you post the output of the nvidia-xrun execution?