firmware-open

eGPU not initializing

Open jacobgkau opened this issue 5 years ago • 57 comments

Tested on a darp6 (a customer is experiencing the same issue on a galp4). When we plug in an NVIDIA GPU in the Akitio Node, it does show up in lspci:

system76@pop-os:~$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Device 9b41 (rev 02)
06:00.0 VGA compatible controller: NVIDIA Corporation GP108 [GeForce GT 1030] (rev a1)

However, nvidia-smi does not see the GPU, and we are not able to load the NVIDIA driver:

system76@pop-os:~$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
system76@pop-os:~$ sudo modprobe nvidia
modprobe: ERROR: could not insert 'nvidia': No such device

dmesg shows this output continuously repeated while the GPU is connected:

[  251.993332] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[  251.993840] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[  251.993841] NVRM: The system BIOS may have misconfigured your GPU.
[  251.993845] nvidia: probe of 0000:06:00.0 failed with error -1
[  251.993863] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  251.993864] NVRM: None of the NVIDIA devices were initialized.
[  251.993998] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
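
The key line in the output above is `BAR0 is 0M @ 0x0`: the driver sees a memory region with no size and no address. As a rough aid for triaging similar reports, here is a minimal Python sketch that scans saved dmesg text for such NVRM lines. The function name `unassigned_nvrm_bars` and the regular expression are illustrative assumptions based only on the message format shown in this report; other driver versions may word the message differently.

```python
import re

# Matches the NVRM line that reports a BAR's size and address, e.g.
# "NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)"
# The pattern is an assumption derived from the log in this report.
NVRM_BAR = re.compile(
    r"NVRM: BAR(?P<bar>\d+) is (?P<size_mb>\d+)M @ (?P<addr>0x[0-9a-fA-F]+) "
    r"\(PCI:(?P<dev>[0-9a-fA-F:.]+)\)"
)


def unassigned_nvrm_bars(dmesg_text):
    """Return unique (device, bar) pairs reported with zero size or address 0."""
    found = []
    for m in NVRM_BAR.finditer(dmesg_text):
        zero_size = int(m.group("size_mb")) == 0
        zero_addr = int(m.group("addr"), 16) == 0
        if zero_size or zero_addr:
            pair = (m.group("dev"), int(m.group("bar")))
            if pair not in found:  # the message repeats on every probe retry
                found.append(pair)
    return found


# Sample copied from the dmesg excerpt above.
sample = """[  251.993840] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[  251.993845] nvidia: probe of 0000:06:00.0 failed with error -1"""

print(unassigned_nvrm_bars(sample))
```

A non-empty result only says that the driver was handed an unassigned BAR; it does not say why the assignment failed.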

jacobgkau, Dec 11 '19 17:12

New logs from a Razer Core X Chroma with a Sapphire Radeon Nitro+ RX 590 8GB GPU suggest it may be hitting the same issue:

[ 143.468521] ---[ end trace 997c68591ecbbf46 ]---
[ 143.468937] amdgpu: probe of 0000:06:00.0 failed with error -22
[ 143.468951] pci 0000:06:00.1: D0 power state depends on 0000:06:00.0
[ 143.469000] snd_hda_intel 0000:06:00.1: enabling device (0000 -> 0002)
[ 143.469166] snd_hda_intel 0000:06:00.1: Handle vga_switcheroo audio client
[ 143.469168] snd_hda_intel 0000:06:00.1: Force to non-snoop mode
[ 143.481490] input: HDA ATI HDMI HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:1c.0/0000:01:00.0/0000:02:01.0/0000:04:00.0/0000:05:01.0/0000:06:00.1/sound/card1/input56
[ 143.481534] input: HDA ATI HDMI HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:1c.0/0000:01:00.0/0000:02:01.0/0000:04:00.0/0000:05:01.0/0000:06:00.1/sound/card1/input57
[ 143.481570] input: HDA ATI HDMI HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:1c.0/0000:01:00.0/0000:02:01.0/0000:04:00.0/0000:05:01.0/0000:06:00.1/sound/card1/input58
[ 143.481600] input: HDA ATI HDMI HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:1c.0/0000:01:00.0/0000:02:01.0/0000:04:00.0/0000:05:01.0/0000:06:00.1/sound/card1/input59
[ 143.481630] input: HDA ATI HDMI HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:1c.0/0000:01:00.0/0000:02:01.0/0000:04:00.0/0000:05:01.0/0000:06:00.1/sound/card1/input60
[ 143.481681] input: HDA ATI HDMI HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:1c.0/0000:01:00.0/0000:02:01.0/0000:04:00.0/0000:05:01.0/0000:06:00.1/sound/card1/input61

ahoneybun, Dec 20 '19 22:12

I'm the customer with the galp4. I was able to get the eGPU working with a ThinkPad running Lubuntu from a USB drive. Apart from the hardware change, that machine's BIOS had a configuration section for Thunderbolt 3, where I disabled its security and enabled pre-OS support for the port. I think coreboot is either blocking the Thunderbolt 3 connection for security reasons or, more likely, cannot support it in the pre-boot environment, since it's pretty new and the coreboot BIOS environment only lets me choose boot options.
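
One way to check the security hypothesis from the running system is to read the security level the kernel reports for each Thunderbolt domain. The sketch below is a minimal Python example, assuming a Linux kernel that exposes the `security` attribute under `/sys/bus/thunderbolt/devices/domainX/`; the function name `thunderbolt_security_levels` is made up for illustration.

```python
from pathlib import Path


def thunderbolt_security_levels(base="/sys/bus/thunderbolt/devices"):
    """Map each Thunderbolt domain name to the contents of its `security` file."""
    levels = {}
    base_path = Path(base)
    if not base_path.is_dir():
        return levels  # no Thunderbolt bus exposed (or not running on Linux)
    for domain in sorted(base_path.glob("domain*")):
        security_file = domain / "security"
        if security_file.is_file():
            levels[domain.name] = security_file.read_text().strip()
    return levels


if __name__ == "__main__":
    for name, level in thunderbolt_security_levels().items():
        print(f"{name}: {level}")
```

If the reported level is `none` or `user` and the device has been authorized, the security setting alone is less likely to explain the failure, though this check says nothing about pre-boot support.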

0-alex-0, Dec 21 '19 01:12

I am the user of the Razer Core X Chroma and AMD RX 590. Like 0-alex-0, I was able to use the same hardware with a Dell XPS 13 running the same Kubuntu version. I had to go into the BIOS settings and change the Thunderbolt security settings before it worked.

sldavidson, Dec 21 '19 19:12

I've had no luck on a darp6 with coreboot either. I've tried PopOS 19.10, Ubuntu 18.04, and Ubuntu 19.10. The eGPU (purchased specifically to use with this laptop) is a Sonnet eGFX Breakaway Box 550 containing an MSI GAMING GeForce RTX 2070 8GB (it has been tested with a Windows laptop, and it worked). When I plug it into the Darter Pro, it spins up, and I'm asked to authorize the TB3 device. When I do, the fans spin down to normal levels as expected, and the name of the eGPU box is shown. But the GPU doesn't work, and I get the following in dmesg:

[ 5331.431094] thunderbolt 0-1: new device found, vendor=0x8 device=0x38
[ 5331.431099] thunderbolt 0-1: Sonnet Technologies, Inc. eGFX Breakaway Box 550
[ 5372.032458] pcieport 0000:02:01.0: pciehp: Slot(1): Card present
[ 5372.032463] pcieport 0000:02:01.0: pciehp: Slot(1): Link Up
[ 5372.373040] pci 0000:04:00.0: [8086:1578] type 01 class 0x060400
[ 5372.373205] pci 0000:04:00.0: enabling Extended Tags
[ 5372.373483] pci 0000:04:00.0: supports D1 D2
[ 5372.373488] pci 0000:04:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[ 5372.373620] pci 0000:04:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x4 link at 0000:02:01.0 (capable of 31.504 Gb/s with 8 GT/s x4 link)
[ 5372.373903] pcieport 0000:02:01.0: ASPM: current common clock configuration is broken, reconfiguring
[ 5372.373979] pci 0000:04:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 5372.374247] pci 0000:05:01.0: [8086:1578] type 01 class 0x060400
[ 5372.374413] pci 0000:05:01.0: enabling Extended Tags
[ 5372.374673] pci 0000:05:01.0: supports D1 D2
[ 5372.374678] pci 0000:05:01.0: PME# supported from D0 D1 D2 D3hot D3cold
[ 5372.375077] pci 0000:04:00.0: PCI bridge to [bus 05-24]
[ 5372.375099] pci 0000:04:00.0:   bridge window [io  0x0000-0x0fff]
[ 5372.375114] pci 0000:04:00.0:   bridge window [mem 0x00000000-0x000fffff]
[ 5372.375134] pci 0000:04:00.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[ 5372.375143] pci 0000:05:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 5372.375419] pci 0000:06:00.0: [10de:1f07] type 00 class 0x030000
[ 5372.375517] pci 0000:06:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
[ 5372.375560] pci 0000:06:00.0: reg 0x14: [mem 0x00000000-0x0fffffff 64bit pref]
[ 5372.375600] pci 0000:06:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff 64bit pref]
[ 5372.375625] pci 0000:06:00.0: reg 0x24: [io  0x0000-0x007f]
[ 5372.375649] pci 0000:06:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
[ 5372.375931] pci 0000:06:00.0: PME# supported from D0 D3hot
[ 5372.376110] pci 0000:06:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x4 link at 0000:02:01.0 (capable of 126.016 Gb/s with 8 GT/s x16 link)
[ 5372.376304] pci 0000:06:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 5372.376316] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[ 5372.376441] pci 0000:06:00.1: [10de:10f9] type 00 class 0x040300
[ 5372.376511] pci 0000:06:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
[ 5372.377117] pci 0000:06:00.2: [10de:1ada] type 00 class 0x0c0330
[ 5372.377201] pci 0000:06:00.2: reg 0x10: [mem 0x00000000-0x0003ffff 64bit pref]
[ 5372.377262] pci 0000:06:00.2: reg 0x1c: [mem 0x00000000-0x0000ffff 64bit pref]
[ 5372.377515] pci 0000:06:00.2: PME# supported from D0 D3hot
[ 5372.377812] pci 0000:06:00.3: [10de:1adb] type 00 class 0x0c8000
[ 5372.377879] pci 0000:06:00.3: reg 0x10: [mem 0x00000000-0x00000fff]
[ 5372.378217] pci 0000:06:00.3: PME# supported from D0 D3hot
[ 5372.378753] pci 0000:05:01.0: PCI bridge to [bus 06-24]
[ 5372.378768] pci 0000:05:01.0:   bridge window [io  0x0000-0x0fff]
[ 5372.378781] pci 0000:05:01.0:   bridge window [mem 0x00000000-0x000fffff]
[ 5372.378795] pci 0000:05:01.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[ 5372.378800] pci_bus 0000:06: busn_res: [bus 06-24] end is updated to 06
[ 5372.378810] pci_bus 0000:05: busn_res: [bus 05-24] end is updated to 06
[ 5372.378837] pci 0000:04:00.0: BAR 15: no space for [mem size 0x18000000 64bit pref]
[ 5372.378840] pci 0000:04:00.0: BAR 15: failed to assign [mem size 0x18000000 64bit pref]
[ 5372.378844] pci 0000:04:00.0: BAR 14: no space for [mem size 0x01800000]
[ 5372.378846] pci 0000:04:00.0: BAR 14: failed to assign [mem size 0x01800000]
[ 5372.378849] pci 0000:04:00.0: BAR 13: assigned [io  0x2000-0x2fff]
[ 5372.378854] pci 0000:05:01.0: BAR 15: no space for [mem size 0x18000000 64bit pref]
[ 5372.378856] pci 0000:05:01.0: BAR 15: failed to assign [mem size 0x18000000 64bit pref]
[ 5372.378859] pci 0000:05:01.0: BAR 14: no space for [mem size 0x01800000]
[ 5372.378862] pci 0000:05:01.0: BAR 14: failed to assign [mem size 0x01800000]
[ 5372.378865] pci 0000:05:01.0: BAR 13: assigned [io  0x2000-0x2fff]
[ 5372.378871] pci 0000:06:00.0: BAR 1: no space for [mem size 0x10000000 64bit pref]
[ 5372.378874] pci 0000:06:00.0: BAR 1: failed to assign [mem size 0x10000000 64bit pref]
[ 5372.378877] pci 0000:06:00.0: BAR 3: no space for [mem size 0x02000000 64bit pref]
[ 5372.378880] pci 0000:06:00.0: BAR 3: failed to assign [mem size 0x02000000 64bit pref]
[ 5372.378882] pci 0000:06:00.0: BAR 0: no space for [mem size 0x01000000]
[ 5372.378885] pci 0000:06:00.0: BAR 0: failed to assign [mem size 0x01000000]
[ 5372.378887] pci 0000:06:00.0: BAR 6: no space for [mem size 0x00080000 pref]
[ 5372.378890] pci 0000:06:00.0: BAR 6: failed to assign [mem size 0x00080000 pref]
[ 5372.378893] pci 0000:06:00.2: BAR 0: no space for [mem size 0x00040000 64bit pref]
[ 5372.378895] pci 0000:06:00.2: BAR 0: failed to assign [mem size 0x00040000 64bit pref]
[ 5372.378899] pci 0000:06:00.2: BAR 3: no space for [mem size 0x00010000 64bit pref]
[ 5372.378901] pci 0000:06:00.2: BAR 3: failed to assign [mem size 0x00010000 64bit pref]
[ 5372.378905] pci 0000:06:00.1: BAR 0: no space for [mem size 0x00004000]
[ 5372.378907] pci 0000:06:00.1: BAR 0: failed to assign [mem size 0x00004000]
[ 5372.378910] pci 0000:06:00.3: BAR 0: no space for [mem size 0x00001000]
[ 5372.378913] pci 0000:06:00.3: BAR 0: failed to assign [mem size 0x00001000]
[ 5372.378916] pci 0000:06:00.0: BAR 5: assigned [io  0x2000-0x207f]
[ 5372.378928] pci 0000:05:01.0: PCI bridge to [bus 06]
[ 5372.378934] pci 0000:05:01.0:   bridge window [io  0x2000-0x2fff]
[ 5372.378963] pci 0000:04:00.0: PCI bridge to [bus 05-06]
[ 5372.378968] pci 0000:04:00.0:   bridge window [io  0x2000-0x2fff]
[ 5372.378997] pcieport 0000:02:01.0: PCI bridge to [bus 04-24]
[ 5372.379001] pcieport 0000:02:01.0:   bridge window [io  0x2000-0x3fff]
[ 5372.379008] pcieport 0000:02:01.0:   bridge window [mem 0xd1000000-0xd17fffff]
[ 5372.379014] pcieport 0000:02:01.0:   bridge window [mem 0xc1000000-0xd0ffffff 64bit pref]
[ 5372.379021] PCI: No. 2 try to assign unassigned res
[ 5372.379026] pcieport 0000:02:01.0: resource 14 [mem 0xd1000000-0xd17fffff] released
[ 5372.379028] pcieport 0000:02:01.0: PCI bridge to [bus 04-24]
[ 5372.379037] pcieport 0000:02:01.0: resource 15 [mem 0xc1000000-0xd0ffffff 64bit pref] released
[ 5372.379039] pcieport 0000:02:01.0: PCI bridge to [bus 04-24]
[ 5372.379064] pcieport 0000:02:01.0: BAR 15: no space for [mem size 0x18000000 64bit pref]
[ 5372.379067] pcieport 0000:02:01.0: BAR 15: failed to assign [mem size 0x18000000 64bit pref]
[ 5372.379070] pcieport 0000:02:01.0: BAR 14: no space for [mem size 0x01800000]
[ 5372.379073] pcieport 0000:02:01.0: BAR 14: failed to assign [mem size 0x01800000]
[ 5372.379077] pci 0000:04:00.0: BAR 15: no space for [mem size 0x18000000 64bit pref]
[ 5372.379080] pci 0000:04:00.0: BAR 15: failed to assign [mem size 0x18000000 64bit pref]
[ 5372.379083] pci 0000:04:00.0: BAR 14: no space for [mem size 0x01800000]
[ 5372.379085] pci 0000:04:00.0: BAR 14: failed to assign [mem size 0x01800000]
[ 5372.379089] pci 0000:05:01.0: BAR 15: no space for [mem size 0x18000000 64bit pref]
[ 5372.379092] pci 0000:05:01.0: BAR 15: failed to assign [mem size 0x18000000 64bit pref]
[ 5372.379094] pci 0000:05:01.0: BAR 14: no space for [mem size 0x01800000]
[ 5372.379096] pci 0000:05:01.0: BAR 14: failed to assign [mem size 0x01800000]
[ 5372.379101] pci 0000:06:00.0: BAR 1: no space for [mem size 0x10000000 64bit pref]
[ 5372.379104] pci 0000:06:00.0: BAR 1: failed to assign [mem size 0x10000000 64bit pref]
[ 5372.379106] pci 0000:06:00.0: BAR 3: no space for [mem size 0x02000000 64bit pref]
[ 5372.379109] pci 0000:06:00.0: BAR 3: failed to assign [mem size 0x02000000 64bit pref]
[ 5372.379112] pci 0000:06:00.0: BAR 0: no space for [mem size 0x01000000]
[ 5372.379114] pci 0000:06:00.0: BAR 0: failed to assign [mem size 0x01000000]
[ 5372.379120] pci 0000:06:00.2: BAR 0: no space for [mem size 0x00040000 64bit pref]
[ 5372.379122] pci 0000:06:00.2: BAR 0: failed to assign [mem size 0x00040000 64bit pref]
[ 5372.379125] pci 0000:06:00.2: BAR 3: no space for [mem size 0x00010000 64bit pref]
[ 5372.379128] pci 0000:06:00.2: BAR 3: failed to assign [mem size 0x00010000 64bit pref]
[ 5372.379130] pci 0000:06:00.1: BAR 0: no space for [mem size 0x00004000]
[ 5372.379132] pci 0000:06:00.1: BAR 0: failed to assign [mem size 0x00004000]
[ 5372.379135] pci 0000:06:00.3: BAR 0: no space for [mem size 0x00001000]
[ 5372.379137] pci 0000:06:00.3: BAR 0: failed to assign [mem size 0x00001000]
[ 5372.379140] pci 0000:05:01.0: PCI bridge to [bus 06]
[ 5372.379146] pci 0000:05:01.0:   bridge window [io  0x2000-0x2fff]
[ 5372.379175] pci 0000:04:00.0: PCI bridge to [bus 05-06]
[ 5372.379180] pci 0000:04:00.0:   bridge window [io  0x2000-0x2fff]
[ 5372.379208] pcieport 0000:02:01.0: PCI bridge to [bus 04-24]
[ 5372.379212] pcieport 0000:02:01.0:   bridge window [io  0x2000-0x3fff]
[ 5372.379279] pcieport 0000:04:00.0: enabling device (0000 -> 0001)
[ 5372.380308] pcieport 0000:05:01.0: enabling device (0000 -> 0001)
[ 5372.381376] pci 0000:06:00.1: D0 power state depends on 0000:06:00.0
[ 5372.381929] snd_hda_intel 0000:06:00.1: Disabling MSI
[ 5372.381944] snd_hda_intel 0000:06:00.1: Handle vga_switcheroo audio client
[ 5372.382014] pci 0000:06:00.2: D0 power state depends on 0000:06:00.0
[ 5372.384262] xhci_hcd 0000:06:00.2: init 0000:06:00.2 fail, -16
[ 5372.384381] xhci_hcd: probe of 0000:06:00.2 failed with error -16
[ 5372.384443] pci 0000:06:00.3: D0 power state depends on 0000:06:00.0
[ 5372.384493] snd_hda_intel 0000:06:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
[ 5372.384496] snd_hda_intel 0000:06:00.1: ioremap error
[ 5372.394293] nvidia-gpu 0000:06:00.3: pcim_iomap failed
[ 5372.394813] nvidia-gpu: probe of 0000:06:00.3 failed with error -12
[ 5372.455075] nouveau 0000:06:00.0: enabling device (0000 -> 0001)
[ 5372.455497] ------------[ cut here ]------------
[ 5372.455499] ioremap on RAM at 0x0000000000000000 - 0x0000000000101fff
[ 5372.455509] WARNING: CPU: 4 PID: 11823 at arch/x86/mm/ioremap.c:186 __ioremap_caller+0x2a7/0x2c0
[ 5372.455510] Modules linked in: nouveau(+) mxm_wmi wmi ttm i2c_nvidia_gpu rfcomm ccm cmac aufs overlay bnep nls_iso8859_1 snd_hda_codec_hdmi sof_pci_dev snd_sof_intel_hda_common snd_sof_intel_hda snd_sof_intel_byt snd_sof_intel_ipc snd_hda_codec_realtek snd_sof snd_hda_codec_generic snd_sof_xtensa_dsp ledtrig_audio snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm intel_rapl_msr snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer joydev iwlmvm intel_rapl_common x86_pkg_temp_thermal intel_powerclamp mac80211 coretemp libarc4 snd btusb uvcvideo iwlwifi kvm_intel btrtl videobuf2_vmalloc btbcm videobuf2_memops btintel kvm videobuf2_v4l2 bluetooth irqbypass videobuf2_common intel_cstate rtsx_pci_ms videodev intel_rapl_perf input_leds ecdh_generic mc soundcore serio_raw memstick
[ 5372.455542]  ecc cfg80211 8250_dw intel_hid mac_hid sparse_keymap sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rtsx_pci_sdmmc i915 aesni_intel aes_x86_64 crypto_simd video cryptd glue_helper i2c_algo_bit nvme drm_kms_helper psmouse syscopyarea sysfillrect nvme_core r8169 sysimgblt i2c_i801 fb_sys_fops rtsx_pci realtek drm thunderbolt ahci intel_lpss_pci libahci intel_lpss pinctrl_cannonlake pinctrl_intel
[ 5372.455563] CPU: 4 PID: 11823 Comm: systemd-udevd Not tainted 5.3.0-26-generic #28-Ubuntu
[ 5372.455564] Hardware name: System76 Darter Pro/Darter Pro, BIOS 2019-10-31_cca6ad0 10/30/2019
[ 5372.455568] RIP: 0010:__ioremap_caller+0x2a7/0x2c0
[ 5372.455570] Code: 0f b7 05 70 db 5b 01 48 09 c1 e9 98 fe ff ff 48 8d 55 c8 48 8d 75 b8 48 c7 c7 cd 32 73 a5 c6 05 99 c1 78 01 01 e8 b4 ab 01 00 <0f> 0b 45 31 ff e9 07 ff ff ff e8 7a a8 01 00 66 2e 0f 1f 84 00 00
[ 5372.455572] RSP: 0018:ffffbd29c2f5f888 EFLAGS: 00010282
[ 5372.455574] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
[ 5372.455575] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff945f5e317440
[ 5372.455576] RBP: ffffbd29c2f5f8f0 R08: 00000000000003cd R09: 0000000000000004
[ 5372.455578] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[ 5372.455579] R13: 0000000000102000 R14: 0000000000000002 R15: ffffffffc1320800
[ 5372.455582] FS:  00007f3cadc53880(0000) GS:ffff945f5e300000(0000) knlGS:0000000000000000
[ 5372.455583] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5372.455585] CR2: 00005587ac315ae8 CR3: 0000000c6be4a005 CR4: 00000000003606e0
[ 5372.455586] Call Trace:
[ 5372.455593]  ? _cond_resched+0x19/0x30
[ 5372.455598]  ? __kmalloc+0x180/0x270
[ 5372.455706]  ? nvkm_device_ctor+0x2d8/0x3640 [nouveau]
[ 5372.455714]  ioremap_nocache+0x1a/0x20
[ 5372.455814]  nvkm_device_ctor+0x2d8/0x3640 [nouveau]
[ 5372.455820]  ? do_pci_enable_device+0xd7/0x100
[ 5372.455894]  nvkm_device_pci_new+0x102/0x2d0 [nouveau]
[ 5372.455899]  ? _cond_resched+0x19/0x30
[ 5372.455980]  nouveau_drm_probe+0x5f/0x2e0 [nouveau]
[ 5372.455984]  local_pci_probe+0x48/0x80
[ 5372.455987]  pci_device_probe+0x10f/0x1b0
[ 5372.455991]  really_probe+0xfb/0x3a0
[ 5372.455994]  driver_probe_device+0x5f/0xe0
[ 5372.455997]  device_driver_attach+0x5d/0x70
[ 5372.456000]  __driver_attach+0x8f/0x150
[ 5372.456003]  ? device_driver_attach+0x70/0x70
[ 5372.456006]  bus_for_each_dev+0x7e/0xc0
[ 5372.456009]  driver_attach+0x1e/0x20
[ 5372.456011]  bus_add_driver+0x14f/0x1f0
[ 5372.456014]  driver_register+0x74/0xc0
[ 5372.456016]  ? 0xffffffffc13e1000
[ 5372.456018]  __pci_register_driver+0x57/0x60
[ 5372.456070]  nouveau_drm_init+0x191/0x1000 [nouveau]
[ 5372.456074]  do_one_initcall+0x4a/0x1fa
[ 5372.456077]  ? kfree+0x200/0x220
[ 5372.456080]  ? _cond_resched+0x19/0x30
[ 5372.456082]  ? kmem_cache_alloc_trace+0x163/0x230
[ 5372.456086]  do_init_module+0x62/0x250
[ 5372.456089]  load_module+0x10d4/0x1220
[ 5372.456093]  __do_sys_finit_module+0xbe/0x120
[ 5372.456096]  ? __do_sys_finit_module+0xbe/0x120
[ 5372.456100]  __x64_sys_finit_module+0x1a/0x20
[ 5372.456103]  do_syscall_64+0x5a/0x130
[ 5372.456105]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 5372.456107] RIP: 0033:0x7f3cae1c694d
[ 5372.456110] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 13 e5 0c 00 f7 d8 64 89 01 48
[ 5372.456111] RSP: 002b:00007ffe064c48d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 5372.456114] RAX: ffffffffffffffda RBX: 000056371e9c38b0 RCX: 00007f3cae1c694d
[ 5372.456115] RDX: 0000000000000000 RSI: 00007f3cae0a3cad RDI: 0000000000000010
[ 5372.456116] RBP: 00007f3cae0a3cad R08: 0000000000000000 R09: 000056371e9c38b0
[ 5372.456117] R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000000
[ 5372.456118] R13: 000056371e9d31b0 R14: 0000000000020000 R15: 000056371e9c38b0
[ 5372.456121] ---[ end trace bf511b76f200d1ae ]---
[ 5372.456130] nouveau: probe of 0000:06:00.0 failed with error -12
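
The `no space for` lines can be put next to the bridge windows that the same log reports for the root port 0000:02:01.0. The short Python sketch below does that arithmetic with the values copied from the log above; the variable names are illustrative, and reading the result as "the windows reserved for the hotplug port are too small" is an assumption rather than a confirmed diagnosis.

```python
MIB = 1024 * 1024


def window_size(start, end):
    """Size in bytes of an inclusive [start, end] address range."""
    return end - start + 1


# Windows on root port 0000:02:01.0 before the kernel released them
# (from the "bridge window" lines in the log).
pref_window = window_size(0xC1000000, 0xD0FFFFFF)
mem_window = window_size(0xD1000000, 0xD17FFFFF)

# Sizes the kernel then tried and failed to assign
# (from the "BAR 15" and "BAR 14" lines in the log).
pref_needed = 0x18000000
mem_needed = 0x01800000

print(f"prefetchable: have {pref_window // MIB} MiB, need {pref_needed // MIB} MiB")
print(f"non-prefetchable: have {mem_window // MIB} MiB, need {mem_needed // MIB} MiB")
```

With these numbers the prefetchable window is 256 MiB against a 384 MiB request, and the non-prefetchable window is 8 MiB against a 24 MiB request, which would be consistent with every BAR behind the bridge ending up unassigned.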

Note: I previously tried installing the NVIDIA drivers, blacklisting nouveau, etc., and the only result was an unbootable system: I couldn't get past the login screen anymore (I tried reverting the driver installation from recovery, but no luck). That's how I ended up switching from PopOS to Ubuntu, but the issue is the same on both.

Edit: here's the output of lspci | grep VGA:

00:02.0 VGA compatible controller: Intel Corporation Device 9b41 (rev 02)
06:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2070 Rev. A] (rev a1)

jamalex, Jan 12 '20 19:01

I was happy my new laptop would be using open source firmware, but this is preventing me from using the eGPU for the work I need to use it for, so at this point I'd very readily flash a standard BIOS on here if it got the eGPU working. Is that a possibility? And is this issue still being looked into?

jamalex, Jan 12 '20 20:01

I realized I hadn't installed system76-driver after installing Ubuntu 19.10. I did so, and also installed system76-driver-nvidia. The behavior is the same, but dmesg now gives what looks like the same output as in the original report (the same goes for nvidia-smi and sudo modprobe nvidia):

[   25.666612] thunderbolt 0-1: new device found, vendor=0x8 device=0x38
[   25.666615] thunderbolt 0-1: Sonnet Technologies, Inc. eGFX Breakaway Box 550
[   25.762535] pcieport 0000:02:01.0: pciehp: Slot(1): Card present
[   25.762539] pcieport 0000:02:01.0: pciehp: Slot(1): Link Up
[   26.102817] pci 0000:04:00.0: [8086:1578] type 01 class 0x060400
[   26.102948] pci 0000:04:00.0: enabling Extended Tags
[   26.103165] pci 0000:04:00.0: supports D1 D2
[   26.103166] pci 0000:04:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[   26.103280] pci 0000:04:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x4 link at 0000:02:01.0 (capable of 31.504 Gb/s with 8 GT/s x4 link)
[   26.103467] pcieport 0000:02:01.0: ASPM: current common clock configuration is broken, reconfiguring
[   26.114369] pci 0000:04:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[   26.114571] pci 0000:05:01.0: [8086:1578] type 01 class 0x060400
[   26.114715] pci 0000:05:01.0: enabling Extended Tags
[   26.114937] pci 0000:05:01.0: supports D1 D2
[   26.114939] pci 0000:05:01.0: PME# supported from D0 D1 D2 D3hot D3cold
[   26.115222] pci 0000:04:00.0: PCI bridge to [bus 05-24]
[   26.115237] pci 0000:04:00.0:   bridge window [io  0x0000-0x0fff]
[   26.115244] pci 0000:04:00.0:   bridge window [mem 0x00000000-0x000fffff]
[   26.115256] pci 0000:04:00.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[   26.115261] pci 0000:05:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[   26.115438] pci 0000:06:00.0: [10de:1f07] type 00 class 0x030000
[   26.115520] pci 0000:06:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
[   26.115553] pci 0000:06:00.0: reg 0x14: [mem 0x00000000-0x0fffffff 64bit pref]
[   26.115585] pci 0000:06:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff 64bit pref]
[   26.115603] pci 0000:06:00.0: reg 0x24: [io  0x0000-0x007f]
[   26.115622] pci 0000:06:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
[   26.115852] pci 0000:06:00.0: PME# supported from D0 D3hot
[   26.116010] pci 0000:06:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x4 link at 0000:02:01.0 (capable of 126.016 Gb/s with 8 GT/s x16 link)
[   26.116060] pci 0000:06:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[   26.116065] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[   26.116131] pci 0000:06:00.1: [10de:10f9] type 00 class 0x040300
[   26.116184] pci 0000:06:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
[   26.116599] pci 0000:06:00.2: [10de:1ada] type 00 class 0x0c0330
[   26.116665] pci 0000:06:00.2: reg 0x10: [mem 0x00000000-0x0003ffff 64bit pref]
[   26.116714] pci 0000:06:00.2: reg 0x1c: [mem 0x00000000-0x0000ffff 64bit pref]
[   26.116913] pci 0000:06:00.2: PME# supported from D0 D3hot
[   26.117071] pci 0000:06:00.3: [10de:1adb] type 00 class 0x0c8000
[   26.117123] pci 0000:06:00.3: reg 0x10: [mem 0x00000000-0x00000fff]
[   26.117400] pci 0000:06:00.3: PME# supported from D0 D3hot
[   26.117836] pci 0000:05:01.0: PCI bridge to [bus 06-24]
[   26.117851] pci 0000:05:01.0:   bridge window [io  0x0000-0x0fff]
[   26.117860] pci 0000:05:01.0:   bridge window [mem 0x00000000-0x000fffff]
[   26.117874] pci 0000:05:01.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[   26.117879] pci_bus 0000:06: busn_res: [bus 06-24] end is updated to 06
[   26.117888] pci_bus 0000:05: busn_res: [bus 05-24] end is updated to 06
[   26.117914] pci 0000:04:00.0: BAR 15: no space for [mem size 0x18000000 64bit pref]
[   26.117917] pci 0000:04:00.0: BAR 15: failed to assign [mem size 0x18000000 64bit pref]
[   26.117921] pci 0000:04:00.0: BAR 14: no space for [mem size 0x01800000]
[   26.117923] pci 0000:04:00.0: BAR 14: failed to assign [mem size 0x01800000]
[   26.117928] pci 0000:04:00.0: BAR 13: assigned [io  0x2000-0x2fff]
[   26.117933] pci 0000:05:01.0: BAR 15: no space for [mem size 0x18000000 64bit pref]
[   26.117935] pci 0000:05:01.0: BAR 15: failed to assign [mem size 0x18000000 64bit pref]
[   26.117938] pci 0000:05:01.0: BAR 14: no space for [mem size 0x01800000]
[   26.117940] pci 0000:05:01.0: BAR 14: failed to assign [mem size 0x01800000]
[   26.117943] pci 0000:05:01.0: BAR 13: assigned [io  0x2000-0x2fff]
[   26.117951] pci 0000:06:00.0: BAR 1: no space for [mem size 0x10000000 64bit pref]
[   26.117954] pci 0000:06:00.0: BAR 1: failed to assign [mem size 0x10000000 64bit pref]
[   26.117957] pci 0000:06:00.0: BAR 3: no space for [mem size 0x02000000 64bit pref]
[   26.117959] pci 0000:06:00.0: BAR 3: failed to assign [mem size 0x02000000 64bit pref]
[   26.117962] pci 0000:06:00.0: BAR 0: no space for [mem size 0x01000000]
[   26.117964] pci 0000:06:00.0: BAR 0: failed to assign [mem size 0x01000000]
[   26.117967] pci 0000:06:00.0: BAR 6: no space for [mem size 0x00080000 pref]
[   26.117969] pci 0000:06:00.0: BAR 6: failed to assign [mem size 0x00080000 pref]
[   26.117972] pci 0000:06:00.2: BAR 0: no space for [mem size 0x00040000 64bit pref]
[   26.117974] pci 0000:06:00.2: BAR 0: failed to assign [mem size 0x00040000 64bit pref]
[   26.117978] pci 0000:06:00.2: BAR 3: no space for [mem size 0x00010000 64bit pref]
[   26.117980] pci 0000:06:00.2: BAR 3: failed to assign [mem size 0x00010000 64bit pref]
[   26.117983] pci 0000:06:00.1: BAR 0: no space for [mem size 0x00004000]
[   26.117985] pci 0000:06:00.1: BAR 0: failed to assign [mem size 0x00004000]
[   26.117988] pci 0000:06:00.3: BAR 0: no space for [mem size 0x00001000]
[   26.117990] pci 0000:06:00.3: BAR 0: failed to assign [mem size 0x00001000]
[   26.117993] pci 0000:06:00.0: BAR 5: assigned [io  0x2000-0x207f]
[   26.118006] pci 0000:05:01.0: PCI bridge to [bus 06]
[   26.118011] pci 0000:05:01.0:   bridge window [io  0x2000-0x2fff]
[   26.118040] pci 0000:04:00.0: PCI bridge to [bus 05-06]
[   26.118046] pci 0000:04:00.0:   bridge window [io  0x2000-0x2fff]
[   26.118074] pcieport 0000:02:01.0: PCI bridge to [bus 04-24]
[   26.118078] pcieport 0000:02:01.0:   bridge window [io  0x2000-0x3fff]
[   26.118086] pcieport 0000:02:01.0:   bridge window [mem 0xd1000000-0xd17fffff]
[   26.118092] pcieport 0000:02:01.0:   bridge window [mem 0xc1000000-0xd0ffffff 64bit pref]
[   26.118099] PCI: No. 2 try to assign unassigned res
[   26.118104] pcieport 0000:02:01.0: resource 14 [mem 0xd1000000-0xd17fffff] released
[   26.118106] pcieport 0000:02:01.0: PCI bridge to [bus 04-24]
[   26.118116] pcieport 0000:02:01.0: resource 15 [mem 0xc1000000-0xd0ffffff 64bit pref] released
[   26.118118] pcieport 0000:02:01.0: PCI bridge to [bus 04-24]
[   26.118141] pcieport 0000:02:01.0: BAR 15: no space for [mem size 0x18000000 64bit pref]
[   26.118144] pcieport 0000:02:01.0: BAR 15: failed to assign [mem size 0x18000000 64bit pref]
[   26.118147] pcieport 0000:02:01.0: BAR 14: no space for [mem size 0x01800000]
[   26.118149] pcieport 0000:02:01.0: BAR 14: failed to assign [mem size 0x01800000]
[   26.118153] pci 0000:04:00.0: BAR 15: no space for [mem size 0x18000000 64bit pref]
[   26.118156] pci 0000:04:00.0: BAR 15: failed to assign [mem size 0x18000000 64bit pref]
[   26.118158] pci 0000:04:00.0: BAR 14: no space for [mem size 0x01800000]
[   26.118160] pci 0000:04:00.0: BAR 14: failed to assign [mem size 0x01800000]
[   26.118164] pci 0000:05:01.0: BAR 15: no space for [mem size 0x18000000 64bit pref]
[   26.118167] pci 0000:05:01.0: BAR 15: failed to assign [mem size 0x18000000 64bit pref]
[   26.118169] pci 0000:05:01.0: BAR 14: no space for [mem size 0x01800000]
[   26.118171] pci 0000:05:01.0: BAR 14: failed to assign [mem size 0x01800000]
[   26.118176] pci 0000:06:00.0: BAR 1: no space for [mem size 0x10000000 64bit pref]
[   26.118179] pci 0000:06:00.0: BAR 1: failed to assign [mem size 0x10000000 64bit pref]
[   26.118182] pci 0000:06:00.0: BAR 3: no space for [mem size 0x02000000 64bit pref]
[   26.118184] pci 0000:06:00.0: BAR 3: failed to assign [mem size 0x02000000 64bit pref]
[   26.118186] pci 0000:06:00.0: BAR 0: no space for [mem size 0x01000000]
[   26.118188] pci 0000:06:00.0: BAR 0: failed to assign [mem size 0x01000000]
[   26.118190] pci 0000:06:00.2: BAR 0: no space for [mem size 0x00040000 64bit pref]
[   26.118193] pci 0000:06:00.2: BAR 0: failed to assign [mem size 0x00040000 64bit pref]
[   26.118195] pci 0000:06:00.2: BAR 3: no space for [mem size 0x00010000 64bit pref]
[   26.118197] pci 0000:06:00.2: BAR 3: failed to assign [mem size 0x00010000 64bit pref]
[   26.118200] pci 0000:06:00.1: BAR 0: no space for [mem size 0x00004000]
[   26.118202] pci 0000:06:00.1: BAR 0: failed to assign [mem size 0x00004000]
[   26.118204] pci 0000:06:00.3: BAR 0: no space for [mem size 0x00001000]
[   26.118207] pci 0000:06:00.3: BAR 0: failed to assign [mem size 0x00001000]
[   26.118210] pci 0000:05:01.0: PCI bridge to [bus 06]
[   26.118216] pci 0000:05:01.0:   bridge window [io  0x2000-0x2fff]
[   26.118246] pci 0000:04:00.0: PCI bridge to [bus 05-06]
[   26.118251] pci 0000:04:00.0:   bridge window [io  0x2000-0x2fff]
[   26.118315] pcieport 0000:02:01.0: PCI bridge to [bus 04-24]
[   26.118322] pcieport 0000:02:01.0:   bridge window [io  0x2000-0x3fff]
[   26.118391] pcieport 0000:04:00.0: enabling device (0000 -> 0001)
[   26.119436] pcieport 0000:05:01.0: enabling device (0000 -> 0001)
[   26.120527] pci 0000:06:00.1: D0 power state depends on 0000:06:00.0
[   26.121204] snd_hda_intel 0000:06:00.1: Disabling MSI
[   26.121218] snd_hda_intel 0000:06:00.1: Handle vga_switcheroo audio client
[   26.121297] pci 0000:06:00.2: D0 power state depends on 0000:06:00.0
[   26.123177] xhci_hcd 0000:06:00.2: init 0000:06:00.2 fail, -16
[   26.123286] xhci_hcd: probe of 0000:06:00.2 failed with error -16
[   26.123336] pci 0000:06:00.3: D0 power state depends on 0000:06:00.0
[   26.123373] snd_hda_intel 0000:06:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
[   26.123374] snd_hda_intel 0000:06:00.1: ioremap error
[   26.125087] nvidia-gpu 0000:06:00.3: pcim_iomap failed
[   26.125723] IPMI message handler: version 39.2
[   26.125862] nvidia-gpu: probe of 0000:06:00.3 failed with error -12
[   26.129362] ipmi device interface
[   26.691416] nvidia: module license 'NVIDIA' taints kernel.
[   26.691418] Disabling lock debugging due to kernel taint
[   26.717409] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   26.721524] nvidia 0000:06:00.0: enabling device (0000 -> 0001)
[   26.722190] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[   26.722191] NVRM: The system BIOS may have misconfigured your GPU.
[   26.722198] nvidia: probe of 0000:06:00.0 failed with error -1
[   26.722246] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   26.722247] NVRM: None of the NVIDIA devices were initialized.
[   26.726291] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[   27.581741] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   27.582647] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[   27.582650] NVRM: The system BIOS may have misconfigured your GPU.
[   27.582657] nvidia: probe of 0000:06:00.0 failed with error -1
[   27.582722] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   27.582723] NVRM: None of the NVIDIA devices were initialized.
[   27.583023] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[   28.108225] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   28.109029] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[   28.109032] NVRM: The system BIOS may have misconfigured your GPU.
[   28.109040] nvidia: probe of 0000:06:00.0 failed with error -1
[   28.109101] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   28.109102] NVRM: None of the NVIDIA devices were initialized.
[   28.109976] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[   28.656604] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   28.657531] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[   28.657533] NVRM: The system BIOS may have misconfigured your GPU.
[   28.657541] nvidia: probe of 0000:06:00.0 failed with error -1
[   28.657618] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   28.657619] NVRM: None of the NVIDIA devices were initialized.
[   28.658195] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[   29.229447] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   29.231824] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[   29.231825] NVRM: The system BIOS may have misconfigured your GPU.
[   29.231832] nvidia: probe of 0000:06:00.0 failed with error -1
[   29.231867] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   29.231868] NVRM: None of the NVIDIA devices were initialized.
[   29.232192] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[   29.962784] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   29.963413] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[   29.963415] NVRM: The system BIOS may have misconfigured your GPU.
[   29.963422] nvidia: probe of 0000:06:00.0 failed with error -1
[   29.963463] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   29.963464] NVRM: None of the NVIDIA devices were initialized.
[   29.965017] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[   30.509769] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   30.510971] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[   30.510973] NVRM: The system BIOS may have misconfigured your GPU.
[   30.510979] nvidia: probe of 0000:06:00.0 failed with error -1
[   30.511014] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   30.511015] NVRM: None of the NVIDIA devices were initialized.
[   30.512141] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[   31.064048] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   31.064708] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[   31.064711] NVRM: The system BIOS may have misconfigured your GPU.
[   31.064718] nvidia: probe of 0000:06:00.0 failed with error -1
[   31.064765] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   31.064766] NVRM: None of the NVIDIA devices were initialized.
[   31.065704] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[   31.936015] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   31.937372] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[   31.937373] NVRM: The system BIOS may have misconfigured your GPU.
[   31.937382] nvidia: probe of 0000:06:00.0 failed with error -1
[   31.937422] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   31.937423] NVRM: None of the NVIDIA devices were initialized.
[   31.938639] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[   32.164701] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   32.166055] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[   32.166057] NVRM: The system BIOS may have misconfigured your GPU.
[   32.166064] nvidia: probe of 0000:06:00.0 failed with error -1
[   32.166104] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   32.166105] NVRM: None of the NVIDIA devices were initialized.
[   32.166628] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[   32.818358] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   32.819779] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[   32.819781] NVRM: The system BIOS may have misconfigured your GPU.
[   32.819788] nvidia: probe of 0000:06:00.0 failed with error -1
[   32.819830] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   32.819831] NVRM: None of the NVIDIA devices were initialized.
[   32.820123] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237

jamalex avatar Jan 12 '20 20:01 jamalex

We are working on new firmware to address this issue

jackpot51 avatar Jan 13 '20 20:01 jackpot51

New firmware will be released within the next week that will allow using NVIDIA GPUs for CUDA and PCIe passthrough. Thunderbolt security is enabled by default, so using an NVIDIA GPU for graphics requires custom configuration: the graphics stack loads before boltd authorizes the eGPU.

jackpot51 avatar Jan 16 '20 21:01 jackpot51

Does this disable Thunderbolt security and make me more vulnerable to untrusted Thunderbolt devices? Or do we get configuration options in coreboot?


vthg2themax avatar Jan 16 '20 21:01 vthg2themax

Is the fix NVIDIA specific?

tkolo avatar Jan 16 '20 21:01 tkolo

The only available option is to have Thunderbolt security enabled. The fix was to increase hotplug memory for the Thunderbolt bridge and should not be specific to NVIDIA, but the test was to ensure that the RTX 2080 Ti would be correctly initialized by the NVIDIA driver.
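
For readers who want to check whether their own machine shows the same symptom, here is a minimal sketch that pulls failed BAR assignments out of kernel log text. It assumes the `BAR N: failed to assign [mem size ...]` wording seen in the dmesg output earlier in this thread (other kernel versions may word it differently), and the function name `failed_bars` is made up for illustration.

```shell
# failed_bars: read kernel log text on stdin and print one line per
# PCI function and BAR that the kernel could not place, in the form
# "<pci address> BAR <n> <size>".
# Assumption: the log uses the "BAR N: failed to assign [mem size 0x...]"
# wording shown in this thread.
failed_bars() {
  sed -n -E 's/^.*pci ([0-9a-f:.]+): BAR ([0-9]+): failed to assign \[mem size (0x[0-9a-f]+).*$/\1 BAR \2 \3/p' | sort -u
}
```

Typical use would be `dmesg | failed_bars` (with `sudo` if the kernel log is restricted). An empty result only means this particular message is absent; it does not prove the GPU initialized.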

jackpot51 avatar Jan 16 '20 23:01 jackpot51

Apologies for not understanding the response, but it sounds like Thunderbolt security will stay the same, and a lower-level change to hotplug memory will solve this issue, right? Thanks again, by the way!

vthg2themax avatar Jan 17 '20 01:01 vthg2themax

That's right

jackpot51 avatar Jan 17 '20 04:01 jackpot51

This now works perfectly for me after the BIOS update. Thanks guys!

tkolo avatar Jan 24 '20 17:01 tkolo

Hi, can someone tell me which specific eGPU models work now that I've updated the BIOS on my Darter Pro?

Thanks, Ron

zlgonzalez avatar Jan 25 '20 05:01 zlgonzalez

I have updated the firmware on my coreboot Galago Pro, and I still cannot get the AMD RX 5700 XT in my Sonnet 650 eGPU working. I can add logs here if needed, or continue troubleshooting only with support if you prefer.

vthg2themax avatar Jan 25 '20 08:01 vthg2themax

I have a Darter Pro (darp6) with a Sonnet 650 and tested it with a borrowed AMD R9 290X. I've ordered an AMD RX 5700 XT now, we'll see how it works.

tkolo avatar Jan 25 '20 14:01 tkolo

I also put in another hard drive, installed Windows 10 and the drivers, and rebooted. The device showed up in Device Manager with a Code 12 error, which leads me to believe there could still be another issue with the firmware.

vthg2themax avatar Jan 26 '20 02:01 vthg2themax

I swapped the GPU for an AMD RX 560 and it also gives me a Code 12 error saying there are not enough resources on the device. I do some web programming for my main job, so if you think this type of problem is something I could contribute to, please feel free to let me know where to start. I really believe in the idea of coreboot, and want to see coreboot achieve the same functionality as the stock BIOS the device originally had.

vthg2themax avatar Jan 26 '20 14:01 vthg2themax

I'm sorry if I'm asking a stupid question, but did you also test the eGPU with a different laptop? Perhaps that's what's broken.

tkolo avatar Jan 26 '20 16:01 tkolo

Not a stupid question at all! Unfortunately, this laptop was my first with eGPU capability, so I do not currently have another to test with. I guess this is why being a trailblazer is so difficult. I will have to see if I can get ahold of another one to use to help troubleshoot this, but it will be a while I imagine.

vthg2themax avatar Jan 26 '20 17:01 vthg2themax

We have AMD RX 5700 XTs to test with, so I will try it as well. The fix was to increase the amount of hotplug memory reserved for the Thunderbolt bridge. I found the amount required to run the RTX 2080 Ti, but perhaps AMD GPUs need even more hotplug memory reserved.

We expect to fully support all modern GPUs over Thunderbolt with coreboot.

jackpot51 avatar Jan 26 '20 17:01 jackpot51

I have changed the title to remove NVIDIA since that portion is solved. We do not need any customers to investigate AMD graphics card issues as we will investigate them internally.

jackpot51 avatar Jan 26 '20 17:01 jackpot51

Answering on a Sunday? Wow! Thank you for your dedication! I just figured I would offer to help, because this is a public project with positive implications for personal freedom on an end user's device.

vthg2themax avatar Jan 26 '20 18:01 vthg2themax

My Radeon RX 5700 XT arrived today and indeed it doesn't work. You guys mentioned you'll test it as well, so I'll just wait.

tkolo avatar Jan 30 '20 18:01 tkolo

Actually never mind, loading the amdgpu driver with vm_update_mode=3 fixed the issue :)
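
For anyone wanting to try the same workaround: a module parameter like this is normally set either through an `options` line in a `modprobe.d` file or on the kernel command line as `amdgpu.vm_update_mode=3`. Below is a small sketch that generates the `options` line. The helper name and the file name in the usage note are hypothetical, and whether the workaround is needed on a given setup is something to verify rather than assume.

```shell
# amdgpu_vm_update_conf: print a modprobe.d options line for the given
# vm_update_mode value. Only 0-3 are accepted, matching the values listed
# in the amdgpu parameter description.
amdgpu_vm_update_conf() {
  case "$1" in
    0|1|2|3) printf 'options amdgpu vm_update_mode=%s\n' "$1" ;;
    *) echo "vm_update_mode must be 0, 1, 2 or 3" >&2; return 1 ;;
  esac
}
```

One possible use is `amdgpu_vm_update_conf 3 | sudo tee /etc/modprobe.d/amdgpu-vm-update.conf`, followed by a reboot. If the driver is loaded from the initramfs, the initramfs may also need to be regenerated before the option takes effect.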

tkolo avatar Jan 30 '20 18:01 tkolo

I wonder why that would be required

jackpot51 avatar Jan 30 '20 18:01 jackpot51

parm:           vm_update_mode:VM update using CPU (0 = never (default except for large BAR(LB)), 1 = Graphics only, 2 = Compute only (default for LB), 3 = Both (int)
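
Read as a bitmask, that description suggests bit 0 selects graphics and bit 1 selects compute, which is why 3 turns on CPU-based VM updates for both. A minimal sketch that decodes a value under that reading; the helper name is hypothetical.

```shell
# describe_vm_update_mode: decode a vm_update_mode value (0-3) into its two
# flags, assuming bit 0 = graphics and bit 1 = compute as the parameter
# description implies.
describe_vm_update_mode() {
  mode="$1"
  case "$mode" in
    0|1|2|3) ;;
    *) echo "vm_update_mode must be 0, 1, 2 or 3" >&2; return 1 ;;
  esac
  gfx=$(( mode & 1 ))            # 1 if CPU updates are used for graphics
  compute=$(( (mode >> 1) & 1 )) # 1 if CPU updates are used for compute
  printf 'graphics=%s compute=%s\n' "$gfx" "$compute"
}
```

For example, `describe_vm_update_mode 3` prints `graphics=1 compute=1`, and `describe_vm_update_mode 2` prints `graphics=0 compute=1`.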

jackpot51 avatar Jan 30 '20 18:01 jackpot51

Is that a setting in Pop!_OS that needs to be modified?

vthg2themax avatar Jan 30 '20 19:01 vthg2themax

Perhaps, I still have more investigation to do

jackpot51 avatar Jan 30 '20 19:01 jackpot51