drm-kmod
5.15.25: radeonkms: ring 0 stalled, GPU lockup, GPU reset succeeded, blackout persisted, no response to keyboard input
Description
Around twenty-five minutes after a crash of Firefox (https://reviews.freebsd.org/P557), I reopened Firefox.
Both displays blacked out.
If I recall correctly, the blackout began:
- whilst Firefox started (around 719 tabs, most hidden, across three windows on the display to the left of the notebook)
- maybe also whilst I used a trackball to move the pointer from the display on the left, to the right.
I waited for a minute or so; there was no response to keyboard input.
Hard disk drive activity was visible, so I pressed the power button for a graceful shutdown.
I started the computer and viewed the logs. The result of a probe made whilst drafting this issue: https://bsd-hardware.info/?probe=1a4897cb53.
An extract from /var/log/messages:
2023-02-26 09.00 messages extract.txt
DRM-related lines:
drmn0: ring 0 stalled for more than 10119msec
drmn0: GPU lockup (current fence id 0x0000000000048f10 last fence id 0x0000000000048f23 on ring 0)
drmn0: Saved 610 dwords of commands on ring 0.
drmn0: GPU softreset: 0x00000019
drmn0: GRBM_STATUS = 0xA2701CA0
drmn0: GRBM_STATUS_SE0 = 0x1C000003
drmn0: GRBM_STATUS_SE1 = 0x00000007
drmn0: SRBM_STATUS = 0x200000C0
drmn0: SRBM_STATUS2 = 0x00000000
drmn0: R_008674_CP_STALLED_STAT1 = 0x01000000
drmn0: R_008678_CP_STALLED_STAT2 = 0x00011000
drmn0: R_00867C_CP_BUSY_STAT = 0x00068406
drmn0: R_008680_CP_STAT = 0x80878647
drmn0: R_00D034_DMA_STATUS_REG = 0x44C83D57
drmn0: GRBM_SOFT_RESET=0x00007F6B
drmn0: SRBM_SOFT_RESET=0x00000100
drmn0: GRBM_STATUS = 0x00003828
drmn0: GRBM_STATUS_SE0 = 0x00000007
drmn0: GRBM_STATUS_SE1 = 0x00000007
drmn0: SRBM_STATUS = 0x200000C0
drmn0: SRBM_STATUS2 = 0x00000000
drmn0: R_008674_CP_STALLED_STAT1 = 0x00000000
drmn0: R_008678_CP_STALLED_STAT2 = 0x00000000
drmn0: R_00867C_CP_BUSY_STAT = 0x00000000
drmn0: R_008680_CP_STAT = 0x00000000
drmn0: R_00D034_DMA_STATUS_REG = 0x44C83D57
drmn0: GPU reset succeeded, trying to resume
[drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[drm] PCIE GART of 1024M enabled (table at 0x0000000000162000).
drmn0: WB enabled
drmn0: fence driver on ring 0 use gpu addr 0x0000000040000c00
drmn0: fence driver on ring 3 use gpu addr 0x0000000040000c0c
drmn0: fence driver on ring 5 use gpu addr 0x0000000000072118
[drm] ring test on 0 succeeded in 1 usecs
[drm] ring test on 3 succeeded in 4 usecs
[drm] ring test on 5 succeeded in 2 usecs
[drm] UVD initialized successfully.
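To pull the DRM-related lines out of a log, as in the extract above, a filter along these lines may help. It is a sketch, assuming the radeonkms driver's messages carry either the `drmn0:` device prefix or a `[drm` tag, as they do above; `drm_lines` is a hypothetical helper name, not part of drm-kmod.

```shell
#!/bin/sh
# Hypothetical helper: print only DRM-related kernel messages.
# Matches the device prefix (drmn0:, drmn1:, ...) and the generic
# "[drm]" / "[drm ERROR ...]" tags. Reads the named file, or
# /var/log/messages if none is given; pass "-" to read stdin.
drm_lines() {
    grep -E 'drmn[0-9]+:|\[drm' "${1:-/var/log/messages}"
}
```

For example, `drm_lines /var/log/messages`. For a rotated, compressed log such as `messages.0.bz2`, `zgrep -E` with the same pattern should work too, as `zgrep` does later in this report.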
FreeBSD version
% uname -a
FreeBSD mowa219-gjp4-8570p-freebsd 14.0-CURRENT FreeBSD 14.0-CURRENT #33 main-n261014-cd406ac94d8b: Sun Feb 19 01:35:14 GMT 2023 grahamperrin@mowa219-gjp4-8570p-freebsd:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG amd64
% uname -KU
1400081 1400081
%
PCI info
% pciconf -lv
hostb0@pci0:0:0:0: class=0x060000 rev=0x09 hdr=0x00 vendor=0x8086 device=0x0154 subvendor=0x103c subdevice=0x17a7
vendor = 'Intel Corporation'
device = '3rd Gen Core processor DRAM Controller'
class = bridge
subclass = HOST-PCI
pcib1@pci0:0:1:0: class=0x060400 rev=0x09 hdr=0x01 vendor=0x8086 device=0x0151 subvendor=0x8086 subdevice=0x2010
vendor = 'Intel Corporation'
device = 'Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port'
class = bridge
subclass = PCI-PCI
xhci0@pci0:0:20:0: class=0x0c0330 rev=0x04 hdr=0x00 vendor=0x8086 device=0x1e31 subvendor=0x103c subdevice=0x17a7
vendor = 'Intel Corporation'
device = '7 Series/C210 Series Chipset Family USB xHCI Host Controller'
class = serial bus
subclass = USB
none0@pci0:0:22:0: class=0x078000 rev=0x04 hdr=0x00 vendor=0x8086 device=0x1e3a subvendor=0x103c subdevice=0x17a7
vendor = 'Intel Corporation'
device = '7 Series/C216 Chipset Family MEI Controller'
class = simple comms
uart2@pci0:0:22:3: class=0x070002 rev=0x04 hdr=0x00 vendor=0x8086 device=0x1e3d subvendor=0x103c subdevice=0x17a7
vendor = 'Intel Corporation'
device = '7 Series/C210 Series Chipset Family KT Controller'
class = simple comms
subclass = UART
em0@pci0:0:25:0: class=0x020000 rev=0x04 hdr=0x00 vendor=0x8086 device=0x1502 subvendor=0x103c subdevice=0x17a7
vendor = 'Intel Corporation'
device = '82579LM Gigabit Network Connection (Lewisville)'
class = network
subclass = ethernet
ehci0@pci0:0:26:0: class=0x0c0320 rev=0x04 hdr=0x00 vendor=0x8086 device=0x1e2d subvendor=0x103c subdevice=0x17a7
vendor = 'Intel Corporation'
device = '7 Series/C216 Chipset Family USB Enhanced Host Controller'
class = serial bus
subclass = USB
hdac1@pci0:0:27:0: class=0x040300 rev=0x04 hdr=0x00 vendor=0x8086 device=0x1e20 subvendor=0x103c subdevice=0x17a7
vendor = 'Intel Corporation'
device = '7 Series/C216 Chipset Family High Definition Audio Controller'
class = multimedia
subclass = HDA
pcib2@pci0:0:28:0: class=0x060400 rev=0xc4 hdr=0x01 vendor=0x8086 device=0x1e10 subvendor=0x103c subdevice=0x17a7
vendor = 'Intel Corporation'
device = '7 Series/C216 Chipset Family PCI Express Root Port 1'
class = bridge
subclass = PCI-PCI
pcib3@pci0:0:28:2: class=0x060400 rev=0xc4 hdr=0x01 vendor=0x8086 device=0x1e14 subvendor=0x103c subdevice=0x17a7
vendor = 'Intel Corporation'
device = '7 Series/C210 Series Chipset Family PCI Express Root Port 3'
class = bridge
subclass = PCI-PCI
pcib4@pci0:0:28:3: class=0x060400 rev=0xc4 hdr=0x01 vendor=0x8086 device=0x1e16 subvendor=0x103c subdevice=0x17a7
vendor = 'Intel Corporation'
device = '7 Series/C216 Chipset Family PCI Express Root Port 4'
class = bridge
subclass = PCI-PCI
ehci1@pci0:0:29:0: class=0x0c0320 rev=0x04 hdr=0x00 vendor=0x8086 device=0x1e26 subvendor=0x103c subdevice=0x17a7
vendor = 'Intel Corporation'
device = '7 Series/C216 Chipset Family USB Enhanced Host Controller'
class = serial bus
subclass = USB
isab0@pci0:0:31:0: class=0x060100 rev=0x04 hdr=0x00 vendor=0x8086 device=0x1e55 subvendor=0x103c subdevice=0x17a7
vendor = 'Intel Corporation'
device = 'QM77 Express Chipset LPC Controller'
class = bridge
subclass = PCI-ISA
ahci0@pci0:0:31:2: class=0x010601 rev=0x04 hdr=0x00 vendor=0x8086 device=0x1e03 subvendor=0x103c subdevice=0x17a7
vendor = 'Intel Corporation'
device = '7 Series Chipset Family 6-port SATA Controller [AHCI mode]'
class = mass storage
subclass = SATA
vgapci0@pci0:1:0:0: class=0x030000 rev=0x00 hdr=0x00 vendor=0x1002 device=0x6841 subvendor=0x103c subdevice=0x17a9
vendor = 'Advanced Micro Devices, Inc. [AMD/ATI]'
device = 'Thames [Radeon HD 7550M/7570M/7650M]'
class = display
subclass = VGA
hdac0@pci0:1:0:1: class=0x040300 rev=0x00 hdr=0x00 vendor=0x1002 device=0xaa90 subvendor=0x103c subdevice=0x17a9
vendor = 'Advanced Micro Devices, Inc. [AMD/ATI]'
device = 'Turks HDMI Audio [Radeon HD 6500/6600 / 6700M Series]'
class = multimedia
subclass = HDA
iwn0@pci0:4:0:0: class=0x028000 rev=0x34 hdr=0x00 vendor=0x8086 device=0x0082 subvendor=0x8086 subdevice=0x1301
vendor = 'Intel Corporation'
device = 'Centrino Advanced-N 6205 [Taylor Peak]'
class = network
%
DRM KMOD version
% pkg query -x '%n %v' 'drm.*kmod'
drm-515-kmod 5.15.25
% pkg info drm-515-kmod | grep -e Installed -e repository
Installed on : Sun Feb 19 15:51:53 2023 GMT
repository : poudriere
%
To reproduce
The issue has not occurred often enough for me to find a way to reproduce it, sorry.
This morning's blackout is, maybe, the third since I began testing drm-515-kmod.
If I recall correctly, the previous blackout was very soon after waking from sleep (moments after the SDDM lock screen appeared). At the time I was too busy/lazy to check the logs, so I can't be certain that the cause was the same.
Screenshots
Not applicable.
Additional context
Firmware built from source, https://github.com/freebsd/drm-kmod-firmware/commit/d21284bf7970e87313a9aee4b39142585e0721ca (2023-02-17).
% pwd
/usr/home/grahamperrin/dev/drm-kmod-firmware
% git branch
* master
% git rev-list --max-count=1 HEAD
d21284bf7970e87313a9aee4b39142585e0721ca
% git pull --ff-only
Already up to date.
% zgrep firmware /var/log/messages.0.bz2 | tail -n 8
Feb 26 02:14:52 mowa219-gjp4-8570p-freebsd kernel: iwn0: iwn_read_firmware: ucode rev=0x12a80601
Feb 26 09:28:52 mowa219-gjp4-8570p-freebsd kernel: drmn0: successfully loaded firmware image 'radeon/TURKS_pfp.bin'
Feb 26 09:28:52 mowa219-gjp4-8570p-freebsd kernel: drmn0: successfully loaded firmware image 'radeon/TURKS_me.bin'
Feb 26 09:28:52 mowa219-gjp4-8570p-freebsd kernel: drmn0: successfully loaded firmware image 'radeon/BTC_rlc.bin'
Feb 26 09:28:52 mowa219-gjp4-8570p-freebsd kernel: drmn0: successfully loaded firmware image 'radeon/TURKS_mc.bin'
Feb 26 09:28:52 mowa219-gjp4-8570p-freebsd kernel: drmn0: successfully loaded firmware image 'radeon/TURKS_smc.bin'
Feb 26 09:28:52 mowa219-gjp4-8570p-freebsd kernel: drmn0: successfully loaded firmware image 'radeon/SUMO_uvd.bin'
Feb 26 09:28:52 mowa219-gjp4-8570p-freebsd kernel: iwn0: iwn_read_firmware: ucode rev=0x12a80601
%
… If I recall correctly, the blackout began:
- whilst Firefox started (around 719 tabs, most hidden, across three windows on the display to the left of the notebook)
- maybe also whilst I used a trackball to move the pointer from the display on the left, to the right.
Now, reviewing what's in the three windows, I think it more likely that:
- Firefox startup was complete
- in a tab in the third window, I followed a link from within https://reviews.freebsd.org/D38720 to https://reviews.freebsd.org/R9:14a267f652a6164d1d8c453ce19424ad7f324b49, intending to copy the hash
- and for copy purposes, I typically aim for something near the pointer that will respond neatly to a double-click, so I guess I moved the pointer towards the address bar, and maybe the blackout occurred before I could double-click the 14a267f652a6164d1d8c453ce19424ad7f324b49 part of the URL.
Could you ssh to the machine?
Good thinking. I didn't try ssh at the time, but given that disk activity was visible, I do strongly suspect that ssh would have worked.
A few weeks ago, after another blackout, I reverted to drm-510-kmod.
If I step forward again, what would be most useful (to you) for me to try:
- drm-515-kmod, or
- master?
If I can ssh in when symptoms recur, what would you like me to run?
719 tabs?
As you are on 14-CURRENT, it is best to stick with drm-515-kmod rather than master.
Thanks,
… you are on 14-CURRENT …
I already mentioned 14.0-CURRENT, more specifically 1400081, in the opening post.
@evadot, will feedback from (packaged) drm-515-kmod be good enough to progress this issue? Or would you prefer me to build from source (master)?
https://github.com/FreeBSD/freebsd-ports/commit/231fddc24bd7780d2d08b63ef16a823e27385002 looks interesting; I'll build from ports.
With drm-515-kmod-5.15.25_3, yesterday at 08:57:
…
drmn0: ring 0 stalled for more than 10276msec
drmn0: GPU lockup (current fence id 0x00000000000769c3 last fence id 0x00000000000769fe on ring 0)
drmn0: failed to get a new IB (-11)
[drm ERROR :radeon_cs_ib_fill] Failed to get ib !
drmn0: Saved 1874 dwords of commands on ring 0.
drmn0: GPU softreset: 0x00000019
drmn0: GRBM_STATUS = 0xA2703CA0
drmn0: GRBM_STATUS_SE0 = 0x1C000007
drmn0: GRBM_STATUS_SE1 = 0x00000007
drmn0: SRBM_STATUS = 0x200000C0
drmn0: SRBM_STATUS2 = 0x00000000
drmn0: R_008674_CP_STALLED_STAT1 = 0x01000000
drmn0: R_008678_CP_STALLED_STAT2 = 0x00011000
drmn0: R_00867C_CP_BUSY_STAT = 0x00068406
drmn0: R_008680_CP_STAT = 0x80878647
drmn0: R_00D034_DMA_STATUS_REG = 0x44C83D57
drmn0: GRBM_SOFT_RESET=0x00007F6B
drmn0: SRBM_SOFT_RESET=0x00000100
drmn0: GRBM_STATUS = 0x00003828
drmn0: GRBM_STATUS_SE0 = 0x00000007
drmn0: GRBM_STATUS_SE1 = 0x00000007
drmn0: SRBM_STATUS = 0x200000C0
drmn0: SRBM_STATUS2 = 0x00000000
drmn0: R_008674_CP_STALLED_STAT1 = 0x00000000
drmn0: R_008678_CP_STALLED_STAT2 = 0x00000000
drmn0: R_00867C_CP_BUSY_STAT = 0x00000000
drmn0: R_008680_CP_STAT = 0x00000000
drmn0: R_00D034_DMA_STATUS_REG = 0x44C83D57
drmn0: GPU reset succeeded, trying to resume
[drm] enabling PCIE gen 2 link speeds, disable with radeon.pcie_gen2=0
[drm] PCIE GART of 1024M enabled (table at 0x0000000000162000).
drmn0: WB enabled
drmn0: fence driver on ring 0 use gpu addr 0x0000000040000c00
drmn0: fence driver on ring 3 use gpu addr 0x0000000040000c0c
drmn0: fence driver on ring 5 use gpu addr 0x0000000000072118
[drm] ring test on 0 succeeded in 1 usecs
[drm] ring test on 3 succeeded in 4 usecs
[drm] ring test on 5 succeeded in 2 usecs
[drm] UVD initialized successfully.
[drm] ib test on ring 0 succeeded in 0 usecs
[drm] ib test on ring 3 succeeded in 0 usecs
[drm] ib test on ring 5 succeeded
…
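Each `GPU lockup` line reports a pair of fence ids. Assuming these follow the Linux radeon driver, where the first value is the last fence sequence number the GPU signalled and the second is the last one the driver emitted, their difference approximates how many submissions were outstanding when ring 0 stalled: 0x48f23 - 0x48f10 = 19 in the first log, and 0x769fe - 0x769c3 = 59 in this one. A minimal sketch to compute it from a log line (`pending_fences` is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: read a radeon "GPU lockup (current fence id X
# last fence id Y on ring N)" line on stdin and print Y - X, i.e. the
# number of fences emitted but not yet signalled (under the assumption
# stated above). Returns 1 unless exactly one such line is found.
pending_fences() {
    # Keep only the two hexadecimal ids, separated by a space.
    ids=$(sed -n 's/.*current fence id \(0x[0-9a-fA-F]*\) last fence id \(0x[0-9a-fA-F]*\).*/\1 \2/p')
    # Split the ids into $1 (current) and $2 (last).
    set -- $ids
    [ $# -eq 2 ] || return 1
    # Shell arithmetic accepts C-style 0x hexadecimal constants.
    echo $(( $2 - $1 ))
}
```

For example, `grep 'GPU lockup' /var/log/messages | tail -n 1 | pending_fences` reports the most recent lockup.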
After the event, at 09:02, the result of a probe: https://bsd-hardware.info/?probe=95f2b2f9d6.
09:04:
I might have run plasmashell --replace; instead, I chose to restart the computer.
Context (08:00:00–09:07):
I'm experiencing the same issue, and have managed to reproduce it somewhat consistently.
To Reproduce:
A freshly started session on Sway or Hyprland, with swayidle/swaylock running in the background:
swayidle -w timeout 300 'swaylock -f -c 000000' timeout 600 'swaymsg "output * power off"' resume 'swaymsg "output * power on"' before-sleep 'swaylock -f -c 000000 --effect-blur 7x5'
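For anyone trying to reproduce this more quickly, the same swayidle setup can be run with shorter timeouts. This is an untested sketch: the 30 and 60 second values are arbitrary assumptions, chosen only so that the lock and the output power-off/power-on cycle happen sooner than after 300 and 600 seconds; everything else is the command above. Note that `--effect-blur` is an option of the swaylock-effects fork, not of upstream swaylock.

```shell
# Hypothetical shortened timeouts, for quicker reproduction attempts:
#   after 30 s idle: lock the screen (-f daemonizes swaylock, -c sets the colour)
#   after 60 s idle: power off all outputs; power them back on at the next input
#   before system sleep: lock the screen with a blur effect (swaylock-effects)
# -w makes swayidle wait for a command to finish before continuing,
# so the lock is in place before the system sleeps.
swayidle -w \
    timeout 30 'swaylock -f -c 000000' \
    timeout 60 'swaymsg "output * power off"' resume 'swaymsg "output * power on"' \
    before-sleep 'swaylock -f -c 000000 --effect-blur 7x5'
```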
FreeBSD version
FreeBSD hawkeye.stormriders.local 14.0-RELEASE-p3 FreeBSD 14.0-RELEASE-p3 #0: Mon Dec 11 04:56:01 UTC 2023 [email protected]:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
❯ uname -KU
1400097 1400097
PCI Info
❯ pciconf -lv
hostb0@pci0:0:0:0: class=0x060000 rev=0x06 hdr=0x00 vendor=0x8086 device=0x0c00 subvendor=0x1043 subdevice=0x8534
vendor = 'Intel Corporation'
device = '4th Gen Core Processor DRAM Controller'
class = bridge
subclass = HOST-PCI
pcib1@pci0:0:1:0: class=0x060400 rev=0x06 hdr=0x01 vendor=0x8086 device=0x0c01 subvendor=0x1043 subdevice=0x8534
vendor = 'Intel Corporation'
device = 'Xeon E3-1200 v3/4th Gen Core Processor PCI Express x16 Controller'
class = bridge
subclass = PCI-PCI
pcib4@pci0:0:1:1: class=0x060400 rev=0x06 hdr=0x01 vendor=0x8086 device=0x0c05 subvendor=0x1043 subdevice=0x8534
vendor = 'Intel Corporation'
device = 'Xeon E3-1200 v3/4th Gen Core Processor PCI Express x8 Controller'
class = bridge
subclass = PCI-PCI
xhci0@pci0:0:20:0: class=0x0c0330 rev=0x00 hdr=0x00 vendor=0x8086 device=0x8cb1 subvendor=0x1043 subdevice=0x8534
vendor = 'Intel Corporation'
device = '9 Series Chipset Family USB xHCI Controller'
class = serial bus
subclass = USB
none0@pci0:0:22:0: class=0x078000 rev=0x00 hdr=0x00 vendor=0x8086 device=0x8cba subvendor=0x1043 subdevice=0x8534
vendor = 'Intel Corporation'
device = '9 Series Chipset Family ME Interface'
class = simple comms
em0@pci0:0:25:0: class=0x020000 rev=0x00 hdr=0x00 vendor=0x8086 device=0x15a1 subvendor=0x1043 subdevice=0x85c4
vendor = 'Intel Corporation'
device = 'Ethernet Connection (2) I218-V'
class = network
subclass = ethernet
ehci0@pci0:0:26:0: class=0x0c0320 rev=0x00 hdr=0x00 vendor=0x8086 device=0x8cad subvendor=0x1043 subdevice=0x8534
vendor = 'Intel Corporation'
device = '9 Series Chipset Family USB EHCI Controller'
class = serial bus
subclass = USB
hdac1@pci0:0:27:0: class=0x040300 rev=0x00 hdr=0x00 vendor=0x8086 device=0x8ca0 subvendor=0x1043 subdevice=0x860b
vendor = 'Intel Corporation'
device = '9 Series Chipset Family HD Audio Controller'
class = multimedia
subclass = HDA
pcib5@pci0:0:28:0: class=0x060400 rev=0xd0 hdr=0x01 vendor=0x8086 device=0x8c90 subvendor=0x1043 subdevice=0x8534
vendor = 'Intel Corporation'
device = '9 Series Chipset Family PCI Express Root Port 1'
class = bridge
subclass = PCI-PCI
pcib6@pci0:0:28:3: class=0x060401 rev=0xd0 hdr=0x01 vendor=0x8086 device=0x244e subvendor=0x1043 subdevice=0x8534
vendor = 'Intel Corporation'
device = '82801 PCI Bridge'
class = bridge
subclass = PCI-PCI
ehci1@pci0:0:29:0: class=0x0c0320 rev=0x00 hdr=0x00 vendor=0x8086 device=0x8ca6 subvendor=0x1043 subdevice=0x8534
vendor = 'Intel Corporation'
device = '9 Series Chipset Family USB EHCI Controller'
class = serial bus
subclass = USB
isab0@pci0:0:31:0: class=0x060100 rev=0x00 hdr=0x00 vendor=0x8086 device=0x8cc4 subvendor=0x1043 subdevice=0x8534
vendor = 'Intel Corporation'
device = 'Z97 Chipset LPC Controller'
class = bridge
subclass = PCI-ISA
ahci0@pci0:0:31:2: class=0x010601 rev=0x00 hdr=0x00 vendor=0x8086 device=0x8c82 subvendor=0x1043 subdevice=0x8534
vendor = 'Intel Corporation'
device = '9 Series Chipset Family SATA Controller [AHCI Mode]'
class = mass storage
subclass = SATA
ichsmb0@pci0:0:31:3: class=0x0c0500 rev=0x00 hdr=0x00 vendor=0x8086 device=0x8ca2 subvendor=0x1043 subdevice=0x8534
vendor = 'Intel Corporation'
device = '9 Series Chipset Family SMBus Controller'
class = serial bus
subclass = SMBus
pcib2@pci0:1:0:0: class=0x060400 rev=0xc7 hdr=0x01 vendor=0x1002 device=0x1478 subvendor=0x0000 subdevice=0x0000
vendor = 'Advanced Micro Devices, Inc. [AMD/ATI]'
device = 'Navi 10 XL Upstream Port of PCI Express Switch'
class = bridge
subclass = PCI-PCI
pcib3@pci0:2:0:0: class=0x060400 rev=0x00 hdr=0x01 vendor=0x1002 device=0x1479 subvendor=0x1002 subdevice=0x1479
vendor = 'Advanced Micro Devices, Inc. [AMD/ATI]'
device = 'Navi 10 XL Downstream Port of PCI Express Switch'
class = bridge
subclass = PCI-PCI
vgapci0@pci0:3:0:0: class=0x030000 rev=0xc7 hdr=0x00 vendor=0x1002 device=0x73ff subvendor=0x1043 subdevice=0x05d5
vendor = 'Advanced Micro Devices, Inc. [AMD/ATI]'
device = 'Navi 23 [Radeon RX 6600/6600 XT/6600M]'
class = display
subclass = VGA
hdac0@pci0:3:0:1: class=0x040300 rev=0x00 hdr=0x00 vendor=0x1002 device=0xab28 subvendor=0x1002 subdevice=0xab28
vendor = 'Advanced Micro Devices, Inc. [AMD/ATI]'
device = 'Navi 21/23 HDMI/DP Audio Controller'
class = multimedia
subclass = HDA
rtwn0@pci0:4:0:0: class=0x028000 rev=0x01 hdr=0x00 vendor=0x10ec device=0x8179 subvendor=0x10ec subdevice=0x8197
vendor = 'Realtek Semiconductor Co., Ltd.'
device = 'RTL8188EE Wireless Network Adapter'
class = network
pcib7@pci0:6:0:0: class=0x060401 rev=0x04 hdr=0x01 vendor=0x1b21 device=0x1080 subvendor=0x1043 subdevice=0x8489
vendor = 'ASMedia Technology Inc.'
device = 'ASM1083/1085 PCIe to PCI Bridge'
class = bridge
subclass = PCI-PCI
DRM Kmod
❯ sudo pkg query -x '%n %v' 'drm.*kmod'
drm-515-kmod 5.15.118_3
drm-kmod 20220907_1