Firmware
AMD RX550: frequent GPU resets
Hardware: XA61200 + 3A6000
System: Gentoo
Kernel: 6.6.8, 6.7.0-rc7, 6.7.0
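GPU resets occur repeatedly during normal use.
They happen very frequently while using FreeRDP, QEMU, and WebKit-based browsers.
They are not limited to these applications, though: even gnome-text-edit can trigger a GPU reset, and I have not been able to pin down a pattern.
Several people in the LoongArch discussion group have run into the same problem. It is not yet clear whether only this particular card is affected, so I have bought an RX6400 to test further.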
Log sample:
Dec 31 11:05:53 loongson kernel: [drm:amdgpu_job_timedout [amdgpu]] ERROR ring gfx timeout, signaled seq=152958, emitted seq=152961
Dec 31 11:05:53 loongson kernel: [drm:amdgpu_job_timedout [amdgpu]] ERROR Process information: process WebKitWebProces pid 333260 thread WebKitWebP:cs0 pid 333289
Dec 31 11:05:53 loongson kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset begin!
Dec 31 11:05:57 loongson kernel: amdgpu 0000:07:00.0: amdgpu: failed to suspend display audio
Dec 31 11:05:58 loongson kernel: amdgpu: cp is busy, skip halt cp
Dec 31 11:05:58 loongson kernel: amdgpu: rlc is busy, skip halt rlc
Dec 31 11:05:58 loongson kernel: amdgpu 0000:07:00.0: amdgpu: BACO reset
Dec 31 11:05:58 loongson kernel: azx_single_wait_for_response: 62 callbacks suppressed
Dec 31 11:05:58 loongson kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset succeeded, trying to resume
Dec 31 11:05:58 loongson kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400200000).
Dec 31 11:05:58 loongson kernel: [drm] VRAM is lost due to GPU reset!
Dec 31 11:05:59 loongson kernel: [drm] UVD and UVD ENC initialized successfully.
Dec 31 11:05:59 loongson kernel: [drm] VCE initialized successfully.
Dec 31 11:05:59 loongson kernel: amdgpu 0000:07:00.0: amdgpu: recover vram bo from shadow start
Dec 31 11:05:59 loongson kernel: amdgpu 0000:07:00.0: amdgpu: recover vram bo from shadow done
Dec 31 11:05:59 loongson kernel: [drm] Skip scheduling IBs!
Dec 31 11:05:59 loongson kernel: [drm] Skip scheduling IBs!
Dec 31 11:05:59 loongson kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset(6) succeeded!
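The trigger looks random, so one way to look for a pattern is to tally which processes show up in the `amdgpu_job_timedout` lines across many resets. The sketch below is a minimal, hypothetical helper, not an existing tool. It assumes kernel log text in the format of the sample above, for example saved from `journalctl -k` or `dmesg`. The function name `summarize_resets` and the regexes are illustrative assumptions that match only the message formats shown in this log.

```python
import re
from collections import Counter

# Hypothetical helper, written against the message formats in the sample log above.
# Matches e.g. "ERROR ring gfx timeout, signaled seq=152958, emitted seq=152961"
RING_RE = re.compile(
    r"ERROR ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
# Matches e.g. "ERROR Process information: process WebKitWebProces pid 333260
# thread WebKitWebP:cs0 pid 333289". The process name is the kernel's
# truncated task name ("WebKitWebProces"), not the full executable name.
PROC_RE = re.compile(
    r"amdgpu_job_timedout \[amdgpu\]\] ERROR Process information: "
    r"process (?P<proc>\S+) pid (?P<pid>\d+) "
    r"thread (?P<thread>\S+) pid (?P<tid>\d+)"
)


def summarize_resets(log_text):
    """Return (reset count, Counter of timed-out process names, ring timeouts)."""
    resets = 0
    per_process = Counter()
    ring_timeouts = []  # list of (ring name, signaled seq, emitted seq)
    for line in log_text.splitlines():
        m = RING_RE.search(line)
        if m:
            ring_timeouts.append(
                (m["ring"], int(m["signaled"]), int(m["emitted"]))
            )
        m = PROC_RE.search(line)
        if m:
            per_process[m["proc"]] += 1
        # Count each reset once, on the line that starts it.
        if "amdgpu: GPU reset begin!" in line:
            resets += 1
    return resets, per_process, ring_timeouts
```

Run over a long log, a process name that dominates the counter would point at a specific workload. A spread across unrelated programs (gnome-text-edit, QEMU, browsers) would be more consistent with a platform-level cause.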
It may be related to a hardware issue (a hazard in HyperTransport) explained in https://github.com/chenhuacai/linux/commit/a1e31fe7e00ad569d145b2ac09546a2dda04ba65. But it's only a "may", and that workaround is for radeon; no workaround has been developed for amdgpu yet.
I have now switched to an R5 240 card, using its DisplayPort output, and things are finally back to normal.
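I ran into the same problem. I was running SPEC CPU 2017, and when I came back to check on it after a while, the screen stayed dark no matter how I moved the mouse or pressed keys: no display output at all. After SSHing in, I found SPEC CPU 2017 still running normally, but the display was dead. dmesg showed a GPU reset.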