compute-runtime

Failed to reset GuC, ret = -110 leads to hard LOCKUP on cpu

Open JHarding86 opened this issue 2 years ago • 5 comments

Hello, I am running an Alder Lake i9-12900K with Proxmox (Debian). For the past few weeks I have been running into the following issue, which leads to a complete system crash.

[i915] *ERROR* Failed to reset GuC, ret = -110
NMI watchdog: Watchdog detected hard LOCKUP on cpu 8
NMI watchdog: Watchdog detected hard LOCKUP on cpu 4

Check the attached picture (IMG_2497) for more errors. I can answer any questions. I wish I could provide more information, but I'm not sure where to look.

JHarding86 avatar Mar 28 '22 00:03 JHarding86

This seems to indicate that your program is crashing or hanging and sending the GPU into a bad state. Is this a new test you are running, or was it working before? Could you share a few details about what you are executing?

jandres742 avatar Mar 28 '22 00:03 jandres742

I have a Plex Linux Container that is attempting to do some hardware transcoding.

Admittedly, i915 is new to me, and this is a new setup. I initially battled to get hardware transcoding working at all on Alder Lake. I have a thread on the Proxmox forums (https://forum.proxmox.com/threads/alder-lake-gvt-d-integrated-graphics-passthrough.105983/) that details my journey up to this point, but I moved here because I think this is more pertinent to i915. Please let me know if I'm wrong.

JHarding86 avatar Mar 28 '22 00:03 JHarding86

This happened again. Here is a picture of the output (image).

JHarding86 avatar Mar 30 '22 10:03 JHarding86

> I have a Plex Linux Container that is attempting to do some hardware transcoding.

@JHarding86 You've filed your issue against the wrong project. Transcoding is the responsibility of media-driver, not compute-runtime: https://github.com/intel/media-driver/

Additionally, while a GPU hang can be either a kernel or a user-space driver issue, the GPU reset that recovers from it is solely the responsibility of the kernel (and firmware). It is not a user-space driver issue, whether that driver is compute-runtime, media-driver or mesa.
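As background on the `ret = -110` in the reported log: kernel code conventionally returns negative errno values, and on x86 Linux errno 110 is ETIMEDOUT, so the message suggests that the GuC reset timed out. A minimal Python sketch for decoding such codes follows. The helper name `describe_kernel_ret` is hypothetical, and the lookup uses the errno table of the machine it runs on, so it is assumed to be run on Linux.

```python
import errno
import os


def describe_kernel_ret(ret):
    """Map a negative kernel-style return code (e.g. -110) to (errno name, message).

    Assumption: the script runs on the same kind of system (Linux) that
    produced the log, because errno numbers differ between platforms.
    """
    code = abs(ret)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # The value reported in "Failed to reset GuC, ret = -110".
    print(describe_kernel_ret(-110))
```

This only names the error; it does not say why the reset timed out, which is what the kernel ticket would need to establish.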

Your bug report is missing information on which versions you have. After boot, that can be seen with: dmesg | grep -i -e "linux version" -e i915
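If the log has already been saved to a file (for example after a crash), the same filter can be applied offline. The following is a rough Python sketch of what the grep command does, not a replacement for it. The function name `filter_version_lines` is hypothetical, and the sample text is just two lines quoted from the original report.

```python
import re

# Patterns taken from the grep command above; matching ignores case like `grep -i`.
PATTERNS = ("linux version", "i915")


def filter_version_lines(log_text, patterns=PATTERNS):
    """Return the lines of log_text that contain any of the patterns, ignoring case."""
    regex = re.compile("|".join(re.escape(p) for p in patterns), re.IGNORECASE)
    return [line for line in log_text.splitlines() if regex.search(line)]


# Two lines quoted from the report; only the first one mentions i915.
sample = (
    "[i915] *ERROR* Failed to reset GuC, ret = -110\n"
    "NMI watchdog: Watchdog detected hard LOCKUP on cpu 8\n"
)
matches = filter_version_lines(sample)
print(matches)
```

To use it on a real log, read the saved file into a string and pass it to `filter_version_lines`. The kernel version line and the i915 firmware messages should then be among the returned lines, provided they were captured in the file.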

Make sure you're using the latest kernel and a matching GuC version. If that does not help, please file the GPU reset failure issue against the kernel: https://gitlab.freedesktop.org/drm/intel/-/issues

You could file another ticket against media-driver about it triggering the reset, and link the kernel ticket there.

Please close this one.

eero-t avatar Apr 25 '22 14:04 eero-t

Hi @JHarding86, did you report this against media-driver or the kernel? I'm having the exact same issue, but when I went looking for a media-driver bug to add detail to, I didn't find one.

sunbeam60 avatar Jun 21 '22 09:06 sunbeam60

As @eero-t mentioned:

> Make sure you're using latest kernel + matching GuC version, and if that does not help, please file the GPU reset failure issue against kernel: https://gitlab.freedesktop.org/drm/intel/-/issues
>
> You could file another ticket against media-driver about it triggering the reset, and link the kernel ticket there.

I'm closing this issue.

JablonskiMateusz avatar Aug 22 '22 14:08 JablonskiMateusz