
AMDGPU Crashing Kernel When Suspending

Open moriel5 opened this issue 1 year ago • 6 comments

Please confirm there isn't an existing open bug report

  • [X] I have searched open bugs for this issue

Summary

  • Motherboard: Asus Z97 Pro Gamer
  • CPU: i5-4570
  • GPU: Sapphire Radeon 550 4GB
  • Kernel version: 6.11.5-30.current
  • Current Sync: 10/30/2024

A few weeks or months ago (I forget how long), after one of the weekly syncs (I believe the one that moved to Linux 6.10, though it could have been later), my desktop PC stopped sleeping when I suspended the OS, whether via the power button (which I had set to suspend) or via the user indicator on Budgie Panel. I noticed because, when I thought I was waking the PC, the fans would instead start for half a second, stop, and then start again, with the PC POSTing like normal, as though I had just turned it on (cold boot).

This has gone on unchanged through the last few syncs, including the one a few minutes ago; I just never got around to posting this issue until now.

Checking logs did not reveal anything, so I turned to dmesg while suspending, and the last lines referred to AMDGPU in connection to the Linux kernel crashing.

Logs will be attached soon, after I crash the kernel again to obtain the last lines.

Steps to reproduce

  1. Press Suspend.
  2. System suspends.
  3. Kernel crashes.
  4. PC turns off.

Expected result

The PC should suspend normally, without the kernel crashing.

Actual result

The kernel crashed, and the PC turned off.

Environment

  • [X] Is system up to date?

Repo

Shannon (stable)

Desktop Environment

Budgie

System details

System:
  Host: moriel-pc Kernel: 6.11.5-307.current arch: x86_64 bits: 64
  Desktop: Budgie v: 10.9.2 Distro: Solus 4.6 convergence
Machine:
  Type: Desktop System: ASUS product: All Series v: N/A serial:
  Mobo: ASUSTeK model: Z97-PRO GAMER v: Rev X.0x serial:
  UEFI: American Megatrends v: 2203 date: 02/26/2016
CPU:
  Info: quad core Intel Core i5-4570 [MCP] speed (MHz): avg: 824 min/max: 800/3600
Graphics:
  Device-1: Advanced Micro Devices [AMD/ATI] Lexa PRO [Radeon 540/540X/550/550X / RX 540X/550/550X] driver: amdgpu v: kernel
  Device-2: MosChip PCI 9865 Multi-I/O driver: N/A
  Device-3: Logitech B525 HD Webcam driver: snd-usb-audio,uvcvideo type: USB
  Display: x11 server: X.Org v: 21.1.14 with: Xwayland v: 24.1.4 driver: X: loaded: amdgpu,modesetting unloaded: fbdev,vesa dri: radeonsi,crocus gpu: amdgpu resolution: 1: 1920x1080~60Hz 2: 1280x1024~60Hz
  API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 24.2.5 renderer: AMD Radeon RX 550 / 550 Series (radeonsi polaris12 LLVM 18.1.8 DRM 3.59 6.11.5-307.current)
Network:
  Device-1: Intel Ethernet I218-V driver: e1000e
  Device-2: Intel Wi-Fi 6 AX200 driver: iwlwifi
Drives:
  Local Storage: total: 931.52 GiB used: 372.79 GiB (40.0%)
Info:
  Memory: total: 20 GiB available: 19.48 GiB used: 2.23 GiB (11.4%)
  Processes: 272 Uptime: 41m Shell: Bash inxi: 3.3.36

Other comments

Update: Unfortunately, I was unable to capture the very moment the kernel crashed, because the crash happened before dmesg managed to output anything, so the logs are incomplete (the crash happens right after the system enters the suspended state). Attached: journal.log

moriel5 avatar Oct 30 '24 01:10 moriel5
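Because the crash ends the session before dmesg can be read, one option is to read the kernel messages of the previous boot from the journal after restarting. The following is a minimal sketch, assuming the journal is stored persistently (otherwise earlier boots are not available) and that anything relevant was flushed to disk before the crash, which is not guaranteed. The helper names are hypothetical; `-k` (kernel messages), `-b -1` (previous boot), `-n` and `--no-pager` are standard `journalctl` options.

```python
import subprocess


def previous_boot_kernel_log_cmd(tail_lines=200):
    """Build a journalctl command for the kernel log of the previous boot."""
    return [
        "journalctl",
        "-k",                   # kernel messages only (similar to dmesg)
        "-b", "-1",             # the boot before the current one
        "-n", str(tail_lines),  # only the last N lines
        "--no-pager",
    ]


def read_previous_boot_kernel_log(tail_lines=200):
    """Run the command and return its output as text.

    Reading system logs may require elevated privileges or membership
    in a log-reading group, depending on the system.
    """
    result = subprocess.run(
        previous_boot_kernel_log_cmd(tail_lines),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Calling `read_previous_boot_kernel_log(50)` after the machine has come back up would return the last 50 kernel lines recorded before the crash, if any were written.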

Unfortunately, the logs don't contain anything after the suspend operation, so they don't include the kernel crash information.

I noticed that systemd-coredump recorded a core dump of Nheko a few minutes before the power button was pressed to suspend the system:

coredump

Oct 30 03:50:30 moriel-pc systemd[1]: Created slice system-systemd\x2dcoredump.slice - Slice /system/systemd-coredump.
Oct 30 03:50:30 moriel-pc systemd[1]: Started [email protected] - Process Core Dump (PID 5180/UID 0).
Oct 30 03:50:32 moriel-pc systemd-coredump[5181]: Process 2478 (im.nheko.Nheko) of user 1000 dumped core.

                                              Stack trace of thread 2478:
                                              #0  0x00007f77916aa87b pthread_kill (libc.so.6 + 0x9e87b)
                                              #1  0x00007f7791650316 raise (libc.so.6 + 0x44316)
                                              #2  0x000055c9df2681a6 _Z17stacktraceHandleri (im.nheko.Nheko + 0xa6d1a6)
                                              #3  0x00007f77916503c0 n/a (libc.so.6 + 0x443c0)
                                              #4  0x000055ca02d10940 n/a (n/a + 0x0)
                                              #5  0x00007f77935b2aa7 event_add_nolock_ (libevent_core-2.1.so.7 + 0x19aa7)
                                              #6  0x00007f77935b2ddc event_add (libevent_core-2.1.so.7 + 0x19ddc)
                                              #7  0x00007f7794ae9410 _ZN6coeurl6Client7addsockEii (libcoeurl.so.0.3 + 0xa410)
                                              #8  0x00007f7794ae982c _ZN6coeurl6Client7sock_cbEPviiS1_S1_ (libcoeurl.so.0.3 + 0xa82c)
                                              #9  0x00007f7791b3cc0b n/a (libcurl.so.4 + 0x5cc0b)
                                              #10 0x00007f7791afa592 n/a (libcurl.so.4 + 0x1a592)
                                              #11 0x00007f7791afa96d n/a (libcurl.so.4 + 0x1a96d)
                                              #12 0x00007f7791afab3a n/a (libcurl.so.4 + 0x1ab3a)
                                              #13 0x00007f7791afae1d n/a (libcurl.so.4 + 0x1ae1d)
                                              #14 0x00007f7791b3c970 curl_multi_cleanup (libcurl.so.4 + 0x5c970)
                                              #15 0x00007f7794ae9f6f _ZN6coeurl6ClientD2Ev (libcoeurl.so.0.3 + 0xaf6f)
                                              #16 0x00007f77947a84ea _ZN3mtx4http6ClientD1Ev (libmatrix_client.so.0.10.0 + 0x1a84ea)
                                              #17 0x000055c9dee1390f n/a (im.nheko.Nheko + 0x61890f)
                                              #18 0x00007f7791652c60 n/a (libc.so.6 + 0x46c60)
                                              #19 0x00007f7791652d1e exit (libc.so.6 + 0x46d1e)
                                              #20 0x00007f77916364f3 n/a (libc.so.6 + 0x2a4f3)
                                              #21 0x00007f77916365a9 __libc_start_main (libc.so.6 + 0x2a5a9)
                                              #22 0x000055c9dee02965 _start (im.nheko.Nheko + 0x607965)
                                              
                                              Stack trace of thread 2572:
                                              #0  0x00007f7791721c0d syscall (libc.so.6 + 0x115c0d)
                                              #1  0x00007f7791d2da1a n/a (libglib-2.0.so.0 + 0xc0a1a)
                                              #2  0x00007f7791d26911 n/a (libglib-2.0.so.0 + 0xb9911)
                                              #3  0x00007f77916a89ea n/a (libc.so.6 + 0x9c9ea)
                                              #4  0x00007f77917244cc n/a (libc.so.6 + 0x1184cc)
                                              
                                              Stack trace of thread 2573:
                                              #0  0x00007f7791717686 ppoll (libc.so.6 + 0x10b686)
                                              #1  0x00007f7791d6edcd n/a (libglib-2.0.so.0 + 0x101dcd)
                                              #2  0x00007f7791ce8566 n/a (libglib-2.0.so.0 + 0x7b566)
                                              #3  0x00007f7791d26911 n/a (libglib-2.0.so.0 + 0xb9911)
                                              #4  0x00007f77916a89ea n/a (libc.so.6 + 0x9c9ea)
                                              #5  0x00007f77917244cc n/a (libc.so.6 + 0x1184cc)
                                              
                                              Stack trace of thread 2575:
                                              #0  0x00007f7791721c0d syscall (libc.so.6 + 0x115c0d)
                                              #1  0x00007f7791d6e723 n/a (libglib-2.0.so.0 + 0x101723)
                                              #2  0x00007f7791c95049 g_async_queue_pop (libglib-2.0.so.0 + 0x28049)
                                              #3  0x00007f7782b1db75 n/a (libpangoft2-1.0.so.0 + 0xdb75)
                                              #4  0x00007f7791d26911 n/a (libglib-2.0.so.0 + 0xb9911)
                                              #5  0x00007f77916a89ea n/a (libc.so.6 + 0x9c9ea)
                                              #6  0x00007f77917244cc n/a (libc.so.6 + 0x1184cc)
                                              
                                              Stack trace of thread 2570:
                                              #0  0x00007f7791717686 ppoll (libc.so.6 + 0x10b686)
                                              #1  0x00007f7791d6edcd n/a (libglib-2.0.so.0 + 0x101dcd)
                                              #2  0x00007f7791ce1754 g_main_context_iteration (libglib-2.0.so.0 + 0x74754)
                                              #3  0x00007f77923bb2b2 _ZN20QEventDispatcherGlib13processEventsE6QFlagsIN10QEventLoop17ProcessEventsFlagEE (libQt6Core.so.6 + 0x5bb2b2)
                                              #4  0x00007f77920ce00a _ZN10QEventLoop4execE6QFlagsINS_17ProcessEventsFlagEE (libQt6Core.so.6 + 0x2ce00a)
                                              #5  0x00007f77921e4786 _ZN7QThread4execEv (libQt6Core.so.6 + 0x3e4786)
                                              #6  0x00007f7792f951fa n/a (libQt6DBus.so.6 + 0x4f1fa)
                                              #7  0x00007f779228d4df n/a (libQt6Core.so.6 + 0x48d4df)
                                              #8  0x00007f77916a89ea n/a (libc.so.6 + 0x9c9ea)
                                              #9  0x00007f77917244cc n/a (libc.so.6 + 0x1184cc)
                                              
                                              Stack trace of thread 2574:
                                              #0  0x00007f7791717686 ppoll (libc.so.6 + 0x10b686)
                                              #1  0x00007f7791d6edcd n/a (libglib-2.0.so.0 + 0x101dcd)
                                              #2  0x00007f7791ce6b8f g_main_loop_run (libglib-2.0.so.0 + 0x79b8f)
                                              #3  0x00007f778ff78ec0 n/a (libgio-2.0.so.0 + 0x178ec0)
                                              #4  0x00007f7791d26911 n/a (libglib-2.0.so.0 + 0xb9911)
                                              #5  0x00007f77916a89ea n/a (libc.so.6 + 0x9c9ea)
                                              #6  0x00007f77917244cc n/a (libc.so.6 + 0x1184cc)
                                              ELF object binary architecture: AMD x86-64

Oct 30 03:50:32 moriel-pc systemd[1]: [email protected]: Deactivated successfully.
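The frames in the trace above follow a regular layout (`#index address symbol (module + offset)`), so a thread can be summarised by module to see where the crash sits. This is a minimal sketch, assuming that layout holds for every frame line; `parse_frames` is a hypothetical helper and not part of systemd.

```python
import re

# One frame looks like:
#   #7  0x00007f7794ae9410 _ZN6coeurl6Client7addsockEii (libcoeurl.so.0.3 + 0xa410)
FRAME_RE = re.compile(
    r"#(?P<index>\d+)\s+0x(?P<addr>[0-9a-f]+)\s+(?P<symbol>\S+)\s+"
    r"\((?P<module>.+?) \+ 0x(?P<offset>[0-9a-f]+)\)"
)


def parse_frames(text):
    """Return a list of (index, symbol, module, offset) tuples, one per frame line."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(
                (
                    int(match["index"]),
                    match["symbol"],
                    match["module"],
                    int(match["offset"], 16),
                )
            )
    return frames
```

Applied to thread 2478, the frames run from `exit` through `libmatrix_client`, `libcoeurl`, `libcurl` and `libevent_core` into Nheko's own `stacktraceHandler`, which suggests an application crash during shutdown rather than anything in the kernel.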

@moriel5, would you be able to attach a picture (taken with your phone, for instance) of any core dump / errors on the screen when the kernel crashes? Thanks!

TraceyC77 avatar Nov 01 '24 19:11 TraceyC77

If I can, I'll try doing so.

But the one time I saw anything, it was either one or two (I forget which, but no more than two) lines, which only mentioned the fact that AMDGPU crashed and took the kernel with it, without any more information.

moriel5 avatar Nov 03 '24 15:11 moriel5

Thanks. Exact errors are always more useful than summaries. Things that might look unimportant are sometimes helpful to us.

TraceyC77 avatar Nov 03 '24 19:11 TraceyC77

As is always the case.

moriel5 avatar Nov 04 '24 19:11 moriel5

@moriel5 Have you seen this crash recently? I'm hoping one of the newer kernels has resolved the issue. Thanks.

TraceyC77 avatar Mar 14 '25 16:03 TraceyC77

I may have still been seeing this crash as late as last week. (I have not yet had time to sync this week, since I have been away from home since Thursday, but I intend to sync in a few hours.) However, since I was unable to obtain logs, there is no proof in either direction. The only thing I can confirm is that asus-wmi is no longer conflicting with ACPI with regard to S0ix reporting, so I am back to trying to figure out what is going on with S3.

moriel5 avatar Mar 17 '25 16:03 moriel5
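To tell suspend-to-idle and S3 apart, the kernel lists the available suspend modes in `/sys/power/mem_sleep` and marks the selected one with square brackets (for example `s2idle [deep]`, where `deep` is suspend-to-RAM, which corresponds to S3 on ACPI systems). This is a minimal sketch for reading that file; the function names are hypothetical, and the file is only present on Linux systems that support system suspend.

```python
from pathlib import Path


def parse_mem_sleep(text):
    """Return (selected_mode, available_modes) from /sys/power/mem_sleep content."""
    selected = None
    modes = []
    for token in text.split():
        # The currently selected mode is wrapped in square brackets.
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            selected = token
        modes.append(token)
    return selected, modes


def read_mem_sleep(path="/sys/power/mem_sleep"):
    """Read and parse the sysfs file; raises if the file is not present."""
    return parse_mem_sleep(Path(path).read_text())
```

If `deep` is listed but not selected, that may be worth noting in the report, since it indicates which suspend path the crash is occurring on.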