System freezing on CM5 eMMC (Ubuntu 24.x, kernel 6.11/6.14)

Open yeseuleee opened this issue 1 month ago • 0 comments

Describe the bug

Hello, I'm experiencing a severe stability issue on Raspberry Pi CM5 (eMMC) when running Ubuntu 24.x. After approximately 3.5 days of continuous uptime, the system becomes completely unresponsive (freezes) while ping still works.

Environment

  • Board: Raspberry Pi Compute Module 5 (eMMC model)
  • OS: Ubuntu 24.x
  • Kernel versions tested:
    • 6.11.0-1009-raspi
    • 6.14.0-1012-raspi
  • (Issue occurs on both versions)
  • Storage: CM5 onboard eMMC
  • Uptime before issue: typically 3.5 days or more

Symptoms when the issue occurs

  • SSH is unreachable
  • Ping works normally
  • Network communication becomes extremely slow or stops entirely
  • File-based DB connections fail
  • Sometimes logs are written, other times nothing is logged at all

The system requires a hard reboot to recover.

Observed logs

1. Filesystem suddenly turns read-only

In many cases, just before the freeze, I see multiple logs like:

fallocate[323]: fallocate: cannot open /swapfile: Read-only file system

Once this message appears, the freeze almost always follows shortly after.
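To catch the read-only transition as it happens rather than inferring it from the fallocate message, a small check of the mount table could be run periodically. The following is a minimal sketch, assuming the root filesystem is the one being remounted; the function name `read_only_mounts` and the `targets` parameter are hypothetical, not part of any existing tool.

```python
def read_only_mounts(mounts_text, targets=("/",)):
    """Return the mount points in `targets` that are mounted read-only.

    `mounts_text` is expected to be the contents of /proc/mounts, where each
    line has the form: device mount_point fstype options dump pass.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        mount_point = fields[1]
        options = fields[3].split(",")
        if mount_point in targets and "ro" in options:
            found.append(mount_point)
    return found
```

On the affected system this could be called as `read_only_mounts(open("/proc/mounts").read())` from a timer or cron job. An empty list would mean the root filesystem is still writable.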

2. USB errors sometimes appear before the freeze

These USB errors often appear before the read-only filesystem message:

usb 2-1.4.2: device descriptor read/64, error -71
usb 2-1.4.2: new high-speed USB device number 37 using xhci-hcd
usb 2-1.4-port2: attempt power cycle

I have also seen occasional disconnect or current-related USB errors. However, the USB devices use a separate, stable power supply, so I do not believe this is a power issue.
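For reference, the -71 in the descriptor-read message corresponds to the Linux errno EPROTO (protocol error). A rough sketch for pulling such errors out of a longer dmesg capture is shown below. The regular expression is an assumption based only on the line format quoted above, and the errno table is deliberately limited to three Linux values.

```python
import re

# A few Linux errno values that can show up in USB enumeration failures.
# This table is intentionally incomplete.
USB_ERRNO_NAMES = {32: "EPIPE", 71: "EPROTO", 110: "ETIMEDOUT"}

# Matches lines such as:
#   usb 2-1.4.2: device descriptor read/64, error -71
USB_ERROR_RE = re.compile(r"usb (?P<dev>\S+): .*\berror (?P<code>-\d+)")

def parse_usb_error(line):
    """Return (device path, error code, errno name), or None if no match."""
    m = USB_ERROR_RE.search(line)
    if m is None:
        return None
    code = int(m.group("code"))
    return m.group("dev"), code, USB_ERRNO_NAMES.get(-code, "unknown")
```

Lines without an explicit error code, such as the "attempt power cycle" message, are ignored by this sketch.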

3. Sometimes there are no logs at all

In several cases:

  • no syslog entries
  • no dmesg updates
  • system is frozen but not remounted read-only
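Because these cases leave no trace, one option is a periodic write probe that measures how long a small synchronous write takes, so that a storage stall can be noticed even when nothing reaches syslog. The sketch below is only an assumption about how such a probe might look. The function name, the probe path, and the timeout are hypothetical, and how the result is reported (for example over the network, since ping still works) is left open.

```python
import os
import threading
import time

def timed_fsync_write(path, timeout_s=30.0):
    """Write a timestamp to `path`, fsync it, and return the elapsed seconds.

    Returns None if the write failed or did not finish within `timeout_s`,
    either of which would be worth reporting. The write runs in a daemon
    thread so that a task stuck in uninterruptible I/O does not block the
    caller.
    """
    result = {}

    def worker():
        start = time.monotonic()
        try:
            with open(path, "w") as f:
                f.write(f"{time.time()}\n")
                f.flush()
                os.fsync(f.fileno())  # force the data to the device
        except OSError:
            return  # e.g. read-only file system; leave result empty
        result["elapsed"] = time.monotonic() - start

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)
    return result.get("elapsed")
```

A steadily rising latency in the hours before a freeze, or a None result while the system still answers ping, would point toward the storage path rather than the network.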

What I have verified

  • No CPU spikes
  • Memory usage is stable
  • Disk usage is fine
  • No network congestion
  • System temperature is within normal range
  • eMMC I/O load is not heavy

Nothing obvious seems to lead to the freeze.

Possible root cause: eMMC CQE deadlock?

Based on my research, I found discussions mentioning CQE (Command Queue Engine) deadlock issues on certain Raspberry Pi eMMC configurations.

My questions:

  1. Is there a known CQE-related freeze/deadlock issue for CM5 eMMC in these kernel versions?
  2. If so, has this been addressed in a newer kernel or firmware update?
  3. Some users suggest disabling CQE, but is there an official or recommended workaround other than disabling CQE entirely?
  4. Is long-uptime instability with eMMC + CQE a known issue on CM5?

This system must operate 24/7, so long-term stability is critical.

If additional logs or traces are needed, I can provide them. Thank you very much for your help. Let me know what further information I can collect to help diagnose this issue.

Steps to reproduce the behaviour

  1. Install Ubuntu 24.04 (or later) on Raspberry Pi CM5 eMMC.
  2. Use kernel versions such as 6.11.0-1009-raspi or 6.14.0-1012-raspi (the issue occurs on both).
  3. Run normal workloads (logging, DB file access, USB devices attached, light-to-moderate I/O). No heavy stress is required.
  4. Let the system run continuously for 3.5 days or longer.
  5. After ~3.5 days of uptime, the system gradually becomes unstable:
    • Network slows down severely or stalls.
    • SSH stops responding.
    • File operations start failing.
  6. Eventually, the system freezes completely while ping still replies.
  7. In some cases, "Read-only file system" messages or USB errors appear shortly before the freeze; in other cases, no logs are produced.

Device (s)

Raspberry Pi CM5

System

  • Raspberry Pi Compute Module 5 (eMMC 4G/8G)
  • OS: Ubuntu 24.04 LTS / 24.10 (not Raspberry Pi OS)
  • Kernel: 6.11.0-1009-raspi or 6.14.0-1012-raspi (issue present in both)
  • Firmware: N/A on Ubuntu images
  • Uptime before failure: typically ~3.5 days

This system must operate continuously (24/7), so resolving long-term stability issues is essential.

Logs

The most common log line before a freeze:

fallocate: cannot open /swapfile: Read-only file system

In some cases, the logs below were also captured shortly before the system froze.

These logs show multiple kernel "hung task" events, where essential filesystem-related tasks (jbd2, systemd-journal, application modules, sync) remain blocked for more than 122 seconds. This indicates that the eMMC or EXT4 journaling layer is no longer responding, which aligns with the "Read-only file system" message observed in other freeze events.

Such behaviour suggests a possible deadlock in the block layer, EXT4 journal, or eMMC/CQE command queue path. Once this occurs, all write operations stall and the entire system becomes unresponsive while still answering ping.

Full logs:

----------------------------------------------------------------------
[Sat Nov 29 06:54:29 2025] INFO: task jbd2/mmcblk0p2-:258 blocked for more than 122 seconds.
[Sat Nov 29 06:54:29 2025]         Tainted: G         C E       6.14.0-1012-raspi #12-Ubuntu
[Sat Nov 29 06:54:29 2025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Nov 29 06:54:29 2025] task:jbd2/mmcblk0p2- state:D stack:0      pid:258      tgid:258      ppid:2         task_flags:0x240040 flags:0x00000008
[Sat Nov 29 06:54:29 2025] Call trace:
[Sat Nov 29 06:54:29 2025]  __switch_to+0xe8/0x148 (T)
[Sat Nov 29 06:54:29 2025]  __schedule+0x32c/0x990
[Sat Nov 29 06:54:29 2025]  schedule+0x3c/0x118
[Sat Nov 29 06:54:29 2025]  jbd2_journal_wait_updates+0x70/0xf0
[Sat Nov 29 06:54:29 2025]  jbd2_journal_commit_transaction+0x19c/0x16b0
[Sat Nov 29 06:54:29 2025]  kjournald2+0xc4/0x248
[Sat Nov 29 06:54:29 2025]  kthread+0x110/0x1e0
[Sat Nov 29 06:54:29 2025]  ret_from_fork+0x10/0x20
...
[Sat Nov 29 06:54:29 2025] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
----------------------------------------------------------------------
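To compare freeze events, the hung-task reports can be extracted from saved dmesg output and summarized by task name and blocked duration. The following is a minimal sketch that assumes the message format shown in the log above; the function name `hung_tasks` is hypothetical.

```python
import re

# Matches the kernel hung-task report line, for example:
#   INFO: task jbd2/mmcblk0p2-:258 blocked for more than 122 seconds.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

def hung_tasks(log_text):
    """Return a list of (task name, pid, seconds) for each hung-task report."""
    results = []
    for line in log_text.splitlines():
        m = HUNG_TASK_RE.search(line)
        if m:
            results.append(
                (m.group("comm"), int(m.group("pid")), int(m.group("secs")))
            )
    return results
```

Note that the kernel suppresses further reports after a limited number of warnings (see the last line of the log above), so the list may undercount the tasks that were actually blocked.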

USB-related errors seen before some freezes:

  usb 2-1.4.2: device descriptor read/64, error -71
  usb 2-1.4-port2: attempt power cycle

Other occurrences:

  • Network becomes extremely slow or stops.
  • SSH becomes unavailable while ping still responds.
  • Sometimes no logs appear at all.

Additional context

No response

yeseuleee • Dec 05 '25 03:12