[BUG]: Page cache usage causes post-write sync to fail
What happened?
This bug is specific to the Linux variant of Imager. When writing an image to a storage device that is also a target for a device-specific synchronisation operation, the sync op can time out with various bad effects. A prerequisite is that the destination storage writes more slowly than Imager can read and decompress the image (typical for most target SD cards).
Here are two reproducers:
- Pi 5 8GB running Pi OS booted from USB and writing an image to SD
Insert a blank A1-class SD card in the SD slot and use Imager in CLI mode to write to it.
While writing, the buffers/page cache usage reported will climb to use the vast majority of the free RAM. At or near the 100% step in the progress bar, the sync op is issued, takes more than 2 minutes to complete, and this causes a splat in dmesg:
[ 726.451032] INFO: task kworker/1:0:2147 blocked for more than 120 seconds.
[ 726.451043] Not tainted 6.12.47-v8-16k+ #635
[ 726.451046] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 726.451048] task:kworker/1:0 state:D stack:0 pid:2147 tgid:2147 ppid:2 flags:0x00000008
[ 726.451056] Workqueue: events_freezable mmc_rescan
[ 726.451067] Call trace:
[ 726.451069] __switch_to+0xf0/0x160
[ 726.451076] __schedule+0x330/0xb68
[ 726.451080] schedule+0x3c/0x148
[ 726.451084] __mmc_claim_host+0xbc/0x1f0
[ 726.451088] mmc_get_card+0x3c/0x58
[ 726.451093] mmc_sd_detect+0x28/0xa0
[ 726.451097] mmc_rescan+0x94/0x330
[ 726.451101] process_one_work+0x15c/0x3c0
[ 726.451107] worker_thread+0x2e4/0x3f0
[ 726.451111] kthread+0x120/0x130
[ 726.451115] ret_from_fork+0x10/0x20
[ 811.206493] mmcblk0: p1 p2
[ 811.282818] mmcblk0: p1 p2
The process eventually succeeds (the final write to the partition table causes a re-enumeration).
- Writing to a Pi exposing storage via USB mass-storage gadget
A different variation of this is seen with a Pi 4/5 4GB running Pi OS and writing to a Pi 4/5 that exposes its SD card via the mass-storage gadget. The dirty page counts on the gadget Pi climb to a significant fraction of total RAM. In this case, the sync op on the mass-storage interface times out and causes Linux to do a device reset, which is badly handled by the gadget.
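To watch the page cache build up in either reproducer, something like the following could be polled while the write runs. It is only an observation aid, not part of Imager: it parses the `Dirty`, `Writeback` and `MemTotal` fields of `/proc/meminfo`. The function names are made up for this sketch.

```python
def parse_meminfo(text):
    """Return {field: value} for lines shaped like 'Name:   123 kB'.

    The unit suffix is ignored; /proc/meminfo reports sizes in KiB.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def dirty_summary(text):
    """Return (Dirty + Writeback in KiB, that amount as a fraction of MemTotal)."""
    info = parse_meminfo(text)
    pending_kib = info.get("Dirty", 0) + info.get("Writeback", 0)
    total_kib = info.get("MemTotal", 0)
    fraction = pending_kib / total_kib if total_kib else 0.0
    return pending_kib, fraction


def read_pending(path="/proc/meminfo"):
    """Read the live counters on a Linux host."""
    with open(path) as f:
        return dirty_summary(f.read())
```

Calling `read_pending()` once a second during a write should show the pending figure climbing towards a large fraction of RAM if the behaviour described above is being reproduced.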
In both cases, Imager should have some notion of writing synchronously, or of checkpointing writes, to the underlying block device. This would avoid the buffer bloat, which also makes the progress bar quite inaccurate.
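As a rough illustration of what checkpointing could look like, the sketch below copies data to a file descriptor and calls `fsync` whenever a fixed number of bytes has been written since the last flush. This is not Imager's code; the function name, the `window_bytes` and `chunk_size` parameters and the injectable `sync` callable are assumptions made for the example.

```python
import os


def write_with_checkpoints(src, dst_fd, window_bytes, chunk_size=1 << 20, sync=os.fsync):
    """Copy src (a binary file object) to dst_fd, flushing every window_bytes.

    Returns (bytes_written, sync_count).
    """
    written = 0
    unsynced = 0
    syncs = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        # os.write may write fewer bytes than requested, so loop until done.
        view = memoryview(chunk)
        while view:
            n = os.write(dst_fd, view)
            view = view[n:]
        written += len(chunk)
        unsynced += len(chunk)
        # Checkpoint: bound the amount of dirty data the kernel can hold.
        if unsynced >= window_bytes:
            sync(dst_fd)
            syncs += 1
            unsynced = 0
    # Flush whatever is left after the last full window.
    if unsynced:
        sync(dst_fd)
        syncs += 1
    return written, syncs
```

With a bounded window, each sync has at most `window_bytes` of dirty data to drain, so no single sync should approach the 120-second hung-task threshold, and progress can be reported against bytes that have actually reached the device.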
Version
1.9.6 (Default)
What host operating system were you using?
Debian and derivatives (eg Ubuntu)
Host OS Version
Raspberry Pi OS bookworm
Selected OS
Raspberry Pi OS bookworm
Which Raspberry Pi Device are you using?
Raspberry Pi 5, 500, and Compute Modules 5
What kind of storage device are you using?
Other
OS Customisation
- [ ] Yes, I was using OS Customisation when the bug occurred.
Relevant log output
Bug report accepted, scheduled for 2.0.
2.0 will introduce an adaptive pending-write window. Devices with more RAM, which may just be suffering bus contention, will get a 256 MiB write window or a 7-second hard limit before an enforced sync. Devices with less RAM will get a 16 MiB write window or a 3-second hard limit before an enforced sync.
This mechanism will be applied across Windows, macOS and Linux for consistency - though we've only observed this on Linux, it's certainly plausible on the other OSes.
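For readers who want a concrete picture of the window described above, here is a minimal sketch. It is not the 2.0 implementation. The byte and time limits are the figures quoted in this comment; the RAM threshold that separates the two tiers is not stated anywhere in this thread, so `RAM_THRESHOLD_BYTES` is a placeholder, and treating the two limits as "whichever is reached first" is one reading of the description.

```python
import time
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass(frozen=True)
class WindowPolicy:
    max_bytes: int      # pending bytes allowed before a sync is forced
    max_seconds: float  # time allowed since the last sync


# Limits quoted in the comment above.
HIGH_RAM = WindowPolicy(256 * MIB, 7.0)
LOW_RAM = WindowPolicy(16 * MIB, 3.0)

# ASSUMPTION: the cut-over point between tiers is not given in the thread.
RAM_THRESHOLD_BYTES = 4 * 1024 * MIB


def choose_policy(total_ram_bytes, threshold=RAM_THRESHOLD_BYTES):
    """Pick the larger window for hosts at or above the (hypothetical) threshold."""
    return HIGH_RAM if total_ram_bytes >= threshold else LOW_RAM


class PendingWindow:
    """Tracks unsynced bytes and elapsed time since the last sync."""

    def __init__(self, policy, clock=time.monotonic):
        self.policy = policy
        self._clock = clock
        self._bytes = 0
        self._since = clock()

    def record(self, nbytes):
        """Account for nbytes just written; return True if a sync is now due."""
        self._bytes += nbytes
        elapsed = self._clock() - self._since
        return self._bytes >= self.policy.max_bytes or elapsed >= self.policy.max_seconds

    def mark_synced(self):
        """Reset both counters after the caller has synced the device."""
        self._bytes = 0
        self._since = self._clock()
```

A write loop would call `record(len(chunk))` after each write and, when it returns `True`, sync the destination and call `mark_synced()`. The clock is injectable only so the time limit can be exercised without sleeping.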
This appears to be resolved as of 2.0 rc7 (tested not with the exact flash storage that failed last time, but with a device that is similarly slow to write).
Thanks for the confirmation, @P33M. Closing as fixed in the 2.x-series releases.