linux
linux copied to clipboard
SDHCI leaks DMA buffers leading to a full swiotlb buffer
Describe the bug
When recovering data off poorly SD cards that have extremely long read access cycle times, eventually I get this splat:
[689174.139067] mmc0: Got data interrupt 0x00000002 even though no data operation was in progress.
[689174.139562] swiotlb_tbl_map_single: 53 callbacks suppressed
[689174.139575] sdhci-iproc fe340000.mmc: swiotlb buffer is full (sz: 32768 bytes), total 32768 (slots), used 30824 (slots)
[689174.139591] ------------[ cut here ]------------
[689174.139598] WARNING: CPU: 0 PID: 1804 at drivers/mmc/host/sdhci.c:1154 sdhci_send_command+0xacc/0xf18
[689174.139625] Modules linked in: cdc_acm dm_mod 8021q garp stp llc snd_soc_hdmi_codec sg brcmfmac brcmutil v3d gpu_sched cfg80211 raspberrypi_hwmon rfkill i2c_brcmstb bcm2835_codec(C) vc4 bcm2835_v4l2(C) bcm2835_isp(C) bcm2835_mmal_vchiq(C) v4l2_mem2mem cec videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops vc_sm_cma(C) drm_kms_helper videobuf2_v4l2 videobuf2_common videodev mc snd_bcm2835(C) snd_soc_core snd_compress snd_pcm_dmaengine rpivid_mem snd_pcm snd_timer snd syscopyarea sysfillrect sysimgblt fb_sys_fops uio_pdrv_genirq uio nvmem_rmem i2c_dev fuse drm drm_panel_orientation_quirks backlight ip_tables x_tables ipv6
[689174.139940] CPU: 0 PID: 1804 Comm: kworker/0:2H Tainted: G WC 5.15.32-v7l+ #1538
[689174.139954] Hardware name: BCM2711
[689174.139962] Workqueue: kblockd blk_mq_run_work_fn
[689174.139980] Backtrace:
[689174.139991] [<c0bcaf20>] (dump_backtrace) from [<c0bcb16c>] (show_stack+0x20/0x24)
[689174.140019] r7:00000482 r6:c0e3cf44 r5:00000080 r4:60000093
[689174.140026] [<c0bcb14c>] (show_stack) from [<c0bcf7e8>] (dump_stack_lvl+0x70/0x94)
[689174.140044] [<c0bcf778>] (dump_stack_lvl) from [<c0bcf824>] (dump_stack+0x18/0x1c)
[689174.140062] r7:00000482 r6:00000009 r5:c09e04e4 r4:c0eb45c8
[689174.140068] [<c0bcf80c>] (dump_stack) from [<c0221458>] (__warn+0xfc/0x114)
[689174.140086] [<c022135c>] (__warn) from [<c0bcb82c>] (warn_slowpath_fmt+0x70/0xd8)
[689174.140107] r7:00000482 r6:c0eb45c8 r5:c1205048 r4:00000000
[689174.140114] [<c0bcb7c0>] (warn_slowpath_fmt) from [<c09e04e4>] (sdhci_send_command+0xacc/0xf18)
[689174.140137] r9:d35f377c r8:00000000 r7:c2044d00 r6:00000003 r5:d35f371c r4:c2044b80
[689174.140145] [<c09dfa18>] (sdhci_send_command) from [<c09e096c>] (sdhci_send_command_retry+0x3c/0x12c)
[689174.140167] r10:d35f3600 r9:00000013 r8:00000000 r7:c2044d00 r6:d35f371c r5:0000000b
[689174.140174] r4:c2044b80
[689174.140180] [<c09e0930>] (sdhci_send_command_retry) from [<c09e0ad4>] (sdhci_request+0x78/0xa8)
[689174.140202] r10:d35f3600 r9:00000001 r8:c2044b80 r7:00000013 r6:c2044d00 r5:d35f36a8
[689174.140210] r4:c2044800
[689174.140216] [<c09e0a5c>] (sdhci_request) from [<c09c46d4>] (__mmc_start_request+0x88/0x184)
[689174.140239] r9:c2044800 r8:00000000 r7:c2044800 r6:d35f36a8 r5:d35f36a8 r4:c2044800
[689174.140246] [<c09c464c>] (__mmc_start_request) from [<c09c4864>] (mmc_start_request+0x94/0xbc)
[689174.140266] r7:c2044800 r6:00000000 r5:d35f36a8 r4:c2044800
[689174.140272] [<c09c47d0>] (mmc_start_request) from [<c09d961c>] (mmc_blk_mq_issue_rq+0x388/0x9f4)
[689174.140292] r7:c2044800 r6:ca21c208 r5:d35f36a8 r4:c1205048
[689174.140298] [<c09d9294>] (mmc_blk_mq_issue_rq) from [<c09da090>] (mmc_mq_queue_rq+0x150/0x298)
[689174.140317] r10:c2044800 r9:00000000 r8:c299f000 r7:00000001 r6:ca21c210 r5:d35f3600
[689174.140325] r4:ca21c208
[689174.140331] [<c09d9f40>] (mmc_mq_queue_rq) from [<c072a748>] (blk_mq_dispatch_rq_list+0x150/0x8d0)
[689174.140352] r10:00000000 r9:c57685a0 r8:d35f3600 r7:c3e66c00 r6:c3e66c00 r5:d35f3630
[689174.140360] r4:cdbe7e84
[689174.140366] [<c072a5f8>] (blk_mq_dispatch_rq_list) from [<c07305fc>] (__blk_mq_sched_dispatch_requests+0xd4/0x188)
[689174.140391] r10:efefef05 r9:00000000 r8:ca21c800 r7:c3e66c04 r6:c1205048 r5:cdbe7e84
[689174.140398] r4:c3e66c00
[689174.140405] [<c0730528>] (__blk_mq_sched_dispatch_requests) from [<c07308bc>] (blk_mq_sched_dispatch_requests+0x44/0x6c)
[689174.140429] r8:efefef00 r7:00000020 r6:efef9dc0 r5:00000000 r4:c3e66c00
[689174.140436] [<c0730878>] (blk_mq_sched_dispatch_requests) from [<c07276a4>] (__blk_mq_run_hw_queue+0x60/0xc4)
[689174.140457] r5:00000000 r4:c3e66c00
[689174.140463] [<c0727644>] (__blk_mq_run_hw_queue) from [<c0727730>] (blk_mq_run_work_fn+0x28/0x2c)
[689174.140481] r5:c3fb2600 r4:c3e66c40
[689174.140487] [<c0727708>] (blk_mq_run_work_fn) from [<c023ee2c>] (process_one_work+0x250/0x57c)
[689174.140506] [<c023ebdc>] (process_one_work) from [<c023f1b8>] (worker_thread+0x60/0x5c4)
[689174.140530] r10:efef9dc0 r9:c1203d00 r8:efef9dd8 r7:00000008 r6:efef9dc0 r5:c3fb2618
[689174.140537] r4:c3fb2600
[689174.140543] [<c023f158>] (worker_thread) from [<c02474e8>] (kthread+0x178/0x194)
[689174.140567] r10:cdbe6000 r9:c2b4fe74 r8:00000000 r7:c3fb2600 r6:c023f158 r5:c2768e40
[689174.140574] r4:c5381f40
[689174.140580] [<c0247370>] (kthread) from [<c02000d4>] (ret_from_fork+0x14/0x20)
[689174.140596] Exception stack(0xcdbe7fb0 to 0xcdbe7ff8)
[689174.140610] 7fa0: 00000000 00000000 00000000 00000000
[689174.140635] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[689174.140646] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[689174.140657] r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c0247370
[689174.140665] r4:c2768e40
[689174.140672] ---[ end trace b887a2abb18e442d ]---
Note the timestamps. I believe the prerequisite condition for the leak is hitting the spurious interrupt error message (which had been happening every ~10 seconds previously).
Given the length of time it takes for the leak to fill the bounce buffer entirely, I would guess it's related to the no-cd card presence polling mechanism as requesting the csd returns data.
Steps to reproduce the behaviour
Use a broken SD card such that you get the bad interrupt message, and read from the card continuously for about 150 hours.
Device (s)
Raspberry Pi 4 Mod. B
System
Raspberry Pi reference 2022-01-29
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, f01430c9d8f67a4b9719cc00e74a2079d3834d5d, stage4
Mar 24 2022 13:19:26
Copyright (c) 2012 Broadcom
version e5a963efa66a1974127860b42e913d2374139ff5 (clean) (release) (start)
Linux raspberrypi 5.15.32-v7l+ #1538 SMP Thu Mar 31 19:39:41 BST 2022 armv7l GNU/Linux
Logs
No response
Additional context
No response
You should be able to detect the failure much earlier than this - /sys/kernel/debug/swiotlb/io_tlb_used gives a runtime indication of the swiotlb usage, which on a healthy idle system should be ~0.
This is on the list for next week.
I tried playing with kmemleak today and it gave me nothing relating to mmc under the same conditions - other than some (what I assume to be spurious) warnings about setting up regmaps in v3d at boot time. Board has been left dd'ing from the dodgy SD card just in case.
It entered the same failure mode over the weekend.
root@raspberrypi:/home/pi# cat /sys/kernel/debug/swiotlb/io_tlb_used
32762
root@raspberrypi:/home/pi# cat /sys/kernel/debug/swiotlb/io_tlb_nslabs
32768
No results in kmemleak. The leak actually takes around 20k seconds of runtime to exhaust the SWIOTLB and what looks like ~2048 instances of the data interrupt error.
It goes a lot faster - within a minute - if I apply this patch -
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index c835074435f1..140eab63708c 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -2247,7 +2247,7 @@ void mmc_rescan(struct work_struct *work)
out:
if (host->caps & MMC_CAP_NEEDS_POLL)
- mmc_schedule_delayed_work(&host->detect, HZ);
+ mmc_schedule_delayed_work(&host->detect, HZ / 8);
}
void mmc_start_host(struct mmc_host *host)
and set swiotlb=8 in /boot/cmdline.txt. I see the usage incrementing in blocks of 16, but sometimes it increments by 1. Removing the card doesn't cause the count to reset to 0.
The problem appears to be associated with a data timeout. If I make the sdhci driver a lot more chatty I see this:
[ 90.058846] IRQ status 0x00000002
[ 90.059442] IRQ status 0x00000001
[ 90.066680] IRQ status 0x00000002
[ 90.067287] IRQ status 0x00000001
[ 90.074507] IRQ status 0x00000002
[ 90.075098] IRQ status 0x00000001
[ 98.269467] IRQ status 0x00108000
[ 98.269570] IRQ status 0x00000003
[ 98.269584] mmc0: Got data interrupt 0x00000002 even though no data operation was in progress.
[ 98.269598] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 98.269609] mmc0: sdhci: Sys addr: 0x00000080 | Version: 0x00001002
[ 98.269623] mmc0: sdhci: Blk size: 0x00007200 | Blk cnt: 0x00000040
[ 98.269635] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000033
[ 98.269646] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x00000017
[ 98.269658] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000080
[ 98.269670] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000107
[ 98.269681] mmc0: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000
[ 98.269693] mmc0: sdhci: Int enab: 0x03ff100b | Sig enab: 0x03ff100b
[ 98.269704] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[ 98.269716] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0x0000a525
[ 98.269727] mmc0: sdhci: Cmd: 0x00000c1a | Max curr: 0x00080008
[ 98.269739] mmc0: sdhci: Resp[0]: 0x00000b00 | Resp[1]: 0x301083f4
[ 98.269750] mmc0: sdhci: Resp[2]: 0x30303030 | Resp[3]: 0x00000900
[ 98.269762] mmc0: sdhci: Host ctl2: 0x0000808c
[ 98.269773] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0xdad30240
[ 98.269784] mmc0: sdhci: ============================================
[ 98.269928] IRQ status 0x00000001
[ 98.270260] IRQ status 0x00000001
[ 98.270575] IRQ status 0x00000001
[ 106.462467] IRQ status 0x00108000
[ 106.462571] IRQ status 0x00000003
[ 106.462585] mmc0: Got data interrupt 0x00000002 even though no data operation was in progress.
[ 106.462599] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 106.462610] mmc0: sdhci: Sys addr: 0x00000080 | Version: 0x00001002
[ 106.462622] mmc0: sdhci: Blk size: 0x00007200 | Blk cnt: 0x00000080
[ 106.462634] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000033
[ 106.462646] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x00000017
[ 106.462659] mmc0: sdhci: Power: 0x0000000f | Blk gap: 0x00000080
[ 106.462670] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000107
[ 106.462682] mmc0: sdhci: Timeout: 0x00000000 | Int stat: 0x00000000
[ 106.462693] mmc0: sdhci: Int enab: 0x03ff100b | Sig enab: 0x03ff100b
[ 106.462705] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000000
[ 106.462717] mmc0: sdhci: Caps: 0x45ee6432 | Caps_1: 0x0000a525
[ 106.462729] mmc0: sdhci: Cmd: 0x00000c1a | Max curr: 0x00080008
[ 106.462741] mmc0: sdhci: Resp[0]: 0x00000b00 | Resp[1]: 0x301083f4
[ 106.462752] mmc0: sdhci: Resp[2]: 0x30303030 | Resp[3]: 0x00000900
[ 106.462764] mmc0: sdhci: Host ctl2: 0x0000808c
[ 106.462774] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0xdad30200
[ 106.462785] mmc0: sdhci: ============================================
[ 106.464992] IRQ status 0x00000001
A sequence of reads are being performed so tuples of interrupts for cmd(read) complete/data complete are being received. On a dodgy read, the data timeout error bit is asserted which starts an SD controller reset cycle. Note the timestamps - usually an 8-10ms gap between the command interrupt and data interrupt but before the timeout there was just a command interrupt.
I believe the timeouts are triggering the leak. Disabling card presence polling had no effect on the appearance of spurious interrupts or the leak.
For anyone else reading this, 0x00000001 is RESPONSE, 0x00000002 is DATA_END, 0x00000003 is both RESPONSE and DATA_END, and 0x00108000 is ERROR and DATA_TIMEOUT.
The trace shows that for commands with a data phase you would expect to see the DATA_END interrupt before the RESPONSE, but in the failure cases there is a long gap (over 8 seconds) then a DATA_TIMEOUT. The handling of the timeout includes clearing the record that there is a data operation in progress, so that when the DATA_END occurs immediately afterwards it comes as a surprise.
For a DMA buffer leak to occur, dma_unmap_sg must not be being called - which should happen normally from sdhci_request_done, or from sdhci_post_req in the case of an error.
When DATA_TIMEOUT is received, the driver calls into sdhci_data_irq, which sets host->data->error to -ETIMEDOUT and calls sdhci_finish_data.
[ To be continued ]
A possible fix for this issue has been available via rpi-update for two months now. Would you consider updating with sudo rpi-update and seeing if you can still reproduce the problem? Note that as rpi-update installs pre-release software there is a small but real chance of instability or even data loss. It's worth making a backup or installing to a fresh card to be on the safe side.
Hello, I'm still getting this error on RPi4 with ArchLinux ARM with linux-rpi kernel (it includes this patch):
Linux rpi-zxczxc 6.1.35-4-rpi-ARCH #1 SMP PREEMPT Sun Jun 25 12:45:58 MDT 2023 aarch64 GNU/Linux
Please post some of the errors you are seeing, and tell us:
- How frequent are they?
- How much RAM does the Pi 4 have?
- What is the Pi doing at the time you see these errors? If it seems to coincide with some particular activity, does avoiding the activity avoid the errors?
- It depends. Sometimes RPi can work for several hours before errors occur, sometimes it doesn't even boot.
- 8GB
- I think it correlates with disk IO activity. It usually occurs when I start bitcoind on this RPi and it downloads the blockchain. My main suspicion is that it's a hardware issue. I've already tried to solder sdcard slot, and it kinda helped but not fully - errors became less frequent but continued to occur. sdcard fully works as I checked it on my laptop. PS: I'll provide dmesg output when I'll be able to boot RPi again from this card.
dmesg output:
[ 61.141007] mmc0: Got data interrupt 0x00000002 even though no data operation was in progress.
[ 62.069618] mmc0: tried to HW reset card, got error -110
[ 62.069653] mmcblk0: recovery failed!
[ 62.069782] I/O error, dev mmcblk0, sector 59442936 op 0x0:(READ) flags 0x80700 phys_seg 2 prio class 2
[ 62.071694] I/O error, dev mmcblk0, sector 59443064 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 2
[ 62.071714] mmc0: card 59b4 removed
[ 62.075987] Core dump to |/usr/lib/systemd/systemd-coredump pipe failed
[ 62.076745] EXT4-fs warning (device mmcblk0p2): ext4_end_bio:343: I/O error 10 writing to inode 58197031 starting block 6990592)
[ 62.076771] Buffer I/O error on device mmcblk0p2, logical block 6887936
[ 62.078287] EXT4-fs warning (device mmcblk0p2): ext4_end_bio:343: I/O error 10 writing to inode 58197031 starting block 7945344)
[ 62.078327] EXT4-fs warning (device mmcblk0p2): ext4_end_bio:343: I/O error 10 writing to inode 58327181 starting block 7949033)
[ 62.078337] Buffer I/O error on device mmcblk0p2, logical block 7846377
[ 62.078402] Buffer I/O error on device mmcblk0p2, logical block 7841792
[ 62.078402] EXT4-fs warning (device mmcblk0p2): ext4_end_bio:343: I/O error 10 writing to inode 58196595 starting block 232922143)
[ 62.078410] Buffer I/O error on device mmcblk0p2, logical block 232819487
[ 62.078432] Buffer I/O error on device mmcblk0p2, logical block 7841793
[ 62.078449] EXT4-fs warning (device mmcblk0p2): ext4_end_bio:343: I/O error 10 writing to inode 58196594 starting block 665645)
[ 62.078460] Buffer I/O error on device mmcblk0p2, logical block 562989
[ 62.078461] Buffer I/O error on device mmcblk0p2, logical block 7841794
[ 62.078467] Buffer I/O error on device mmcblk0p2, logical block 7841795
[ 62.078513] Buffer I/O error on device mmcblk0p2, logical block 7841796
[ 62.078534] Buffer I/O error on device mmcblk0p2, logical block 7841797
[ 62.079406] EXT4-fs warning (device mmcblk0p2): ext4_end_bio:343: I/O error 10 writing to inode 58877615 starting block 409619)
[ 62.079438] EXT4-fs warning (device mmcblk0p2): ext4_end_bio:343: I/O error 10 writing to inode 58197026 starting block 665856)
[ 62.079457] EXT4-fs warning (device mmcblk0p2): ext4_end_bio:343: I/O error 10 writing to inode 58197026 starting block 7940375)
[ 62.079469] EXT4-fs warning (device mmcblk0p2): ext4_end_bio:343: I/O error 10 writing to inode 58197026 starting block 7940401)
[ 62.079480] EXT4-fs warning (device mmcblk0p2): ext4_end_bio:343: I/O error 10 writing to inode 58197026 starting block 7940458)
[ 62.083725] Aborting journal on device mmcblk0p2-8.
[ 62.083856] JBD2: I/O error when updating journal superblock for mmcblk0p2-8.
[ 62.085687] EXT4-fs error (device mmcblk0p2): ext4_journal_check_start:83: comm kworker/u8:1: Detected aborted journal
[ 62.085942] EXT4-fs (mmcblk0p2): I/O error while writing superblock
[ 62.085973] EXT4-fs (mmcblk0p2): Remounting filesystem read-only
[ 62.086002] EXT4-fs (mmcblk0p2): ext4_writepages: jbd2_start: 9223372036854775761 pages, ino 58197026; err -30
[ 62.159464] Core dump to |/usr/lib/systemd/systemd-coredump pipe failed
[ 63.257822] mmc0: new ultra high speed DDR50 SDXC card at address 59b4
[ 63.259063] mmcblk0: mmc0:59b4 LX1TB 928 GiB
[ 63.262884] mmcblk0: p1 p2
[ 63.263211] mmcblk0: mmc0:59b4 LX1TB 928 GiB
[ 63.299185] EXT4-fs error (device mmcblk0p2): __ext4_find_entry:1678: inode #8126472: comm systemd: reading directory lblock 0
[ 63.299186] EXT4-fs error (device mmcblk0p2): __ext4_find_entry:1678: inode #58458113: comm systemd: reading directory lblock 0
[ 63.299268] EXT4-fs error (device mmcblk0p2): __ext4_find_entry:1678: inode #58458113: comm systemd: reading directory lblock 0
[ 63.299346] EXT4-fs error (device mmcblk0p2): __ext4_find_entry:1678: inode #58458113: comm systemd: reading directory lblock 0
[ 63.299504] EXT4-fs error (device mmcblk0p2): __ext4_find_entry:1678: inode #262145: comm systemd: reading directory lblock 0
[ 63.299506] EXT4-fs error (device mmcblk0p2): __ext4_find_entry:1678: inode #58458113: comm systemd: reading directory lblock 0
[ 63.299600] EXT4-fs error (device mmcblk0p2): __ext4_find_entry:1678: inode #8126472: comm systemd: reading directory lblock 0
[ 63.299683] EXT4-fs error (device mmcblk0p2): __ext4_find_entry:1678: inode #265447: comm systemd: reading directory lblock 0
[ 63.299721] EXT4-fs error (device mmcblk0p2): __ext4_find_entry:1678: inode #262145: comm systemd: reading directory lblock 0
[ 67.260893] EXT4-fs error: 234 callbacks suppressed
[ 67.260920] EXT4-fs error (device mmcblk0p2): __ext4_find_entry:1678: inode #58195969: comm (chgrp): reading directory lblock 0
[ 67.261218] EXT4-fs error (device mmcblk0p2): __ext4_find_entry:1678: inode #58195969: comm (chgrp): reading directory lblock 0
[ 67.261466] EXT4-fs error (device mmcblk0p2): __ext4_find_entry:1678: inode #58458113: comm (chgrp): reading directory lblock 0
[ 72.508787] EXT4-fs error (device mmcblk0p2): __ext4_find_entry:1678: inode #58195969: comm (chgrp): reading directory lblock 0
[ 72.509098] EXT4-fs error (device mmcblk0p2): __ext4_find_entry:1678: inode #58195969: comm (chgrp): reading directory lblock 0
[ 72.509340] EXT4-fs error (device mmcblk0p2): __ext4_find_entry:1678: inode #58458113: comm (chgrp): reading directory lblock 0
I've researched a bit more and I am now pretty sure that this error occurs when the temperature of one of the components is too high. I do not have a thermal scanner, so I can observe only SoC temperature using lm_sensors. RPi works well under the load if I constantly blow air on it (SoC temp is ~35 C). If I turn the fan off temp raises to ~55 C and this error occurs - it takes less than 5 minutes to go from 35 C to 55 C.