Disk corruption on 6.12.y
Describe the bug
Revision 521f2baed818c04981fd61b275c996a8ef03b833 of branch rpi-6.12.y causes massive disk corruption when the realtime kernel is enabled.
Steps to reproduce the behaviour
Configure kernel for real-time.
Device(s)
Raspberry Pi Zero 2 W, Raspberry Pi 3 Model B
System
raspbian 64bit lite
Logs
Not directly available. If this problem is of interest, I can see if I can reproduce it. Errors I saw were related to directory-nodes being full.
Additional context
No response
Realtime support is new to 6.12:
- Have you had a non-corrupting 6.12 realtime kernel before the indicated commit?
- Does a non-realtime build of 6.12 work for you?
- Have you had a non-corrupting 6.11 realtime (with patches) kernel?
- Have you tried realtime kernels before?
- On which medium are you seeing corruptions - SD card or some external storage?
- no, it was the first I tried
- seems to fail as well. see log below.
- no
- no. the RT patches were only merged in 6.12 I believe.
- sd cards (I tried 3 different cards, all 32 GB in size)
Both the realtime and non-realtime 6.12 kernel fail:
[ 103.850530] EXT4-fs warning (device mmcblk0p2): ext4_dirblock_csum_verify:406: inode #55844: comm apt: No space for directory leaf checksum. Please run e2fsck -D.
[ 103.850566] EXT4-fs error (device mmcblk0p2): htree_dirblock_to_tree:1083: inode #55844: block 2: comm apt: Directory block failed checksum
[ 103.850777] EXT4-fs warning (device mmcblk0p2): ext4_dirblock_csum_verify:406: inode #55844: comm apt: No space for directory leaf checksum. Please run e2fsck -D.
[ 103.850791] EXT4-fs error (device mmcblk0p2): htree_dirblock_to_tree:1083: inode #55844: block 2: comm apt: Directory block failed checksum
[ 104.173859] EXT4-fs warning (device mmcblk0p2): ext4_dirblock_csum_verify:406: inode #55844: comm apt: No space for directory leaf checksum. Please run e2fsck -D.
[ 104.173892] EXT4-fs error (device mmcblk0p2): ext4_dx_find_entry:1753: inode #55844: block 2: comm apt: Directory block failed checksum
[ 104.174135] EXT4-fs warning (device mmcblk0p2): ext4_dirblock_csum_verify:406: inode #55844: comm apt: No space for directory leaf checksum. Please run e2fsck -D.
[ 104.174149] EXT4-fs error (device mmcblk0p2): ext4_dx_find_entry:1753: inode #55844: block 2: comm apt: Directory block failed checksum
[ 104.236603] EXT4-fs warning (device mmcblk0p2): ext4_dirblock_csum_verify:406: inode #55844: comm apt: No space for directory leaf checksum. Please run e2fsck -D.
[ 104.236680] EXT4-fs error (device mmcblk0p2): ext4_dx_find_entry:1753: inode #55844: block 2: comm apt: Directory block failed checksum
[ 104.237061] EXT4-fs warning (device mmcblk0p2): ext4_dirblock_csum_verify:406: inode #55844: comm apt: No space for directory leaf checksum. Please run e2fsck -D.
[ 104.237075] EXT4-fs error (device mmcblk0p2): ext4_dx_find_entry:1753: inode #55844: block 2: comm apt: Directory block failed checksum
[ 104.303747] EXT4-fs warning (device mmcblk0p2): ext4_dirblock_csum_verify:406: inode #55844: comm http: No space for directory leaf checksum. Please run e2fsck -D.
[ 104.303791] EXT4-fs error (device mmcblk0p2): ext4_dx_find_entry:1753: inode #55844: block 2: comm http: Directory block failed checksum
[ 104.309206] EXT4-fs warning (device mmcblk0p2): ext4_dirblock_csum_verify:406: inode #55844: comm http: No space for directory leaf checksum. Please run e2fsck -D.
[ 104.309240] EXT4-fs error (device mmcblk0p2): ext4_dx_find_entry:1753: inode #55844: block 2: comm http: Directory block failed checksum
[ 104.316966] EXT4-fs warning (device mmcblk0p2): ext4_dirblock_csum_verify:406: inode #55844: comm http: No space for directory leaf checksum. Please run e2fsck -D.
[ 104.317007] EXT4-fs error (device mmcblk0p2): ext4_dx_find_entry:1753: inode #55844: block 2: comm http: Directory block failed checksum
[ 104.323162] EXT4-fs warning (device mmcblk0p2): ext4_dirblock_csum_verify:406: inode #55844: comm http: No space for directory leaf checksum. Please run e2fsck -D.
[ 104.323196] EXT4-fs error (device mmcblk0p2): ext4_dx_find_entry:1753: inode #55844: block 2: comm http: Directory block failed checksum
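A log like this is easier to read once the messages are counted per inode. In the excerpt above every message refers to inode #55844 on mmcblk0p2, which points at a single damaged directory rather than many. The following is only a sketch for that kind of triage and was not used in this thread; the function name summarize_ext4 is made up for illustration, and the pattern assumes the message layout shown in the excerpt.

```shell
# Count EXT4 warnings/errors per (severity, device, inode) in kernel log
# text read from stdin. Output lines: "<count> <severity> <device> inode=<n>".
# summarize_ext4 is a hypothetical helper name, not an existing tool.
summarize_ext4() {
    # Keep only lines that look like the EXT4-fs messages quoted above and
    # reduce each one to "severity device inode=N".
    sed -nE 's/^.*EXT4-fs (warning|error) \(device ([^)]+)\).*inode #([0-9]+).*$/\1 \2 inode=\3/p' |
        LC_ALL=C sort | uniq -c | LC_ALL=C sort -rn |
        awk '{ print $1, $2, $3, $4 }'   # normalize uniq's padding
}
```

Typical use would be `sudo dmesg | summarize_ext4`, or feeding it a saved log file on stdin.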
Both the realtime and non-realtime 6.12 kernel fail:
I think you'll need to back off to a known good point, confirm that it is reliable, and then update only the kernel.
Start with a clean install of RPiOS Bookworm 64-bit lite.
Update it (sudo apt update && sudo apt full-upgrade).
Confirm sdcard is reliable with no complaints in dmesg when running your normal workloads.
Now update to our build of the 6.12 kernel.
sudo rpi-update next
Reboot and report any sdcard corruption issues.
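The "no complaints in dmesg" check in the steps above can be turned into a pass/fail test that is run once before and once after the kernel update. This is a minimal sketch: check_storage_log is a hypothetical name, and the two patterns (the EXT4-fs prefix seen earlier in this issue and the generic "I/O error" wording) are an assumption about what a storage complaint looks like, not a complete list.

```shell
# Read kernel log text on stdin. Print any line that looks like a
# filesystem or block I/O complaint and return 1; return 0 if none match.
# check_storage_log is a hypothetical helper; the patterns are assumptions.
check_storage_log() {
    if grep -E 'EXT4-fs (error|warning)|I/O error'; then
        return 1
    fi
    return 0
}
```

For example: `sudo dmesg | check_storage_log && echo "log looks clean"`. A clean result only says that nothing matching these patterns was logged since boot, so it is worth running after the normal workload has been exercised.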
That build of the kernel works fine. What would be the procedure for getting a kernel with CONFIG_PREEMPT_RT instead?
You said
Both the realtime and non-realtime 6.12 kernel fail:
So I think the first issue to resolve is if the default 6.12 (non-realtime) kernel has an issue with corrupting the sdcard. Once we have an answer to that, we can consider enabling RT. But you don't want to be changing too many things at once.
Yes, but not the one from rpi-update next.
I did a diff of the rpi-update next-version of .config and the one I came up with and see quite a few possible reasons for my version to fail.
So I'm going to try changing only the scheduling parameters in the rpi-update next version and see if that helps.
@folkertvanheusden were you able to reproduce the errors with a more recent 6.12 version? Haven't seen such errors on PREEMPT RT enabled 6.12 builds yet.
Tested our tree (with some vendor patches, mostly targeting dt for our hardware platform) on CM5 with eMMC, pi500 with the stock rpi sd card and rpi2w. So far none of these has reported any ext4 warnings - even under high (io) load tests. Are there any other warnings / errors in kernel log? There are still some quirks with rt on rpi (dwc_otg for example needs patching or switch to dwc2)
Hi,
I haven't tried it further, maybe later.
Hi, I also had disk corruption issues, which occasionally disappeared after reboots. With a console open, I saw bus errors several times and started to examine them. They occurred sporadically, without any apparent cause.
I deactivated all overclocking and the errors disappeared. Now all the Pi Zero 2 W boards work as expected. The 6.12.x kernels seem to be sensitive to overclocking.
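To check quickly whether a board is overclocked, something like the following lists the active overclocking lines in config.txt. It is a sketch under assumptions: list_overclock_settings is a made-up name, the option list (arm_freq, gpu_freq, core_freq, sdram_freq, over_voltage, force_turbo) covers common settings only, and it does not evaluate conditional sections or included files.

```shell
# Print active (uncommented) overclocking-related lines from a config.txt.
# list_overclock_settings is a hypothetical helper; the option list is a
# common subset, not every setting that can affect clocks or voltage.
list_overclock_settings() {
    grep -E '^[[:space:]]*(arm_freq|gpu_freq|core_freq|sdram_freq|over_voltage|force_turbo)[[:space:]]*=' "$1"
}
```

On Raspberry Pi OS Bookworm the file is normally /boot/firmware/config.txt, so the call would be `list_overclock_settings /boot/firmware/config.txt`. No output means none of the listed options is set.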
Hope this helps someone :)