Bcachefs crash (Oops) in LZ4HC_compress_generic+0x3b3/0x1b90
🥝Kernel: 6.10.0 Hello, developers. Ive just decided to enable compression on a production filesystem because it was critically running out of space. I did it by:
echo "lz4:15" > /sys/fs/bcachefs/MY_UID/options/compression
The filesystem was under heavy use at that moment.
bch-rebalance thread became 100% CPU ,I think it means that compression has started.
But after less than a minute I got a crash of a kworker process in a workqueue.
[509621.528103] [T10653] BUG: unable to handle page fault for address: ffffc9002a03d000
[509621.528109] [T10653] #PF: supervisor write access in kernel mode
[509621.528110] [T10653] #PF: error_code(0x0002) - not-present page
[509621.528111] [T10653] PGD 100000067 P4D 100000067 PUD 1002f2067 PMD 9b9af8067 PTE 0
[509621.528116] [T10653] Oops: Oops: 0002 [#1] PREEMPT_RT SMP
[509621.528118] [T10653] CPU: 7 PID: 10653 Comm: kworker/u80:4 Kdump: loaded Tainted: G W OE 6.10.0-rc6-blahaj-rt+ #14
[509621.528121] [T10653] Hardware name: OEM X79G/X79G, BIOS 4.6.5 08/02/2022
[509621.528122] [T10653] Workqueue: writeback wb_workfn (flush-bcachefs-2)
[509621.528127] [T10653] RIP: 0010:LZ4HC_compress_generic+0x3b3/0x1b90
[509621.528131] [T10653] Code: ea ff 00 00 00 48 83 c0 01 c6 40 ff ff 81 fa fe 00 00 00 7f ea 88 10 48 83 c0 01 48 8d 14 30 49 8b 0b 48 83 c0 08 49 83 c3 08 <48> 89 48 f8 48 39 d0 72 ec 48 8b 9d 78 ff ff ff 48 8b 45 d0 44 8b
[509621.528133] [T10653] RSP: 0018:ffffc9002a4cb430 EFLAGS: 00010296
[509621.528135] [T10653] RAX: ffffc9002a03d001 RBX: ffffc9002a02d000 RCX: 2704138e8569e88d
[509621.528136] [T10653] RDX: ffffc9002a03cffa RSI: 000000000000fef9 RDI: ffffc90029f7cefd
[509621.528137] [T10653] RBP: ffffc9002a4cb540 R08: 0000000000000004 R09: 0000000000000001
[509621.528138] [T10653] R10: 0000000000000002 R11: ffffc90029f7cf00 R12: ffff888211da0000
[509621.528139] [T10653] R13: ffffc9002a02d000 R14: 0000000000000000 R15: 0000000000010000
[509621.528140] [T10653] FS: 0000000000000000(0000) GS:ffff888fff9c0000(0000) knlGS:0000000000000000
[509621.528141] [T10653] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[509621.528143] [T10653] CR2: ffffc9002a03d000 CR3: 00000007e58b7003 CR4: 00000000001706f0
[509621.528144] [T10653] Call Trace:
[509621.528146] [T10653] <TASK>
[509621.528147] [T10653] ? show_regs.part.0+0x22/0x24
[509621.528152] [T10653] ? __die_body.cold+0x8/0x1c
[509621.528155] [T10653] ? __die+0x2e/0x40
[509621.528157] [T10653] ? page_fault_oops+0x102/0x270
[509621.528159] [T10653] ? search_bpf_extables+0xdb/0xf0
[509621.528161] [T10653] ? search_exception_tables+0x60/0x70
[509621.528164] [T10653] ? fixup_exception+0x32/0x450
[509621.528166] [T10653] ? kernelmode_fixup_or_oops.isra.0+0x45/0x50
[509621.528167] [T10653] ? bad_area_nosemaphore+0x119/0x160
[509621.528169] [T10653] ? do_kern_addr_fault+0x7b/0x90
[509621.528171] [T10653] ? exc_page_fault+0x194/0x1a0
[509621.528175] [T10653] ? asm_exc_page_fault+0x2b/0x30
[509621.528179] [T10653] ? LZ4HC_compress_generic+0x3b3/0x1b90
[509621.528183] [T10653] LZ4_compress_HC+0x94/0xb0
[509621.528185] [T10653] attempt_compress+0x205/0x270
[509621.528190] [T10653] ? kvmalloc_node_noprof+0x3f/0xe0
[509621.528193] [T10653] ? mempool_kvmalloc+0x1e/0x20
[509621.528196] [T10653] ? mempool_alloc_noprof+0x45/0x140
[509621.528199] [T10653] bch2_bio_compress+0x21a/0x5a0
[509621.528201] [T10653] __bch2_write+0x1791/0x1ba0
[509621.528205] [T10653] bch2_write+0x19e/0x440
[509621.528208] [T10653] ? bch2_write+0x19e/0x440
[509621.528211] [T10653] __bch2_writepage+0x452/0x910
[509621.528213] [T10653] ? bchfs_read.isra.0+0xd00/0xd00
[509621.528214] [T10653] write_cache_pages+0x54/0x90
[509621.528218] [T10653] bch2_writepages+0x59/0xe0
[509621.528219] [T10653] ? rt_spin_unlock+0x1b/0x50
[509621.528222] [T10653] do_writepages+0xd1/0x230
[509621.528223] [T10653] ? psi_group_change+0x1c5/0x410
[509621.528227] [T10653] __writeback_single_inode+0x44/0x340
[509621.528229] [T10653] writeback_sb_inodes+0x222/0x5d0
[509621.528233] [T10653] __writeback_inodes_wb+0x54/0x160
[509621.528234] [T10653] wb_writeback+0x1ff/0x370
[509621.528236] [T10653] ? get_nr_inodes+0x56/0xa0
[509621.528239] [T10653] wb_workfn+0x1e6/0x480
[509621.528241] [T10653] ? finish_task_switch.isra.0+0x8f/0x2b0
[509621.528244] [T10653] process_one_work+0x188/0x360
[509621.528247] [T10653] worker_thread+0x350/0x4e0
[509621.528249] [T10653] ? max_active_store+0xe0/0xe0
[509621.528252] [T10653] kthread+0xe5/0x110
[509621.528254] [T10653] ? kthread_park+0x90/0x90
[509621.528256] [T10653] ret_from_fork+0x3a/0x60
[509621.528258] [T10653] ? kthread_park+0x90/0x90
[509621.528260] [T10653] ret_from_fork_asm+0x11/0x20
[509621.528263] [T10653] </TASK>
[509621.528264] [T10653] Modules linked in: quota_v2 nls_iso8859_1 tls 88x2bu(OE) option usb_wwan usbserial cdc_acm netlink_diag yellowfb(OE) whitefb(OE) redfb(OE) purplefb(OE) pinkfb(OE) peachfb(OE) orangefb(OE) magentafb(OE) indigofb(OE) greenfb(OE) cyanfb(OE) brownfb(OE) bluefb(OE) blackfb(OE) missmikatools(OE) mikakernelm cfg80211 8021q garp mrp stp llc nft_chain_nat xt_MASQUERADE xt_nat nf_nat nft_limit ipt_REJECT nf_reject_ipv4 xt_recent xt_limit xt_pkttype xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype xt_tcpudp nft_compat nf_tables x_tables nfnetlink intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel uvcvideo cdc_ether usbnet snd_hda_codec_realtek snd_hda_codec_generic videobuf2_vmalloc kvm snd_usb_audio snd_hda_codec_hdmi snd_hda_scodec_component uvc snd_hda_intel videobuf2_memops snd_intel_dspcfg videobuf2_v4l2 crct10dif_pclmul snd_intel_sdw_acpi videodev crc32_pclmul snd_hda_codec ghash_clmulni_intel sha512_ssse3 sha256_ssse3 iTCO_wdt intel_pmc_bxt
[509621.528306] [T10653] missmikafs sha1_ssse3 iTCO_vendor_support aesni_intel snd_hda_core crypto_simd cryptd snd_usbmidi_lib rapl snd_rawmidi psmouse videobuf2_common r8152 intel_cstate snd_seq_device snd_hwdep mc snd_pcm r8169 mii pcspkr serio_raw snd_timer snd i2c_i801 ioatdma realtek i2c_mux lpc_ich i2c_smbus dca soundcore mac_hid
[509621.528321] [T10653] CR2: ffffc9002a03d000
[509621.528323] [T10653] ---[ end trace 0000000000000000 ]---
[509621.528410] [T10653] pstore: backend (erst) writing error (-28)
[509621.528410] [T10653] RIP: 0010:LZ4HC_compress_generic+0x3b3/0x1b90
[509621.528410] [T10653] Code: ea ff 00 00 00 48 83 c0 01 c6 40 ff ff 81 fa fe 00 00 00 7f ea 88 10 48 83 c0 01 48 8d 14 30 49 8b 0b 48 83 c0 08 49 83 c3 08 <48> 89 48 f8 48 39 d0 72 ec 48 8b 9d 78 ff ff ff 48 8b 45 d0 44 8b
[509621.528410] [T10653] RSP: 0018:ffffc9002a4cb430 EFLAGS: 00010296
[509621.528410] [T10653] RAX: ffffc9002a03d001 RBX: ffffc9002a02d000 RCX: 2704138e8569e88d
[509621.528410] [T10653] RDX: ffffc9002a03cffa RSI: 000000000000fef9 RDI: ffffc90029f7cefd
[509621.528410] [T10653] RBP: ffffc9002a4cb540 R08: 0000000000000004 R09: 0000000000000001
[509621.528410] [T10653] R10: 0000000000000002 R11: ffffc90029f7cf00 R12: ffff888211da0000
[509621.528410] [T10653] R13: ffffc9002a02d000 R14: 0000000000000000 R15: 0000000000010000
[509621.528410] [T10653] FS: 0000000000000000(0000) GS:ffff888fff9c0000(0000) knlGS:0000000000000000
[509621.528410] [T10653] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[509621.528410] [T10653] CR2: ffffc9002a03d000 CR3: 00000007e58b7003 CR4: 00000000001706f0
[509621.528410] [T10653] note: kworker/u80:4[10653] exited with irqs disabled
I believe that last your function was attempt_compress+0x205/0x270. Then crash in the lz4 module/code.
I don't know if it is your bug, or an issue with configuration. My configuration is this:
Device: (unknown device)
External UUID: <uuid>
Internal UUID: <uuid>
Magic number: <uuid>
Device index: 0
Label: <name>
Version: 1.7: mi_btree_bitmap
Version upgrade complete: 1.7: mi_btree_bitmap
Oldest version on disk: 1.7: mi_btree_bitmap
Created: <date>
Sequence number: 34
Time of last write: Sat Sep 28 12:12:12 2024
Superblock size: 4.66 KiB/1.00 MiB
Clean: 0
Devices: 1
Sections: members_v1,replicas_v0,quota,clean,journal_v2,counters,members_v2,errors,ext,downgrade
Features: lz4,zstd,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,reflink_inline_data,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features: alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done
Options:
block_size: 4.00 KiB
btree_node_size: 256 KiB
errors: continue fix_safe panic [ro]
metadata_replicas: 1
data_replicas: 1
metadata_replicas_required: 1
data_replicas_required: 1
encoded_extent_max: 64.0 KiB
metadata_checksum: none [crc32c] crc64 xxhash
data_checksum: none [crc32c] crc64 xxhash
compression: lz4:15
background_compression: none
str_hash: [crc32c] crc64 siphash
metadata_target: none
foreground_target: none
background_target: none
promote_target: none
erasure_code: 0
inodes_32bit: 1
shard_inode_numbers: 1
inodes_use_key_cache: 1
gc_reserve_percent: 8
gc_reserve_bytes: 0 B
root_reserve_percent: 0
wide_macs: 0
acl: 1
usrquota: 1
grpquota: 1
prjquota: 1
journal_flush_delay: 1000
journal_flush_disabled: 0
journal_reclaim_delay: 100
journal_transaction_names: 1
version_upgrade: [compatible] incompatible none
nocow: 0
members_v2 (size 160):
Device: 0
Label: (none)
UUID: <uuid>
Size: 119 GiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 256 KiB
First bucket: 0
Buckets: 488386
Last mount: Sun Sep 22 14:53:58 2024
Last superblock write: 34
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user
Btree allocated bitmap blocksize: 4.00 MiB
Btree allocated bitmap: 0000000000000000000000000000000100000000000000000000000100011011
Durability: 1
Discard: 1
Freespace initialized: 1
errors (size 8):
Thank you very much!
The filesystem is still up and it accepts reads writes. bch-rebalance is also still running at 100%. I don't know if a reboot is required anytime soon. Just the manual sync() calls don't return.
UPD: bch-rebalance process has now just crashed in the same function (LZ4HC_compress_generic).
I decided to find out about this function in source code
# scripts/faddr2line ./vmlinux LZ4HC_compress_generic+0x3b3/0x1b90
LZ4HC_compress_generic+0x3b3/0x1b90:
LZ4_copy8 at lib/lz4/lz4defs.h:158
(inlined by) LZ4_wildCopy at lib/lz4/lz4defs.h:180
(inlined by) LZ4HC_encodeSequence at lib/lz4/lz4hc_compress.c:296 (inlined by) LZ4HC_compress_generic at lib/lz4/lz4hc_compress.c:402
Also for attempt_compress:
attempt_compress+0x205/0x270:
attempt_compress at fs/bcachefs/compress.c:323 (discriminator 2)
I find you make call to the lz4 function here:
int ret = LZ4_compress_HC(
src, dst, src_len, dst_len,
compression.level,workspace
);
And that function crashes on this code:
static FORCE_INLINE void LZ4_copy8(void *dst, const void *src) {
#if LZ4_ARCH64 // this was True during kernel build
U64 a = get_unaligned((const U64 *)src);
/* THE CRASH IS HERE -> */ put_unaligned(a, (U64 *)dst);
#else
U32 a = get_unaligned((const U32 *)src);
U32 b = get_unaligned((const U32 *)src + 1);
...
Respectively this is put_unaligned which is a macro in include/asm-generic/unaligned.h:
#define put_unaligned(val, ptr) __put_unaligned_t(typeof(*(ptr)), (val), (ptr))
Respectively, macro "__put_unaligned_t" is finally defined in the same file:
#define __put_unaligned_t(type, val, ptr) do { \
struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr); \
__pptr->x = (val); \
} while (0)
Here is the exact crashing line:
__pptr->x = (val);
I wasn't able to understand myself why this action randomly faults. As I observed, the base chance of the fault is very low. But when the compression is happening at 100% CPU, eventually after some time it happens...
Interesting detail: I still haven't rebooted because the filesystem continues to work! Despite that it's bch-rebalance kthread is crashed. fsync() calls return. This is hilarious. Surprising and reliable design, thanks. It made my day.
@koverstreet not fixed