bcachefs
bcachefs copied to clipboard
btree trans held srcu lock (delaying memory reclaim) by more than 10 seconds [2748744c3734]
This warnings appears since upgrade to 6.6.0 (branch master):
[ 336.210444] ------------[ cut here ]------------
[ 336.210459] btree trans held srcu lock (delaying memory reclaim) by more than 10 seconds
[ 336.210466] WARNING: CPU: 2 PID: 4437 at fs/bcachefs/btree_iter.c:2838 bch2_trans_srcu_unlock (fs/bcachefs/btree_iter.c:2838 (discriminator 1) fs/bcachefs/btree_iter.c:2836 (discriminator 1) fs/bcachefs/btree_iter.c:2852 (discriminator 1) fs/bcachefs/btree_iter.c:2842 (discriminator
1))
[ 336.210480] Modules linked in: zram xt_nat xt_tcpudp tun veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables x_tables bpfilter br_netfilter overlay bridg
e stp llc dummy fuse snd_hda_codec_hdmi i915 binfmt_misc i2c_algo_bit drm_buddy cec intel_gtt drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm pps_ldisc pps_core drm coretemp hwmon intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp snd_h
da_intel kvm_intel drm_panel_orientation_quirks snd_intel_dspcfg cfbfillrect cfbimgblt snd_hda_codec cfbcopyarea snd_hda_core fb kvm i2c_i801 mei_hdcp i2c_smbus mei_pxp intel_lpss_pci irqbypass intel_lpss rapl snd_pcm i2c_core idma64 intel_cstate virt_dma video intel_uncore intel_pch_t
hermal snd_timer mfd_core mei_me wmi font tiny_power_button fbdev mei cdc_acm bfq snd acpi_pad soundcore backlight thermal fan button raid1 md_mod crct10dif_pclmul
[ 336.210539] ghash_clmulni_intel sha512_ssse3 xhci_pci mpt3sas aesni_intel xhci_hcd crypto_simd cryptd raid_class scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod
[ 336.210582] CPU: 2 PID: 4437 Comm: burp Tainted: G U 6.6.0-03558-g2748744c3734 #41
[ 336.210589] Hardware name: MSI MS-7982/B150M PRO-VDH (MS-7982), BIOS 3.H0 07/10/2018
[ 336.210595] RIP: 0010:bch2_trans_srcu_unlock (fs/bcachefs/btree_iter.c:2838 (discriminator 1) fs/bcachefs/btree_iter.c:2836 (discriminator 1) fs/bcachefs/btree_iter.c:2852 (discriminator 1) fs/bcachefs/btree_iter.c:2842 (discriminator 1))
[ 336.210601] Code: 00 00 00 48 8d bd b0 35 00 00 83 fe 01 77 1f e8 b5 ed c6 ff 80 a3 99 00 00 00 fe 5b 5d c3 48 c7 c7 28 50 11 a0 e8 2f fa bd ff <0f> 0b eb cf 0f 0b eb dd 0f 1f 80 00 00 00 00 0f 1f 44 00 00 53 f6
All code
========
0: 00 00 add %al,(%rax)
2: 00 48 8d add %cl,-0x73(%rax)
5: bd b0 35 00 00 mov $0x35b0,%ebp
a: 83 fe 01 cmp $0x1,%esi
d: 77 1f ja 0x2e
f: e8 b5 ed c6 ff call 0xffffffffffc6edc9
14: 80 a3 99 00 00 00 fe andb $0xfe,0x99(%rbx)
1b: 5b pop %rbx
1c: 5d pop %rbp
1d: c3 ret
1e: 48 c7 c7 28 50 11 a0 mov $0xffffffffa0115028,%rdi
25: e8 2f fa bd ff call 0xffffffffffbdfa59
2a:* 0f 0b ud2 <-- trapping instruction
2c: eb cf jmp 0xfffffffffffffffd
2e: 0f 0b ud2
30: eb dd jmp 0xf
32: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
39: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
3e: 53 push %rbx
3f: f6 .byte 0xf6
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: eb cf jmp 0xffffffffffffffd3
4: 0f 0b ud2
6: eb dd jmp 0xffffffffffffffe5
8: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
f: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
14: 53 push %rbx
15: f6 .byte 0xf6
[ 336.210610] RSP: 0018:ffffb783cb0bf9f8 EFLAGS: 00010282
[ 336.210616] RAX: 0000000000000000 RBX: ffff9075db7e8000 RCX: 0000000000000000
[ 336.210622] RDX: 0000000000000002 RSI: 0000000000000027 RDI: 00000000ffffffff
[ 336.210627] RBP: ffff907620b80000 R08: 0000000000000000 R09: ffffffffa044f060
[ 336.210632] R10: ffffb783cb0bf898 R11: ffffffffa060f0a8 R12: ffff9075db7e8550
[ 336.210638] R13: 0000000000000000 R14: 0000000000000000 R15: ffff907620babf60
[ 336.210645] FS: 00007f4d20021740(0000) GS:ffff90791bf00000(0000) knlGS:0000000000000000
[ 336.210651] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 336.210657] CR2: 00007ffea7870f80 CR3: 000000011b42a001 CR4: 00000000003706e0
[ 336.210662] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 336.210667] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 336.210673] Call Trace:
[ 336.210678] <TASK>
[ 336.210683] ? bch2_trans_srcu_unlock (fs/bcachefs/btree_iter.c:2838 (discriminator 1) fs/bcachefs/btree_iter.c:2836 (discriminator 1) fs/bcachefs/btree_iter.c:2852 (discriminator 1) fs/bcachefs/btree_iter.c:2842 (discriminator 1))
[ 336.210689] ? __warn (kernel/panic.c:673)
[ 336.210697] ? bch2_trans_srcu_unlock (fs/bcachefs/btree_iter.c:2838 (discriminator 1) fs/bcachefs/btree_iter.c:2836 (discriminator 1) fs/bcachefs/btree_iter.c:2852 (discriminator 1) fs/bcachefs/btree_iter.c:2842 (discriminator 1))
[ 336.210703] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[ 336.210710] ? handle_bug (arch/x86/kernel/traps.c:237)
[ 336.210718] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
[ 336.210725] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
[ 336.210732] ? bch2_trans_srcu_unlock (fs/bcachefs/btree_iter.c:2838 (discriminator 1) fs/bcachefs/btree_iter.c:2836 (discriminator 1) fs/bcachefs/btree_iter.c:2852 (discriminator 1) fs/bcachefs/btree_iter.c:2842 (discriminator 1))
[ 336.210738] ? bch2_trans_srcu_unlock (fs/bcachefs/btree_iter.c:2838 (discriminator 1) fs/bcachefs/btree_iter.c:2836 (discriminator 1) fs/bcachefs/btree_iter.c:2852 (discriminator 1) fs/bcachefs/btree_iter.c:2842 (discriminator 1))
[ 336.210744] ? bch2_trans_begin (fs/bcachefs/btree_iter.c:2888)
[ 336.210766] bch2_trans_begin (fs/bcachefs/btree_iter.c:2920)
[ 336.210772] ? __bch2_create (fs/bcachefs/fs.c:271 (discriminator 1))
[ 336.210779] __bch2_create (fs/bcachefs/fs.c:261 (discriminator 1))
[ 336.210787] ? d_splice_alias (fs/dcache.c:3183)
[ 336.210794] ? bch2_create (fs/bcachefs/fs.c:381 (discriminator 1) fs/bcachefs/fs.c:392 (discriminator 1))
[ 336.210800] bch2_create (fs/bcachefs/fs.c:381 (discriminator 1) fs/bcachefs/fs.c:392 (discriminator 1))
[ 336.210806] path_openat (fs/namei.c:3479 fs/namei.c:3546 fs/namei.c:3776)
[ 336.210814] do_filp_open (fs/namei.c:3810 (discriminator 2))
[ 336.210820] ? __check_object_size (mm/usercopy.c:196 mm/usercopy.c:251 mm/usercopy.c:213)
[ 336.210828] do_sys_openat2 (fs/open.c:1441 (discriminator 1))
[ 336.210834] __x64_sys_openat (fs/open.c:1466)
[ 336.210841] do_syscall_64 (arch/x86/entry/common.c:50 (discriminator 1) arch/x86/entry/common.c:80 (discriminator 1))
[ 336.210847] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 336.210853] RIP: 0033:0x7f4d20119a21
[ 336.210859] Code: 75 57 89 f0 25 00 00 41 00 3d 00 00 41 00 74 49 80 3d ea e9 0d 00 00 74 6d 89 da 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 93 00 00 00 48 8b 54 24 28 64 48 2b 14 25
All code
========
0: 75 57 jne 0x59
2: 89 f0 mov %esi,%eax
4: 25 00 00 41 00 and $0x410000,%eax
9: 3d 00 00 41 00 cmp $0x410000,%eax
e: 74 49 je 0x59
10: 80 3d ea e9 0d 00 00 cmpb $0x0,0xde9ea(%rip) # 0xdea01
17: 74 6d je 0x86
19: 89 da mov %ebx,%edx
1b: 48 89 ee mov %rbp,%rsi
1e: bf 9c ff ff ff mov $0xffffff9c,%edi
23: b8 01 01 00 00 mov $0x101,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 0f 87 93 00 00 00 ja 0xc9
36: 48 8b 54 24 28 mov 0x28(%rsp),%rdx
3b: 64 fs
3c: 48 rex.W
3d: 2b .byte 0x2b
3e: 14 25 adc $0x25,%al
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 0f 87 93 00 00 00 ja 0x9f
c: 48 8b 54 24 28 mov 0x28(%rsp),%rdx
11: 64 fs
12: 48 rex.W
13: 2b .byte 0x2b
14: 14 25 adc $0x25,%al
[ 336.210867] RSP: 002b:00007ffc559f9780 EFLAGS: 00000202 ORIG_RAX: 0000000000000101
[ 336.210873] RAX: ffffffffffffffda RBX: 0000000000020041 RCX: 00007f4d20119a21
[ 336.210879] RDX: 0000000000020041 RSI: 0000564288104e00 RDI: 00000000ffffff9c
[ 336.210884] RBP: 0000564288104e00 R08: 0000000000000007 R09: 0000564288104e30
[ 336.210889] R10: 00000000000001b6 R11: 0000000000000202 R12: 00000000000001b6
[ 336.210895] R13: 00005642880e33d0 R14: 0000000000000000 R15: 00007ffc559f9a80
[ 336.210901] </TASK>
[ 336.210905] ---[ end trace 0000000000000000 ]---
[ 346.594711] ------------[ cut here ]------------
[ 346.594730] btree trans held srcu lock (delaying memory reclaim) by more than 10 seconds
[ 346.594737] WARNING: CPU: 2 PID: 6241 at fs/bcachefs/btree_iter.c:2838 bch2_trans_srcu_unlock (fs/bcachefs/btree_iter.c:2838 (discriminator 1) fs/bcachefs/btree_iter.c:2836 (discriminator 1) fs/bcachefs/btree_iter.c:2852 (discriminator 1) fs/bcachefs/btree_iter.c:2842 (discriminator 1))
[ 346.594766] Modules linked in: zram xt_nat xt_tcpudp tun veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables x_tables bpfilter br_netfilter overlay bridge stp llc dummy fuse snd_hda_codec_hdmi i915 binfmt_misc i2c_algo_bit drm_buddy cec intel_gtt drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm pps_ldisc pps_core drm coretemp hwmon intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp snd_hda_intel kvm_intel drm_panel_orientation_quirks snd_intel_dspcfg cfbfillrect cfbimgblt snd_hda_codec cfbcopyarea snd_hda_core fb kvm i2c_i801 mei_hdcp i2c_smbus mei_pxp intel_lpss_pci irqbypass intel_lpss rapl snd_pcm i2c_core idma64 intel_cstate virt_dma video intel_uncore intel_pch_t
hermal snd_timer mfd_core mei_me wmi font tiny_power_button fbdev mei cdc_acm bfq snd acpi_pad soundcore backlight thermal fan button raid1 md_mod crct10dif_pclmul
[ 346.594852] ghash_clmulni_intel sha512_ssse3 xhci_pci mpt3sas aesni_intel xhci_hcd crypto_simd cryptd raid_class scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod
[ 346.594884] CPU: 2 PID: 6241 Comm: burp Tainted: G U W 6.6.0-03558-g2748744c3734 #41
[ 346.594906] Hardware name: MSI MS-7982/B150M PRO-VDH (MS-7982), BIOS 3.H0 07/10/2018
[ 346.594912] RIP: 0010:bch2_trans_srcu_unlock (fs/bcachefs/btree_iter.c:2838 (discriminator 1) fs/bcachefs/btree_iter.c:2836 (discriminator 1) fs/bcachefs/btree_iter.c:2852 (discriminator 1) fs/bcachefs/btree_iter.c:2842 (discriminator 1))
[ 346.594918] Code: 00 00 00 48 8d bd b0 35 00 00 83 fe 01 77 1f e8 b5 ed c6 ff 80 a3 99 00 00 00 fe 5b 5d c3 48 c7 c7 28 50 11 a0 e8 2f fa bd ff <0f> 0b eb cf 0f 0b eb dd 0f 1f 80 00 00 00 00 0f 1f 44 00 00 53 f6
All code
========
0: 00 00 add %al,(%rax)
2: 00 48 8d add %cl,-0x73(%rax)
5: bd b0 35 00 00 mov $0x35b0,%ebp
a: 83 fe 01 cmp $0x1,%esi
d: 77 1f ja 0x2e
f: e8 b5 ed c6 ff call 0xffffffffffc6edc9
14: 80 a3 99 00 00 00 fe andb $0xfe,0x99(%rbx)
1b: 5b pop %rbx
1c: 5d pop %rbp
1d: c3 ret
1e: 48 c7 c7 28 50 11 a0 mov $0xffffffffa0115028,%rdi
25: e8 2f fa bd ff call 0xffffffffffbdfa59
2a:* 0f 0b ud2 <-- trapping instruction
2c: eb cf jmp 0xfffffffffffffffd
2e: 0f 0b ud2
30: eb dd jmp 0xf
32: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
39: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
3e: 53 push %rbx
3f: f6 .byte 0xf6
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: eb cf jmp 0xffffffffffffffd3
4: 0f 0b ud2
6: eb dd jmp 0xffffffffffffffe5
8: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
f: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
14: 53 push %rbx
15: f6 .byte 0xf6
[ 346.594927] RSP: 0018:ffffb783cab4b9e8 EFLAGS: 00010286
[ 346.594947] RAX: 0000000000000000 RBX: ffff9076c32f0000 RCX: 0000000000000000
[ 346.594953] RDX: 0000000000000002 RSI: 0000000000000027 RDI: 00000000ffffffff
[ 346.594959] RBP: ffff907620b80000 R08: 0000000000000000 R09: ffffffffa044f060
[ 346.594965] R10: ffffb783cab4b888 R11: ffffffffa060f0a8 R12: ffff9076c32f0550
[ 346.594971] R13: 0000000000000000 R14: 0000000000000000 R15: ffff907620babf60
[ 346.594977] FS: 00007f4d20021740(0000) GS:ffff90791bf00000(0000) knlGS:0000000000000000
[ 346.594983] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 346.594988] CR2: 00007fdbbc362658 CR3: 0000000232134004 CR4: 00000000003706e0
[ 346.594994] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 346.594999] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 346.595004] Call Trace:
[ 346.595009] <TASK>
[ 346.595014] ? bch2_trans_srcu_unlock (fs/bcachefs/btree_iter.c:2838 (discriminator 1) fs/bcachefs/btree_iter.c:2836 (discriminator 1) fs/bcachefs/btree_iter.c:2852 (discriminator 1) fs/bcachefs/btree_iter.c:2842 (discriminator 1))
[ 346.595020] ? __warn (kernel/panic.c:673)
[ 346.595027] ? bch2_trans_srcu_unlock (fs/bcachefs/btree_iter.c:2838 (discriminator 1) fs/bcachefs/btree_iter.c:2836 (discriminator 1) fs/bcachefs/btree_iter.c:2852 (discriminator 1) fs/bcachefs/btree_iter.c:2842 (discriminator 1))
[ 346.595033] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[ 346.595040] ? handle_bug (arch/x86/kernel/traps.c:237)
[ 346.595046] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
[ 346.595052] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
[ 346.595059] ? bch2_trans_srcu_unlock (fs/bcachefs/btree_iter.c:2838 (discriminator 1) fs/bcachefs/btree_iter.c:2836 (discriminator 1) fs/bcachefs/btree_iter.c:2852 (discriminator 1) fs/bcachefs/btree_iter.c:2842 (discriminator 1))
[ 346.595069] ? bch2_trans_srcu_unlock (fs/bcachefs/btree_iter.c:2838 (discriminator 1) fs/bcachefs/btree_iter.c:2836 (discriminator 1) fs/bcachefs/btree_iter.c:2852 (discriminator 1) fs/bcachefs/btree_iter.c:2842 (discriminator 1))
[ 346.595075] ? bch2_trans_begin (fs/bcachefs/btree_iter.c:2888)
[ 346.595080] bch2_trans_begin (fs/bcachefs/btree_iter.c:2920)
[ 346.595087] ? __bch2_create (fs/bcachefs/fs.c:271 (discriminator 1))
[ 346.595107] __bch2_create (fs/bcachefs/fs.c:261 (discriminator 1))
[ 346.595116] ? d_splice_alias (fs/dcache.c:3183)
[ 346.595123] ? bch2_create (fs/bcachefs/fs.c:381 (discriminator 1) fs/bcachefs/fs.c:392 (discriminator 1))
[ 346.595128] bch2_create (fs/bcachefs/fs.c:381 (discriminator 1) fs/bcachefs/fs.c:392 (discriminator 1))
[ 346.595151] path_openat (fs/namei.c:3479 fs/namei.c:3546 fs/namei.c:3776)
[ 346.595174] do_filp_open (fs/namei.c:3810 (discriminator 2))
[ 346.595180] ? __check_object_size (mm/usercopy.c:196 mm/usercopy.c:251 mm/usercopy.c:213)
[ 346.595187] ? _raw_spin_unlock (./arch/x86/include/asm/preempt.h:104 ./include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:186)
[ 346.595208] do_sys_openat2 (fs/open.c:1441 (discriminator 1))
[ 346.595215] __x64_sys_openat (fs/open.c:1466)
[ 346.595221] do_syscall_64 (arch/x86/entry/common.c:50 (discriminator 1) arch/x86/entry/common.c:80 (discriminator 1))
[ 346.595230] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 346.595236] RIP: 0033:0x7f4d20119a21
[ 346.595242] Code: 75 57 89 f0 25 00 00 41 00 3d 00 00 41 00 74 49 80 3d ea e9 0d 00 00 74 6d 89 da 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 93 00 00 00 48 8b 54 24 28 64 48 2b 14 25
All code
========
0: 75 57 jne 0x59
2: 89 f0 mov %esi,%eax
4: 25 00 00 41 00 and $0x410000,%eax
9: 3d 00 00 41 00 cmp $0x410000,%eax
e: 74 49 je 0x59
10: 80 3d ea e9 0d 00 00 cmpb $0x0,0xde9ea(%rip) # 0xdea01
17: 74 6d je 0x86
19: 89 da mov %ebx,%edx
1b: 48 89 ee mov %rbp,%rsi
1e: bf 9c ff ff ff mov $0xffffff9c,%edi
23: b8 01 01 00 00 mov $0x101,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 0f 87 93 00 00 00 ja 0xc9
36: 48 8b 54 24 28 mov 0x28(%rsp),%rdx
3b: 64 fs
3c: 48 rex.W
3d: 2b .byte 0x2b
3e: 14 25 adc $0x25,%al
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 0f 87 93 00 00 00 ja 0x9f
c: 48 8b 54 24 28 mov 0x28(%rsp),%rdx
11: 64 fs
12: 48 rex.W
13: 2b .byte 0x2b
14: 14 25 adc $0x25,%al
[ 346.595250] RSP: 002b:00007ffc559f9780 EFLAGS: 00000202 ORIG_RAX: 0000000000000101
[ 346.595256] RAX: ffffffffffffffda RBX: 0000000000020041 RCX: 00007f4d20119a21
[ 346.595262] RDX: 0000000000020041 RSI: 0000564288153820 RDI: 00000000ffffff9c
[ 346.595267] RBP: 0000564288153820 R08: 0000000000000007 R09: 00005642881537c0
[ 346.595272] R10: 00000000000001b6 R11: 0000000000000202 R12: 00000000000001b6
[ 346.595277] R13: 00005642880e32f0 R14: 0000000000000000 R15: 00007ffc559f9a80
[ 346.595284] </TASK>
[ 346.595288] ---[ end trace 0000000000000000 ]---
[ 847.787199] ------------[ cut here ]------------
[ 847.787217] btree trans held srcu lock (delaying memory reclaim) by more than 10 seconds
[ 847.787228] WARNING: CPU: 3 PID: 6952 at fs/bcachefs/btree_iter.c:2838 bch2_trans_put (fs/bcachefs/btree_iter.c:2838 (discriminator 1) fs/bcachefs/btree_iter.c:2836 (discriminator 1) fs/bcachefs/btree_iter.c:3087 (discriminator 1))
[ 847.787244] Modules linked in: zram xt_nat xt_tcpudp tun veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype iptable_filter ip_tables x_tables bpfilter br_netfilter overlay bridg
e stp llc dummy fuse snd_hda_codec_hdmi i915 binfmt_misc i2c_algo_bit drm_buddy cec intel_gtt drm_display_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm pps_ldisc pps_core drm coretemp hwmon intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp snd_hda_intel kvm_intel drm_panel_orientation_quirks snd_intel_dspcfg cfbfillrect cfbimgblt snd_hda_codec cfbcopyarea snd_hda_core fb kvm i2c_i801 mei_hdcp i2c_smbus mei_pxp intel_lpss_pci irqbypass intel_lpss rapl snd_pcm i2c_core idma64 intel_cstate virt_dma video intel_uncore intel_pch_thermal snd_timer mfd_core mei_me wmi font tiny_power_button fbdev mei cdc_acm bfq snd acpi_pad soundcore backlight thermal fan button raid1 md_mod crct10dif_pclmul
[ 847.787303] ghash_clmulni_intel sha512_ssse3 xhci_pci mpt3sas aesni_intel xhci_hcd crypto_simd cryptd raid_class scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod
[ 847.787338] CPU: 3 PID: 6952 Comm: burp Tainted: G U W 6.6.0-03558-g2748744c3734 #41
[ 847.787348] Hardware name: MSI MS-7982/B150M PRO-VDH (MS-7982), BIOS 3.H0 07/10/2018
[ 847.787358] RIP: 0010:bch2_trans_put (fs/bcachefs/btree_iter.c:2838 (discriminator 1) fs/bcachefs/btree_iter.c:2836 (discriminator 1) fs/bcachefs/btree_iter.c:3087 (discriminator 1))
[ 847.787367] Code: 00 83 fe 01 77 54 e8 83 de c6 ff e9 22 fe ff ff 48 8d 7b 10 e8 55 fc 61 00 e9 90 fd ff ff 48 c7 c7 28 50 11 a0 e8 f4 ea bd ff <0f> 0b eb c6 ba 01 00 00 00 b8 ff ff ff ff 48 0f bd c2 48 98 f0 48
All code
========
0: 00 83 fe 01 77 54 add %al,0x547701fe(%rbx)
6: e8 83 de c6 ff call 0xffffffffffc6de8e
b: e9 22 fe ff ff jmp 0xfffffffffffffe32
10: 48 8d 7b 10 lea 0x10(%rbx),%rdi
14: e8 55 fc 61 00 call 0x61fc6e
19: e9 90 fd ff ff jmp 0xfffffffffffffdae
1e: 48 c7 c7 28 50 11 a0 mov $0xffffffffa0115028,%rdi
25: e8 f4 ea bd ff call 0xffffffffffbdeb1e
2a:* 0f 0b ud2 <-- trapping instruction
2c: eb c6 jmp 0xfffffffffffffff4
2e: ba 01 00 00 00 mov $0x1,%edx
33: b8 ff ff ff ff mov $0xffffffff,%eax
38: 48 0f bd c2 bsr %rdx,%rax
3c: 48 98 cltq
3e: f0 lock
3f: 48 rex.W
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: eb c6 jmp 0xffffffffffffffca
4: ba 01 00 00 00 mov $0x1,%edx
9: b8 ff ff ff ff mov $0xffffffff,%eax
e: 48 0f bd c2 bsr %rdx,%rax
12: 48 98 cltq
14: f0 lock
15: 48 rex.W
[ 847.787377] RSP: 0018:ffffb783c1563a58 EFLAGS: 00010286
[ 847.787386] RAX: 0000000000000000 RBX: ffff90766e3f0000 RCX: 0000000000000000
[ 847.787392] RDX: 0000000000000002 RSI: 0000000000000027 RDI: 00000000ffffffff
[ 847.787401] RBP: ffff907620b80000 R08: 0000000000000000 R09: ffffffffa044f060
[ 847.787409] R10: ffffb783c15638f8 R11: ffffffffa060f0a8 R12: ffff907620babf60
[ 847.787416] R13: ffff907620b834e0 R14: ffff907620b80000 R15: ffff90769c096b60
[ 847.787424] FS: 00007f4d20021740(0000) GS:ffff90791bf80000(0000) knlGS:0000000000000000
[ 847.787438] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 847.787446] CR2: 00007fc689bceff4 CR3: 00000003523ec006 CR4: 00000000003706e0
[ 847.787451] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 847.787458] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 847.787463] Call Trace:
[ 847.787473] <TASK>
[ 847.787479] ? bch2_trans_put (fs/bcachefs/btree_iter.c:2838 (discriminator 1) fs/bcachefs/btree_iter.c:2836 (discriminator 1) fs/bcachefs/btree_iter.c:3087 (discriminator 1))
[ 847.787488] ? __warn (kernel/panic.c:673)
[ 847.787496] ? bch2_trans_put (fs/bcachefs/btree_iter.c:2838 (discriminator 1) fs/bcachefs/btree_iter.c:2836 (discriminator 1) fs/bcachefs/btree_iter.c:3087 (discriminator 1))
[ 847.787505] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[ 847.787514] ? handle_bug (arch/x86/kernel/traps.c:237)
[ 847.787522] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1))
[ 847.787530] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
[ 847.787539] ? bch2_trans_put (fs/bcachefs/btree_iter.c:2838 (discriminator 1) fs/bcachefs/btree_iter.c:2836 (discriminator 1) fs/bcachefs/btree_iter.c:3087 (discriminator 1))
[ 847.787547] ? bch2_trans_put (fs/bcachefs/btree_iter.c:2838 (discriminator 1) fs/bcachefs/btree_iter.c:2836 (discriminator 1) fs/bcachefs/btree_iter.c:3087 (discriminator 1))
[ 847.787555] __bch2_create (fs/bcachefs/fs.c:336)
[ 847.787567] ? d_splice_alias (fs/dcache.c:3183)
[ 847.787576] ? bch2_create (fs/bcachefs/fs.c:381 (discriminator 1) fs/bcachefs/fs.c:392 (discriminator 1))
[ 847.787581] bch2_create (fs/bcachefs/fs.c:381 (discriminator 1) fs/bcachefs/fs.c:392 (discriminator 1))
[ 847.787589] path_openat (fs/namei.c:3479 fs/namei.c:3546 fs/namei.c:3776)
[ 847.787599] do_filp_open (fs/namei.c:3810 (discriminator 2))
[ 847.787607] ? __check_object_size (mm/usercopy.c:196 mm/usercopy.c:251 mm/usercopy.c:213)
[ 847.787616] do_sys_openat2 (fs/open.c:1441 (discriminator 1))
[ 847.787626] __x64_sys_openat (fs/open.c:1466)
[ 847.787635] do_syscall_64 (arch/x86/entry/common.c:50 (discriminator 1) arch/x86/entry/common.c:80 (discriminator 1))
[ 847.787643] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 847.787651] RIP: 0033:0x7f4d20119a21
[ 847.787659] Code: 75 57 89 f0 25 00 00 41 00 3d 00 00 41 00 74 49 80 3d ea e9 0d 00 00 74 6d 89 da 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 93 00 00 00 48 8b 54 24 28 64 48 2b 14 25
All code
========
0: 75 57 jne 0x59
2: 89 f0 mov %esi,%eax
4: 25 00 00 41 00 and $0x410000,%eax
9: 3d 00 00 41 00 cmp $0x410000,%eax
e: 74 49 je 0x59
10: 80 3d ea e9 0d 00 00 cmpb $0x0,0xde9ea(%rip) # 0xdea01
17: 74 6d je 0x86
19: 89 da mov %ebx,%edx
1b: 48 89 ee mov %rbp,%rsi
1e: bf 9c ff ff ff mov $0xffffff9c,%edi
23: b8 01 01 00 00 mov $0x101,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 0f 87 93 00 00 00 ja 0xc9
36: 48 8b 54 24 28 mov 0x28(%rsp),%rdx
3b: 64 fs
3c: 48 rex.W
3d: 2b .byte 0x2b
3e: 14 25 adc $0x25,%al
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 0f 87 93 00 00 00 ja 0x9f
c: 48 8b 54 24 28 mov 0x28(%rsp),%rdx
11: 64 fs
12: 48 rex.W
13: 2b .byte 0x2b
14: 14 25 adc $0x25,%al
[ 847.787669] RSP: 002b:00007ffc559f9780 EFLAGS: 00000202 ORIG_RAX: 0000000000000101
[ 847.787677] RAX: ffffffffffffffda RBX: 0000000000020041 RCX: 00007f4d20119a21
[ 847.787684] RDX: 0000000000020041 RSI: 0000564288156da0 RDI: 00000000ffffff9c
[ 847.787692] RBP: 0000564288156da0 R08: 0000000000000007 R09: 0000564288156dd0
[ 847.787700] R10: 00000000000001b6 R11: 0000000000000202 R12: 00000000000001b6
[ 847.787707] R13: 00005642880e37a0 R14: 0000000000000000 R15: 00007ffc559f9a80
[ 847.787715] </TASK>
[ 847.787724] ---[ end trace 0000000000000000 ]---
I have couple filesystems on box, but I suspect this is related also to this fs:
External UUID: 2fc82775-653c-4995-89f9-5df5f626314c
Internal UUID: 1ec747b3-ea96-485b-bdc5-745565da4b69
Device index: 0
Label: media2
Version: 1.3: rebalance_work
Version upgrade complete: 1.3: rebalance_work
Oldest version on disk: 1.2: deleted_inodes
Created: Mon Oct 23 14:33:09 2023
Sequence number: 84
Superblock size: 4504
Clean: 0
Devices: 1
Sections: members_v1,replicas_v0,clean,journal_seq_blacklist,journal_v2,counters,members_v2,errors
Features: lz4,zstd,journal_seq_blacklist_v3,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features: alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done
Options:
block_size: 4.00 KiB
btree_node_size: 256 KiB
errors: continue [ro] panic
metadata_replicas: 2
data_replicas: 1
metadata_replicas_required: 1
data_replicas_required: 1
encoded_extent_max: 1.00 MiB
metadata_checksum: none [crc32c] crc64 xxhash
data_checksum: none [crc32c] crc64 xxhash
compression: none
background_compression: lz4
str_hash: crc32c crc64 [siphash]
metadata_target: none
foreground_target: none
background_target: none
promote_target: none
erasure_code: 0
inodes_32bit: 1
shard_inode_numbers: 1
inodes_use_key_cache: 1
gc_reserve_percent: 5
gc_reserve_bytes: 0 B
root_reserve_percent: 0
wide_macs: 0
acl: 1
usrquota: 0
grpquota: 0
prjquota: 0
journal_flush_delay: 1000
journal_flush_disabled: 0
journal_reclaim_delay: 100
journal_transaction_names: 1
version_upgrade: [compatible] incompatible none
nocow: 0
members_v2 (size 144):
Device: 0
Label: (none)
UUID: 36d40aff-eaad-4218-8d08-4f44b2150dae
Size: 2.00 TiB
read errors: 0
write errors: 1
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 512 KiB
First bucket: 0
Buckets: 4194304
Last mount: Fri Nov 3 20:28:16 2023
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user
Discard: 0
Freespace initialized: 1
replicas_v0 (size 24):
btree: 1 [0] journal: 1 [0] user: 1 [0]
The warning was firing incorrectly when we weren't holding the srcu lock - should be fixed now
Thanks, i'm not sure if this is in master branch, as for now, at commit #6c5850f4860d issue exists.
yeah, something funny is going on.
I also get this issue. It seemingly leads to an OOM on boot which is eventually "reclaimed" when it loops back around past the total memory. Unfortunately, the kernel treats the OOM as if it's real, which it may or may not be. Thankfully it only happens once.
Hmm, this happens also in userspace:
# bcachefs fsck -n /dev/vg-ZA1C3NKP/burp
mounting version 1.3: rebalance_work opts=metadata_replicas=2,compression=lz4,background_compression=zstd:6,degraded,journal_reclaim_delay=1000,fsck,fix_errors=no,nochanges,nodirect_io
recovering from clean shutdown, journal seq 23150366
journal read done, replaying entries 23150366-23150366
alloc_read... done
stripes_read... done
snapshots_read... done
WARNING at libbcachefs/btree_iter.c:2729: btree trans held srcu lock (delaying memory reclaim) for 68 seconds
WARNING at libbcachefs/btree_iter.c:2729: btree trans held srcu lock (delaying memory reclaim) for 16 seconds
Interestingly, I have not seen this in linux 6.7-rc1 or ~~6.7-rc2~~. That said, I have also reinstalled with linux 6.7-rc1, so it may be an upgrade path problem. Edit: I didn't wait long enough lol. This happens in 6.7-rc2.
bcachefs super
External UUID: 7b12f60f-5a7d-4e06-a915-60ff5e794ef2
Internal UUID: 44e9d681-d028-4320-8191-2a5f2ef5e2ae
Device index: 0
Label: root
Version: 1.3: rebalance_work
Version upgrade complete: 1.3: rebalance_work
Oldest version on disk: 1.3: rebalance_work
Created: Thu Nov 16 13:53:08 2023
Sequence number: 174
Superblock size: 6064
Clean: 0
Devices: 3
Sections: members_v1,crypt,replicas_v0,disk_groups,clean,journal_seq_blacklist,journal_v2,counters,members_v2,errors
Features: zstd,journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,reflink_inline_data,new_varint,journal_no_flush,alloc_v2,extents_across_btree_nodes
Compat features: alloc_info,alloc_metadata,extents_above_btree_updates_done,bformat_overflow_done
Options:
block_size: 4.00 KiB
btree_node_size: 256 KiB
errors: continue [ro] panic
metadata_replicas: 2
data_replicas: 2
metadata_replicas_required: 1
data_replicas_required: 1
encoded_extent_max: 64.0 KiB
metadata_checksum: none crc32c crc64 [xxhash]
data_checksum: none crc32c crc64 [xxhash]
compression: zstd
background_compression: zstd
str_hash: crc32c crc64 [siphash]
metadata_target: ssd
foreground_target: ssd
background_target: hdd
promote_target: ssd
erasure_code: 0
inodes_32bit: 1
shard_inode_numbers: 1
inodes_use_key_cache: 1
gc_reserve_percent: 8
gc_reserve_bytes: 0 B
root_reserve_percent: 2
wide_macs: 0
acl: 1
usrquota: 0
grpquota: 0
prjquota: 0
journal_flush_delay: 1000
journal_flush_disabled: 0
journal_reclaim_delay: 100
journal_transaction_names: 1
version_upgrade: [compatible] incompatible none
nocow: 0
members_v2 (size 376):
Device: 0
Label: ssd1 (1)
UUID: dc54fd83-c9d9-4f45-bf95-8a3875149a6f
Size: 928 GiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 512 KiB
First bucket: 0
Buckets: 1899544
Last mount: Thu Nov 23 15:47:32 2023
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user,cached
Discard: 1
Freespace initialized: 1
Device: 1
Label: ssd2 (2)
UUID: 375e0964-5443-49ab-be76-668b6934b78b
Size: 868 GiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 512 KiB
First bucket: 0
Buckets: 1776664
Last mount: Thu Nov 23 15:47:32 2023
State: rw
Data allowed: journal,btree,user
Has data: journal,btree,user,cached
Discard: 1
Freespace initialized: 1
Device: 2
Label: hdd1 (4)
UUID: e60c9ec1-981c-46b4-a4fb-188cad1811cf
Size: 7.28 TiB
read errors: 0
write errors: 0
checksum errors: 0
seqread iops: 0
seqwrite iops: 0
randread iops: 0
randwrite iops: 0
Bucket size: 512 KiB
First bucket: 0
Buckets: 15261768
Last mount: Thu Nov 23 15:47:32 2023
State: rw
Data allowed: journal,btree,user
Has data: journal,user
Discard: 0
Freespace initialized: 1
crypt (size 64):
KFD: 0
scrypt n: 14
scrypt r: 3
scrypt p: 4
replicas_v0 (size 40):
user: 2 [0 2] btree: 2 [0 1] cached: 1 [0] journal: 2 [0 2] user: 2 [0 1] user: 2 [1 2] cached: 1 [1] journal: 2 [0 1]
dmesg
[ 504.971517] ------------[ cut here ]------------
[ 504.971521] btree trans held srcu lock (delaying memory reclaim) for 27 seconds
[ 504.971569] WARNING: CPU: 14 PID: 8606 at fs/bcachefs/btree_iter.c:2838 bch2_trans_put+0x20e/0x240 [bcachefs]
[ 504.971606] Modules linked in: qrtr snd_seq_dummy snd_hrtimer snd_seq rfcomm af_packet cmac algif_hash algif_skcipher af_alg bnep msr nls_iso8859_1 nls_cp437 vfat fat iwlmvm mac80211 libarc4 snd_hda_codec_realtek snd_hda_codec_generic xone_dongle(O) snd_hda_codec_hdmi iwlwifi snd_hda_intel edac_mce_amd snd_usb_audio snd_intel_dspcfg edac_core r8169 intel_rapl_msr snd_intel_sdw_acpi xone_gip(O) intel_rapl_common crc32_pclmul realtek snd_hda_codec polyval_clmulni mdio_devres polyval_generic snd_usbmidi_lib gf128mul snd_rawmidi ghash_clmulni_intel libphy snd_hda_core snd_seq_device mc snd_hwdep sch_fq_codel rapl btusb snd_pcm btrtl btintel snd_timer ucsi_ccg btbcm btmtk typec_ucsi snd typec eeepc_wmi mousedev joydev evdev cfg80211 bluetooth asus_ec_sensors igb uas soundcore roles asus_wmi mac_hid cdc_acm battery ptp ledtrig_audio sp5100_tco sparse_keymap watchdog platform_profile ecdh_generic pps_core ecc i2c_designware_pci crc16 i2c_ccgx_ucsi rfkill wmi_bmof mxm_wmi dca tpm_crb i2c_piix4 k10temp tpm_tis tpm_tis_core
[ 504.971674] xt_conntrack tiny_power_button button nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip6t_rpfilter ipt_rpfilter xt_pkttype xt_LOG nf_log_syslog xt_tcpudp nft_compat nf_tables nfnetlink uinput ctr atkbd libps2 serio vivaldi_fmap loop tun tap macvlan bridge stp llc kvm_amd ccp kvm irqbypass nct6775 nct6775_core hwmon_vid fuse efi_pstore configfs zstd zram efivarfs dmi_sysfs ip_tables x_tables autofs4 poly1305_generic libpoly1305 poly1305_x86_64 chacha_generic chacha_x86_64 libchacha bcachefs libcrc32c crc32c_generic xor lz4hc_compress lz4_compress raid6_pq dm_crypt aes_generic cbc encrypted_keys trusted asn1_encoder tee tpm rng_core hid_logitech_hidpp hid_logitech_dj usb_storage input_leds led_class hid_generic usbhid hid sd_mod ahci libahci xhci_pci xhci_pci_renesas xhci_hcd libata crc32c_intel sha512_ssse3 sha512_generic nvme sha256_ssse3 sha1_ssse3 nvme_core usbcore aesni_intel scsi_mod libaes crypto_simd t10_pi cryptd crc64_rocksoft crc_t10dif crct10dif_generic crct10dif_pclmul scsi_common usb_common crc64
[ 504.971747] crct10dif_common rtc_cmos dm_mod dax amdgpu i2c_algo_bit drm_ttm_helper ttm agpgart video wmi drm_exec drm_suballoc_helper amdxcp drm_buddy gpu_sched drm_display_helper drm_kms_helper drm backlight firmware_class
[ 504.971763] CPU: 14 PID: 8606 Comm: cat Tainted: G O 6.7.0-rc2 #1-NixOS
[ 504.971766] Hardware name: ASUS System Product Name/ROG CROSSHAIR VIII DARK HERO, BIOS 4702 10/20/2023
[ 504.971767] RIP: 0010:bch2_trans_put+0x20e/0x240 [bcachefs]
[ 504.971794] Code: ff f3 48 c7 c7 98 25 66 c1 48 b8 cf f7 53 e3 a5 9b c4 20 48 29 ca 48 c1 ea 03 48 f7 e2 48 89 d6 48 c1 ee 04 e8 93 51 89 f2 90 <0f> 0b 90 90 8b b3 90 00 00 00 48 8d bd 30 35 00 00 83 fe 01 77 0a
[ 504.971795] RSP: 0018:ffffc9002460fbb8 EFLAGS: 00010286
[ 504.971797] RAX: 0000000000000000 RBX: ffff8886723e4000 RCX: 0000000000000027
[ 504.971799] RDX: ffff888fee921588 RSI: 0000000000000001 RDI: ffff888fee921580
[ 504.971800] RBP: ffff888117800000 R08: 0000000000000000 R09: ffffc9002460fa58
[ 504.971801] R10: 0000000000000003 R11: ffffffffb5538a68 R12: 000000000000002b
[ 504.971802] R13: ffff8886723e4010 R14: ffff88811782e5c0 R15: 0000000000000001
[ 504.971803] FS: 00007f21a8eaf740(0000) GS:ffff888fee900000(0000) knlGS:0000000000000000
[ 504.971805] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 504.971806] CR2: 00007f3d5c30f004 CR3: 000000075b22a000 CR4: 0000000000f50ef0
[ 504.971807] PKRU: 55555554
[ 504.971809] Call Trace:
[ 504.971811] <TASK>
[ 504.971812] ? bch2_trans_put+0x20e/0x240 [bcachefs]
[ 504.971837] ? __warn+0x81/0x130
[ 504.971843] ? bch2_trans_put+0x20e/0x240 [bcachefs]
[ 504.971869] ? report_bug+0x171/0x1a0
[ 504.971875] ? handle_bug+0x42/0x70
[ 504.971878] ? exc_invalid_op+0x17/0x70
[ 504.971881] ? asm_exc_invalid_op+0x1a/0x20
[ 504.971886] ? bch2_trans_put+0x20e/0x240 [bcachefs]
[ 504.971910] ? bch2_trans_put+0x20d/0x240 [bcachefs]
[ 504.971934] bch2_compression_stats_to_text.isra.0+0x4ff/0x690 [bcachefs]
[ 504.971967] ? bch2_compression_stats_to_text.isra.0+0x160/0x690 [bcachefs]
[ 504.971997] bch2_fs_to_text+0x54b/0x650 [bcachefs]
[ 504.972021] bch2_fs_show+0x61/0x170 [bcachefs]
[ 504.972045] sysfs_kf_seq_show+0xa7/0x100
[ 504.972049] seq_read_iter+0x126/0x470
[ 504.972054] vfs_read+0x1f6/0x320
[ 504.972060] ksys_read+0x6f/0xf0
[ 504.972063] do_syscall_64+0x46/0xf0
[ 504.972067] entry_SYSCALL_64_after_hwframe+0x6f/0x77
[ 504.972069] RIP: 0033:0x7f21a8fbe341
[ 504.972089] Code: 00 48 8b 15 f1 8a 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb c3 e8 60 c7 01 00 f3 0f 1e fa 80 3d 25 10 0e 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 48 83 ec 28 48 89 54
[ 504.972091] RSP: 002b:00007ffd1484f628 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 504.972093] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f21a8fbe341
[ 504.972094] RDX: 0000000000020000 RSI: 00007f219b89c000 RDI: 0000000000000003
[ 504.972095] RBP: 0000000000020000 R08: 00000000ffffffff R09: 0000000000000000
[ 504.972096] R10: 0000000000000022 R11: 0000000000000246 R12: 00007f219b89c000
[ 504.972097] R13: 0000000000000003 R14: 00007f219b89c000 R15: 0000000000020000
[ 504.972101] </TASK>
[ 504.972102] ---[ end trace 0000000000000000 ]---
I'm not sure if this is the same error, but I'm seeing this with 6.7.0-rc6
:
dmesg
btree trans held srcu lock (delaying memory reclaim) for 18 seconds
WARNING: CPU: 1 PID: 361263 at fs/bcachefs/btree_iter.c:2838 bch2_trans_put+0x228/0x250 [bcachefs]
Modules linked in: bcachefs lz4hc_compress lz4_compress vhost_net vhost vhost_iotlb xt_mark xt_comment nft_chain_nat xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo xt_addrtype overlay cmac algif_hash algif_skcipher af_alg bnep msr ses enclosure scsi_transport_sas snd_hda_codec_hdmi btusb btrtl btintel hid_sensor_als hid_sensor_accel_3d hid_sensor_gyro_3d hid_sensor_rotation hid_sensor_magn_3d joydev cmdlinepart hid_sensor_incl_3d btbcm hid_sensor_trigger industrialio_triggered_buffer spi_nor btmtk kfifo_buf hid_sensor_iio_common mousedev industrialio hid_sensor_custom hid_multitouch hid_sensor_hub x86_pkg_temp_thermal mtd hid_generic bluetooth intel_powerclamp i915 iwlmvm nls_iso8859_1 coretemp nls_cp437 crc32_pclmul mei_pxp mei_hdcp ecdh_generic intel_rapl_msr vfat ecc mac80211 polyval_clmulni polyval_generic crc16 fat uas cdc_acm gf128mul ghash_clmulni_intel ptp pps_core sha512_ssse3 libarc4 sha512_generic drm_buddy ttm usb_storage usbhid iTCO_wdt intel_pmc_bxt drm_display_helper
spi_intel_platform watchdog spi_intel sha256_ssse3 xt_conntrack snd_hda_codec_realtek cec sha1_ssse3 snd_hda_codec_generic aesni_intel iwlwifi ledtrig_audio libaes crypto_simd drm_kms_helper snd_hda_intel uvcvideo cryptd rapl ip6t_rpfilter processor_thermal_device_pci_legacy intel_cstate snd_intel_dspcfg snd_intel_sdw_acpi ipt_rpfilter videobuf2_vmalloc intel_uncore uvc processor_thermal_device r8153_ecm snd_hda_codec psmouse ideapad_laptop processor_thermal_wt_hint thermal videobuf2_memops snd_hda_core intel_gtt sparse_keymap xt_pkttype cdc_ether processor_thermal_rfim cfg80211 snd_hwdep platform_profile agpgart processor_thermal_rapl videobuf2_v4l2 xt_LOG snd_pcm usbnet battery snd_timer nf_log_syslog mei_me i2c_algo_bit intel_rapl_common i2c_hid_acpi videodev snd video tiny_power_button xt_tcpudp i2c_hid processor_thermal_wt_req nft_compat wmi hid mei soundcore processor_thermal_power_floor i2c_i801 processor_thermal_mbox int340x_thermal_zone rfkill evdev soc_button_array lpc_ich int3400_thermal
intel_soc_dts_iosf i2c_smbus intel_smartconnect videobuf2_common input_leds acpi_thermal_rel button intel_pch_thermal ac mac_hid mc serio_raw r8152 nf_tables led_class mii nfnetlink sch_fq_codel loop xt_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter veth tun tap macvlan bridge stp llc kvm_intel kvm drm irqbypass fuse efi_pstore backlight configfs efivarfs dmi_sysfs ip_tables x_tables autofs4 sd_mod t10_pi crc64_rocksoft crc64 crc_t10dif crct10dif_generic ahci xhci_pci xhci_pci_renesas libahci firmware_class xhci_hcd libata ehci_pci ehci_hcd atkbd libps2 vivaldi_fmap usbcore scsi_mod crct10dif_pclmul crct10dif_common scsi_common usb_common i8042 serio rtc_cmos dm_mod dax btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq
CPU: 1 PID: 361263 Comm: fish Not tainted 6.7.0-rc6 #1-NixOS
Hardware name: LENOVO 20266/Yoga2, BIOS 76CN43WW 03/30/2015
RIP: 0010:bch2_trans_put+0x228/0x250 [bcachefs]
Code: 75 f1 48 c7 c7 b8 c4 fe c1 48 b8 cf f7 53 e3 a5 9b c4 20 48 29 ca 48 c1 ea 03 48 f7 e2 48 89 d6 48 c1 ee 04 e8 09 34 ff ef 90 <0f> 0b 90 90 8b b3 90 00 00 00 48 8d bd 38 35 00 00 83 fe 01 77 0a
RSP: 0018:ffffc900033d7b58 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8881314b8000 RCX: 0000000000000027
RDX: ffff8882570a1548 RSI: 0000000000000001 RDI: ffff8882570a1540
RBP: ffff88812c800000 R08: 0000000000000000 R09: ffffc900033d79f8
R10: 0000000000000003 R11: ffffffffb3738a68 R12: 000000000000000d
R13: ffff8881314b8010 R14: ffff88812c82add0 R15: 0000000000000001
FS: 00007ff899dfe6c0(0000) GS:ffff888257080000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8daba87d40 CR3: 00000001081e6005 CR4: 00000000001726f0
Call Trace:
<TASK>
? bch2_trans_put+0x228/0x250 [bcachefs]
? __warn+0x81/0x130
? bch2_trans_put+0x228/0x250 [bcachefs]
? report_bug+0x171/0x1a0
? handle_bug+0x42/0x70
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? bch2_trans_put+0x228/0x250 [bcachefs]
? bch2_trans_put+0x227/0x250 [bcachefs]
bch2_dirent_lookup+0xc7/0x130 [bcachefs]
bch2_lookup+0x83/0xd0 [bcachefs]
__lookup_slow+0x86/0x130
walk_component+0xdb/0x150
path_lookupat+0x67/0x190
? page_add_anon_rmap+0x65/0x1f0
filename_lookup+0xe8/0x1f0
? getname_flags.part.0+0x4b/0x1f0
user_path_at_empty+0x3a/0x60
do_faccessat+0x106/0x2f0
do_syscall_64+0x46/0xf0
entry_SYSCALL_64_after_hwframe+0x6f/0x77
RIP: 0033:0x7ff89ad1384b
Code: 8b 15 e9 d5 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb d4 e8 58 12 02 00 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 15 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 b1 d5 0d 00 f7 d8
RSP: 002b:00007ff899dfd5f8 EFLAGS: 00000206 ORIG_RAX: 0000000000000015
RAX: ffffffffffffffda RBX: 00007ff899dfd830 RCX: 00007ff89ad1384b
RDX: 000000000000000b RSI: 0000000000000001 RDI: 00007ff899dfd710
RBP: 00007ff899dfd900 R08: 00007ff899dfd718 R09: 00007ff899dfd480
R10: 0000000000000004 R11: 0000000000000206 R12: 00007ff899dfd6b8
R13: 00007ff899dfd740 R14: 00007ff899dfd6e8 R15: 0000000001ae1bc0
</TASK>
---[ end trace 0000000000000000 ]---
And fsck
seemed okay, but did report some similar warnings:
fsck
mounting version 1.3: rebalance_work opts=degraded,fsck,fix_errors=ask
recovering from clean shutdown, journal seq 112625
journal read done, replaying entries 112625-112625
alloc_read... done
stripes_read... done
snapshots_read... done
WARNING at libbcachefs/btree_iter.c:2838: btree trans held srcu lock (delaying memory reclaim) by more than 10 seconds
check_allocations... done
journal_replay... done
WARNING at libbcachefs/btree_iter.c:2838: btree trans held srcu lock (delaying memory reclaim) by more than 10 seconds
check_alloc_info... done
check_lrus... done
check_btree_backpointers... done
check_backpointers_to_extents... done
check_extents_to_backpointers... done
check_alloc_to_lru_refs... done
check_snapshot_trees... done
check_snapshots... done
check_subvols... done
delete_dead_snapshots... done
resume_logged_ops... done
check_inodes... done
check_extents... done
check_indirect_extents... done
check_dirents... done
check_xattrs... done
check_root... done
check_directory_structure... done
check_nlinks... done
delete_dead_inodes... done
going read-write
In case it helps, this is on an external disk connected through a powered usb3 hub that also has a couple other disks connected.
So in your case it's probably being caused by high IO latency - we do btree node prefetching in the snapshot read path, so that can multiply latency of a particular btree node read depending on submit order.
The main suspected cause I'm hunting for right now is memory reclaim related - I suspect there's places where we're holding btree node locks while doing memory reclaim; tracking those down won't be too hard once I teach lockdep to check for it.
also been seeing this for a while now, most recently on 6.8-rc2 dmesg.txt
fsck completes successfully as well
will post the output of bcachefs fs usage
in an edit shortly
EDIT:
bcachefs fs usage
Filesystem: 00345bfa-2c4a-434f-a6ff-73f3874dbb26
Size: 1840376583680
Used: 992210965504
Online reserved: 2531328
Data type Required/total Durability Devices
reserved: 1/0 [] 5621252096
btree: 1/1 1 [nvme0n1] 12130189312
user: 1/1 1 [nvme0n1] 621832521728
user: 1/1 1 [sda] 343939324416
cached: 1/1 1 [nvme0n1] 45736656384
cached: 1/1 1 [sda] 199923758080
fast.big (device 2): nvme0n1 rw
data buckets fragmented
free: 305063788544 581863
sb: 3149824 7 520192
journal: 4294967296 8192
btree: 12130189312 36176 6836453376
user: 621832521728 1189049 1571600384
cached: 45736656384 92452
parity: 0 0
stripe: 0 0
need_gc_gens: 0 0
need_discard: 0 0
capacity: 1000204664832 1907739
slow (device 1): sda rw
data buckets fragmented
free: 421821153280 804560
sb: 3149824 7 520192
journal: 4294967296 8192
btree: 0 0
user: 343939324416 680391 12782702080
cached: 199923758080 414587
parity: 0 0
stripe: 0 0
need_gc_gens: 0 0
need_discard: 1048576 2
capacity: 1000204664832 1907739
This is still an issue for me with Git master as of https://github.com/koverstreet/bcachefs/commit/b3c7fd35c03c17a950737fb56a06b730a7962d28
Apr 01 17:26:09 archlinux kernel: ------------[ cut here ]------------
Apr 01 17:26:09 archlinux kernel: btree trans held srcu lock (delaying memory reclaim) for 14 seconds
Apr 01 17:26:09 archlinux kernel: WARNING: CPU: 6 PID: 4947 at fs/bcachefs/btree_iter.c:2871 bch2_trans_srcu_unlock+0x112/0x120 [bcachefs]
Apr 01 17:26:09 archlinux kernel: Modules linked in: poly1305_generic libpoly1305 poly1305_x86_64 chacha_generic chacha_x86_64 libchacha uas usb_storage snd_seq_dummy snd_hrtimer snd_seq bridge stp llc hid_playstation mousedev led_class_mu>
Apr 01 17:26:09 archlinux kernel: snd_timer mac_hid snd soundcore usbhid winesync(OE) i2c_dev sg vhba(OE) crypto_user fuse dm_mod loop nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 nvme crc32c_intel nvme_core xhci_pc>
Apr 01 17:26:09 archlinux kernel: CPU: 6 PID: 4947 Comm: bch-rebalance/s Tainted: G OE 6.9.0-rc2-1-custom-bcachefs-git-00032-gb3c7fd35c03c #1 5c48e1afc39b4f295430db40cd48f5e24e7376f2
Apr 01 17:26:09 archlinux kernel: Hardware name: Micro-Star International Co., Ltd. MS-7B85/B450 GAMING PRO CARBON AC (MS-7B85), BIOS 1.G0 07/25/2022
Apr 01 17:26:09 archlinux kernel: RIP: 0010:bch2_trans_srcu_unlock+0x112/0x120 [bcachefs]
Apr 01 17:26:09 archlinux kernel: Code: 7f 95 c5 48 b8 cf f7 53 e3 a5 9b c4 20 48 c7 c7 18 0f 9d c1 48 29 ca 48 c1 ea 03 48 f7 e2 48 89 d6 48 c1 ee 04 e8 5e 3d df c3 <0f> 0b e9 69 ff ff ff 0f 0b e9 78 ff ff ff 66 0f 1f 00 0f 1f 44 00
Apr 01 17:26:09 archlinux kernel: RSP: 0018:ffffc103413bf728 EFLAGS: 00010282
Apr 01 17:26:09 archlinux kernel: RAX: 0000000000000000 RBX: ffff9f1aceeac000 RCX: 0000000000000027
Apr 01 17:26:09 archlinux kernel: RDX: ffff9f1b2ef219c8 RSI: 0000000000000001 RDI: ffff9f1b2ef219c0
Apr 01 17:26:09 archlinux kernel: RBP: ffff9f1a36700000 R08: 0000000000000000 R09: 0000000000000003
Apr 01 17:26:09 archlinux kernel: R10: ffffc103413bf5b8 R11: ffffffff872c6e28 R12: ffffc103413bfea0
Apr 01 17:26:09 archlinux kernel: R13: dead000000000122 R14: dead000000000100 R15: ffff9f18bf947800
Apr 01 17:26:09 archlinux kernel: FS: 0000000000000000(0000) GS:ffff9f1b2ef00000(0000) knlGS:0000000000000000
Apr 01 17:26:09 archlinux kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 01 17:26:09 archlinux kernel: CR2: 00007cbc1beef000 CR3: 000000024b05b000 CR4: 0000000000f50ef0
Apr 01 17:26:09 archlinux kernel: PKRU: 55555554
Apr 01 17:26:09 archlinux kernel: Call Trace:
Apr 01 17:26:09 archlinux kernel: <TASK>
Apr 01 17:26:09 archlinux kernel: ? __warn+0x80/0x120
Apr 01 17:26:09 archlinux kernel: ? bch2_trans_srcu_unlock+0x112/0x120 [bcachefs 1f8b0db9c530aa78f3137e0f0ae4a2b54d9a1466]
Apr 01 17:26:09 archlinux kernel: ? report_bug+0x160/0x190
Apr 01 17:26:09 archlinux kernel: ? prb_read_valid+0x1b/0x20
Apr 01 17:26:09 archlinux kernel: ? handle_bug+0x38/0x70
Apr 01 17:26:09 archlinux kernel: ? exc_invalid_op+0x17/0x60
Apr 01 17:26:09 archlinux kernel: ? asm_exc_invalid_op+0x1a/0x20
Apr 01 17:26:09 archlinux kernel: ? bch2_trans_srcu_unlock+0x112/0x120 [bcachefs 1f8b0db9c530aa78f3137e0f0ae4a2b54d9a1466]
Apr 01 17:26:09 archlinux kernel: ? bch2_trans_srcu_unlock+0x112/0x120 [bcachefs 1f8b0db9c530aa78f3137e0f0ae4a2b54d9a1466]
Apr 01 17:26:09 archlinux kernel: bch2_moving_ctxt_do_pending_writes+0x75/0x220 [bcachefs 1f8b0db9c530aa78f3137e0f0ae4a2b54d9a1466]
Apr 01 17:26:09 archlinux kernel: bch2_data_update_init+0x680/0x1380 [bcachefs 1f8b0db9c530aa78f3137e0f0ae4a2b54d9a1466]
Apr 01 17:26:09 archlinux kernel: ? bch2_move_extent+0x3d4/0xb20 [bcachefs 1f8b0db9c530aa78f3137e0f0ae4a2b54d9a1466]
Apr 01 17:26:09 archlinux kernel: bch2_move_extent+0x3d4/0xb20 [bcachefs 1f8b0db9c530aa78f3137e0f0ae4a2b54d9a1466]
Apr 01 17:26:09 archlinux kernel: ? do_rebalance_extent+0x2e8/0x710 [bcachefs 1f8b0db9c530aa78f3137e0f0ae4a2b54d9a1466]
Apr 01 17:26:09 archlinux kernel: do_rebalance_extent+0x2e8/0x710 [bcachefs 1f8b0db9c530aa78f3137e0f0ae4a2b54d9a1466]
Apr 01 17:26:09 archlinux kernel: do_rebalance+0x309/0x8b0 [bcachefs 1f8b0db9c530aa78f3137e0f0ae4a2b54d9a1466]
Apr 01 17:26:09 archlinux kernel: ? mempool_alloc+0x83/0x1a0
Apr 01 17:26:09 archlinux kernel: ? __alloc_pages+0x1a5/0x360
Apr 01 17:26:09 archlinux kernel: ? local_clock_noinstr+0xd/0xb0
Apr 01 17:26:09 archlinux kernel: ? local_clock+0x15/0x30
Apr 01 17:26:09 archlinux kernel: ? __bch2_trans_get+0x153/0x230 [bcachefs 1f8b0db9c530aa78f3137e0f0ae4a2b54d9a1466]
Apr 01 17:26:09 archlinux kernel: ? do_rebalance+0x8b0/0x8b0 [bcachefs 1f8b0db9c530aa78f3137e0f0ae4a2b54d9a1466]
Apr 01 17:26:09 archlinux kernel: bch2_rebalance_thread+0x66/0xa0 [bcachefs 1f8b0db9c530aa78f3137e0f0ae4a2b54d9a1466]
Apr 01 17:26:09 archlinux kernel: ? bch2_rebalance_thread+0x5c/0xa0 [bcachefs 1f8b0db9c530aa78f3137e0f0ae4a2b54d9a1466]
Apr 01 17:26:09 archlinux kernel: kthread+0xcf/0x100
Apr 01 17:26:09 archlinux kernel: ? kthread_complete_and_exit+0x20/0x20
Apr 01 17:26:09 archlinux kernel: ret_from_fork+0x31/0x50
Apr 01 17:26:09 archlinux kernel: ? kthread_complete_and_exit+0x20/0x20
Apr 01 17:26:09 archlinux kernel: ret_from_fork_asm+0x11/0x20
Apr 01 17:26:09 archlinux kernel: </TASK>
Apr 01 17:26:09 archlinux kernel: ---[ end trace 0000000000000000 ]---
I seem to be able to trigger it reliably when there is a rebalance process running for any filesystem, and I then try and copy any large mass of files. All bcachefs filesystems tend to lockup at that point. Though I didn't think to check and see whether it would correct itself once the rebalance had finished.
I seem to be able to trigger it reliably when there is a rebalance process running for any filesystem, and I then try and copy any large mass of files. All bcachefs filesystems tend to lockup at that point. Though I didn't think to check and see whether it would correct itself once the rebalance had finished.
I am seeing the same thing - not sure if it is because I am transferring a large amount of data to a new BCFS (which has an NVME LV device and a large RUST LV. - NB Both VG's are LUKS encrypted.) Is the rust too slow?
Kernel 6.9.0-rc7 Data_replicas = 1 (for now) Compression/Background = none/zstd:10 (and lz4, tried both)
[Wed May 8 04:05:27 2024] ------------[ cut here ]------------ [Wed May 8 04:05:27 2024] btree trans held srcu lock (delaying memory reclaim) for 40 seconds [Wed May 8 04:05:27 2024] ------------[ cut here ]------------ [Wed May 8 04:05:27 2024] btree trans held srcu lock (delaying memory reclaim) for 40 seconds [Wed May 8 04:05:27 2024] btree trans held srcu lock (delaying memory reclaim) for 39 seconds [Wed May 8 04:05:27 2024] btree trans held srcu lock (delaying memory reclaim) for 40 seconds [Wed May 8 04:05:27 2024] WARNING: CPU: 4 PID: 14740 at fs/bcachefs/btree_iter.c:2871 bch2_trans_put+0x250/0x290 [Wed May 8 04:05:27 2024] WARNING: CPU: 11 PID: 83823 at fs/bcachefs/btree_iter.c:2871 bch2_trans_srcu_unlock+0x136/0x150 [Wed May 8 04:05:27 2024] WARNING: CPU: 5 PID: 80226 at fs/bcachefs/btree_iter.c:2871 bch2_trans_put+0x250/0x290 ...
[Wed May 8 04:05:27 2024] Workqueue: bcachefs_btree_io btree_node_write_work [Wed May 8 04:05:27 2024] R13: ffff88aa6ef4bf38 R14: ffff88aa9c800000 R15: 0000000000000000
[Wed May 8 04:05:27 2024] FS: 00007fbe73fa3c40(0000) GS:ffff88ae4ee00000(0000) knlGS:0000000000000000 [Wed May 8 04:05:27 2024] FS: 00007fbe73fa3c40(0000) GS:ffff88ae4f100000(0000) knlGS:0000000000000000 [Wed May 8 04:05:27 2024] RIP: 0010:bch2_trans_put+0x250/0x290 [Wed May 8 04:05:27 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [Wed May 8 04:05:27 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [Wed May 8 04:05:27 2024] CR2: 0000556e27ebc000 CR3: 00000003d7120000 CR4: 0000000000350ef0 [Wed May 8 04:05:27 2024] Code: c6 f8 71 01 48 c7 c7 38 1c 86 84 48 b8 cf f7 53 e3 a5 9b c4 20 48 29 ca 48 d1 ea 48 f7 e2 48 89 d6 48 c1 ee 04 e8 30 2e a1 ff <0f> 0b 41 8b b4 24 a8 00 00 00 49 8d be 68 36 00 00 83 fe 01 77 1b [Wed May 8 04:05:27 2024] CR2: 00007f818dbe9c3c CR3: 00000003d45a4000 CR4: 0000000000350ef0 [Wed May 8 04:05:27 2024] Call Trace: [Wed May 8 04:05:27 2024] RSP: 0018:ffff98f10710fdb8 EFLAGS: 00010246 [Wed May 8 04:05:27 2024] Call Trace:
[Wed May 8 04:05:27 2024] <TASK> [Wed May 8 04:05:27 2024] <TASK> [Wed May 8 04:05:27 2024] RAX: 0000000000000000 RBX: ffff88ac5dc0a200 RCX: 0000000000000000 [Wed May 8 04:05:27 2024] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [Wed May 8 04:05:27 2024] RBP: ffff98f10710fde0 R08: 0000000000000000 R09: 0000000000000000 [Wed May 8 04:05:27 2024] R13: ffff88ac16f88000 R14: ffff88aa9c800000 R15: 00000be70e5a5601 [Wed May 8 04:05:27 2024] ? show_regs+0x6c/0x80 [Wed May 8 04:05:27 2024] ? show_regs+0x6c/0x80 [Wed May 8 04:05:27 2024] FS: 0000000000000000(0000) GS:ffff88ae4f180000(0000) knlGS:0000000000000000 [Wed May 8 04:05:27 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [Wed May 8 04:05:27 2024] ? __warn+0x88/0x140 [Wed May 8 04:05:27 2024] ? __warn+0x88/0x140 [Wed May 8 04:05:27 2024] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88a95b4b0000 [Wed May 8 04:05:27 2024] R13: ffff88a75204c720 R14: ffff88aa9c800000 R15: 0000000000000000 [Wed May 8 04:05:27 2024] ? bch2_trans_put+0x250/0x290 [Wed May 8 04:05:27 2024] ? bch2_trans_put+0x250/0x290 [Wed May 8 04:05:27 2024] FS: 0000000000000000(0000) GS:ffff88ae4ee80000(0000) knlGS:0000000000000000 [Wed May 8 04:05:27 2024] CR2: 00007feaec1febcc CR3: 000000012bc06000 CR4: 0000000000350ef0 [Wed May 8 04:05:27 2024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [Wed May 8 04:05:27 2024] Call Trace: [Wed May 8 04:05:27 2024] CR2: 00007f22b7ce7d18 CR3: 0000000223328000 CR4: 0000000000350ef0 [Wed May 8 04:05:27 2024] ? report_bug+0x182/0x1b0 [Wed May 8 04:05:27 2024] <TASK> [Wed May 8 04:05:27 2024] Call Trace: [Wed May 8 04:05:27 2024] <TASK> [Wed May 8 04:05:27 2024] ? show_regs+0x6c/0x80 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ? report_bug+0x182/0x1b0 [Wed May 8 04:05:27 2024] ? __warn+0x88/0x140 [Wed May 8 04:05:27 2024] ? handle_bug+0x51/0xa0 [Wed May 8 04:05:27 2024] ? bch2_trans_srcu_unlock+0x136/0x150 [Wed May 8 04:05:27 2024] ? handle_bug+0x51/0xa0 [Wed May 8 04:05:27 2024] ? exc_invalid_op+0x18/0x80 [Wed May 8 04:05:27 2024] ? show_regs+0x6c/0x80 [Wed May 8 04:05:27 2024] ? report_bug+0x182/0x1b0 [Wed May 8 04:05:27 2024] ? exc_invalid_op+0x18/0x80 [Wed May 8 04:05:27 2024] ? __warn+0x88/0x140 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ? asm_exc_invalid_op+0x1b/0x20 [Wed May 8 04:05:27 2024] ? asm_exc_invalid_op+0x1b/0x20 [Wed May 8 04:05:27 2024] ? bch2_trans_put+0x250/0x290 [Wed May 8 04:05:27 2024] ? report_bug+0x182/0x1b0 [Wed May 8 04:05:27 2024] ? bch2_trans_put+0x250/0x290 [Wed May 8 04:05:27 2024] ? bch2_trans_put+0x250/0x290 [Wed May 8 04:05:27 2024] ? handle_bug+0x51/0xa0 [Wed May 8 04:05:27 2024] bch2_setattr_nonsize+0x1a0/0x4a0 [Wed May 8 04:05:27 2024] bch2_setattr_nonsize+0x1a0/0x4a0 [Wed May 8 04:05:27 2024] ? exc_invalid_op+0x18/0x80 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ? handle_bug+0x51/0xa0 [Wed May 8 04:05:27 2024] ? exc_invalid_op+0x18/0x80 [Wed May 8 04:05:27 2024] ? asm_exc_invalid_op+0x1b/0x20 : [Wed May 8 04:05:27 2024] bch2_setattr+0x88/0xa0 [Wed May 8 04:05:27 2024] bch2_setattr+0x88/0xa0 [Wed May 8 04:05:27 2024] ? bch2_trans_put+0x250/0x290 [Wed May 8 04:05:27 2024] notify_change+0x1f1/0x4d0 [Wed May 8 04:05:27 2024] notify_change+0x1f1/0x4d0 [Wed May 8 04:05:27 2024] ? bch2_trans_put+0x250/0x290 [Wed May 8 04:05:27 2024] btree_node_write_work+0x140/0x480 [Wed May 8 04:05:27 2024] chown_common+0x249/0x260 [Wed May 8 04:05:27 2024] vfs_utimes+0x13d/0x270 [Wed May 8 04:05:27 2024] ? chown_common+0x249/0x260 [Wed May 8 04:05:27 2024] ? vfs_utimes+0x13d/0x270 [Wed May 8 04:05:27 2024] process_one_work+0x18d/0x3f0 [Wed May 8 04:05:27 2024] ? asm_exc_invalid_op+0x1b/0x20 [Wed May 8 04:05:27 2024] do_fchownat+0xe2/0x110 [Wed May 8 04:05:27 2024] do_utimes+0xd5/0x150 [Wed May 8 04:05:27 2024] ? bch2_trans_srcu_unlock+0x136/0x150 [Wed May 8 04:05:27 2024] worker_thread+0x304/0x440 [Wed May 8 04:05:27 2024] bch2_trans_begin+0x608/0x6a0 [Wed May 8 04:05:27 2024] __x64_sys_utimensat+0x9d/0xf0 [Wed May 8 04:05:27 2024] ? __pfx_worker_thread+0x10/0x10 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] kthread+0xe4/0x110 [Wed May 8 04:05:27 2024] __x64_sys_lchown+0x23/0x40 [Wed May 8 04:05:27 2024] ? bch2_path_put+0x412/0x640 [Wed May 8 04:05:27 2024] ? __pfx_kthread+0x10/0x10 [Wed May 8 04:05:27 2024] x64_sys_call+0x1513/0x25c0 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] do_syscall_64+0x71/0x130 [Wed May 8 04:05:27 2024] ret_from_fork+0x47/0x70 [Wed May 8 04:05:27 2024] bch2_write_index_default+0xce/0x4e0 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ? __pfx_kthread+0x10/0x10 [Wed May 8 04:05:27 2024] ? mntput_no_expire+0x51/0x260 [Wed May 8 04:05:27 2024] ret_from_fork_asm+0x1a/0x30 [Wed May 8 04:05:27 2024] ? update_load_avg+0x82/0x840 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] </TASK> [Wed May 8 04:05:27 2024] ? mntput+0x24/0x50 [Wed May 8 04:05:27 2024] x64_sys_call+0x1f5d/0x25c0 [Wed May 8 04:05:27 2024] __bch2_write_index+0x1b2/0x2c0 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] do_syscall_64+0x71/0x130 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ? path_put+0x1e/0x30 [Wed May 8 04:05:27 2024] ? __bch2_write_index+0x1b2/0x2c0 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ---[ end trace 0000000000000000 ]--- [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ? path_getxattr+0x88/0xe0 [Wed May 8 04:05:27 2024] bch2_write_point_do_index_updates+0xa1/0x190 [Wed May 8 04:05:27 2024] ? syscall_exit_to_user_mode+0x81/0x1c0 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] process_one_work+0x18d/0x3f0 [Wed May 8 04:05:27 2024] ? do_syscall_64+0x7d/0x130 [Wed May 8 04:05:27 2024] ? syscall_exit_to_user_mode+0x81/0x1c0 [Wed May 8 04:05:27 2024] worker_thread+0x304/0x440 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ? do_syscall_64+0x7d/0x130 [Wed May 8 04:05:27 2024] ? __pfx_worker_thread+0x10/0x10 [Wed May 8 04:05:27 2024] kthread+0xe4/0x110 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ? syscall_exit_to_user_mode+0x81/0x1c0 [Wed May 8 04:05:27 2024] ? __pfx_kthread+0x10/0x10 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ret_from_fork+0x47/0x70 [Wed May 8 04:05:27 2024] ? do_syscall_64+0x7d/0x130 [Wed May 8 04:05:27 2024] ? __pfx_kthread+0x10/0x10 [Wed May 8 04:05:27 2024] ? do_syscall_64+0x7d/0x130 [Wed May 8 04:05:27 2024] ret_from_fork_asm+0x1a/0x30 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ? do_syscall_64+0x7d/0x130 [Wed May 8 04:05:27 2024] </TASK> [Wed May 8 04:05:27 2024] ? syscall_exit_to_user_mode+0x81/0x1c0 [Wed May 8 04:05:27 2024] ? terminate_walk+0x65/0x100 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ? do_syscall_64+0x7d/0x130 [Wed May 8 04:05:27 2024] ---[ end trace 0000000000000000 ]--- [Wed May 8 04:05:27 2024] ? do_syscall_64+0x7d/0x130 [Wed May 8 04:05:27 2024] ? path_lookupat+0x96/0x1b0 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ? filename_lookup+0xda/0x1f0 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ? _copy_to_user+0x25/0x50 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ? cp_new_stat+0x142/0x180 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] entry_SYSCALL_64_after_hwframe+0x76/0x7e [Wed May 8 04:05:27 2024] RIP: 0033:0x7fbe7409ddc7 [Wed May 8 04:05:27 2024] Code: 73 01 c3 48 8b 0d 51 a0 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 5e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 21 a0 0d 00 f7 d8 64 89 01 48 [Wed May 8 04:05:27 2024] RSP: 002b:00007fffb020ae38 EFLAGS: 00000246 ORIG_RAX: 000000000000005e [Wed May 8 04:05:27 2024] RAX: ffffffffffffffda RBX: 0000563e286758e8 RCX: 00007fbe7409ddc7 [Wed May 8 04:05:27 2024] RDX: 00000000000003e8 RSI: 00000000000003e8 RDI: 00007fffb020d0d0 [Wed May 8 04:05:27 2024] RBP: 0000000000000004 R08: 00000000000003e8 R09: 0000563e2cde5a30 [Wed May 8 04:05:27 2024] R10: 61f689028dc19bfe R11: 0000000000000246 R12: 00000000000081a4 [Wed May 8 04:05:27 2024] ? __do_sys_newfstatat+0x53/0x90 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ? syscall_exit_to_user_mode+0x81/0x1c0 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ? do_syscall_64+0x7d/0x130 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ? do_syscall_64+0x7d/0x130 [Wed May 8 04:05:27 2024] ? srso_return_thunk+0x5/0x5f [Wed May 8 04:05:27 2024] ? do_syscall_64+0x7d/0x130 [Wed May 8 04:05:27 2024] entry_SYSCALL_64_after_hwframe+0x76/0x7e [Wed May 8 04:05:27 2024] RIP: 0033:0x7fbe740a117f [Wed May 8 04:05:27 2024] Code: 44 00 00 48 8b 15 99 6c 0d 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 41 89 ca 48 85 f6 74 2a b8 18 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 09 c3 0f 1f 84 00 00 00 00 00 48 8b 15 61 6c [Wed May 8 04:05:27 2024] RSP: 002b:00007fffb020bc08 EFLAGS: 00000202 ORIG_RAX: 0000000000000118 [Wed May 8 04:05:27 2024] RAX: ffffffffffffffda RBX: 00007fffb020bca0 RCX: 00007fbe740a117f [Wed May 8 04:05:27 2024] RDX: 00007fffb020bc10 RSI: 00007fffb020e0d0 RDI: 00000000ffffff9c [Wed May 8 04:05:27 2024] RBP: 00007fffb020e0d0 R08: 0000000000000008 R09: 0000000000000001 [Wed May 8 04:05:27 2024] R10: 0000000000000100 R11: 0000000000000202 R12: 00000000000045c0 [Wed May 8 04:05:27 2024] R13: 0000000000000008 R14: 00007fffb020bfa0 R15: 0000000000000004 [Wed May 8 04:05:27 2024] R13: 00007fffb020b0d0 R14: 00007fffb020ae60 R15: 00000000000003e8 [Wed May 8 04:05:27 2024] </TASK> [Wed May 8 04:05:27 2024] ---[ end trace 0000000000000000 ]--- [Wed May 8 04:05:27 2024] </TASK> [Wed May 8 04:05:27 2024] ---[ end trace 0000000000000000 ]---
Requires a long reboot to recover, although I can still read the fs, writes hang. There is no IO on dm-1 and dm-17 so it does not appear to be doing any balancing or compression.
I see a lot of these too, but one of my block devices is network backed, and the network can be occasionally slow. It would be nice to disable these warnings, because they are expected, are not signs of a problem, but still fill up the system log
Kernel 6.8.9-arch1-2 I only trigger this issue when doing heavy downloading that exceeds my cache drive.
- queue 500gb of downloads in sabnzbd
- 250gb cache drive fills up
- sabnzbd still trying to push 70mb/s into the fs with completely filled cache, while doing post processing like repair or archive unpacking on finished downloads
- server gets funny and starts spamming btree trans held srcu lock until reboot.
it seems that sabnzbd and the file system ends up in some form of "write-stalemate".