NULL pointer dereference, followed by "Journal stuck? Waited for 10 seconds"

Open ramonacat opened this issue 8 months ago • 4 comments

My bcachefs filesystem became inaccessible today; dmesg below. I'm on NixOS, kernel 6.14.3. The last section keeps repeating. I've yet to reboot the machine; I will report if that's not enough to recover.

uname -a:

Linux hallewell 6.14.3 #1-NixOS SMP PREEMPT_DYNAMIC Sun Apr 20 08:23:22 UTC 2025 x86_64 GNU/Linux

dmesg:

[108535.916975] BUG: kernel NULL pointer dereference, address: 0000000000000000
[108535.917117] #PF: supervisor write access in kernel mode
[108535.917252] #PF: error_code(0x0002) - not-present page
[108535.917390] PGD 0 P4D 0
[108535.917529] Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
[108535.917672] CPU: 2 UID: 0 PID: 963 Comm: bch-reclaim/8f5 Tainted: G        W          6.14.3 #1-NixOS
[108535.917823] Tainted: [W]=WARN
[108535.917969] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z390 Extreme4, BIOS P4.30 11/29/2019
[108535.918128] RIP: 0010:btree_key_cache_flush_pos.constprop.0+0x361/0x390 [bcachefs]
[108535.918357] Code: 08 00 00 44 89 95 e0 fe ff ff e8 7a 4c 04 00 44 8b 95 e0 fe ff ff 84 c0 74 0b 44 89 d3 e9 d1 fd ff ff 90 0f 0b 44 89 d7 be a6 <08> 00 00 44 89 95 e0 fe ff ff e8 50 4c 04 00 44 8b 95 e0 fe ff ff
[108535.918546] RSP: 0018:ffffa9594097fdb8 EFLAGS: 00010a97
[108535.918749] RAX: 0000000000000000 RBX: ffff95113a811070 RCX: 00000000ffffffff
[108535.918953] RDX: 000000000ad38616 RSI: ffff95113a811071 RDI: ffff9505a8b67001
[108535.919159] RBP: 000000000ad38616 R08: ffffa95941052290 R09: 000000000ad3961d
[108535.919367] R10: 0000000000007fff R11: ffffa95941000000 R12: 000000000ad38616
[108535.919578] R13: 0000000000000000 R14: ffff9505a8b67000 R15: ffffffffc31f0b80
[108535.919792] FS:  0000000000000000(0000) GS:ffff9514adf00000(0000) knlGS:0000000000000000
[108535.920012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[108535.920233] CR2: 0000000000000000 CR3: 000000044c422001 CR4: 00000000003726f0
[108535.920499] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[108535.920726] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[108535.920955] Call Trace:
[108535.921184]  <TASK>
[108535.921413]  ? journal_flush_pins.constprop.0+0x191/0x340 [bcachefs]
[108535.921716]  ? __bch2_journal_reclaim+0x1e7/0x3a0 [bcachefs]
[108535.922016]  ? bch2_journal_reclaim_thread+0x6e/0x150 [bcachefs]
[108535.922318]  ? __pfx_bch2_journal_reclaim_thread+0x10/0x10 [bcachefs]
[108535.922620]  ? kthread+0xeb/0x240
[108535.922866]  ? __pfx_kthread+0x10/0x10
[108535.923111]  ? ret_from_fork+0x31/0x50
[108535.923356]  ? __pfx_kthread+0x10/0x10
[108535.923602]  ? ret_from_fork_asm+0x1a/0x30
[108535.923851]  </TASK>
[108535.924098] Modules linked in: bluetooth ecdh_generic ecc msr xt_conntrack xt_MASQUERADE xt_mark nft_compat nft_chain_nat nf_nat af_packet overlay cfg80211 rfkill 8021q bcachefs lz4hc_compress lz4_compress xor raid6_pq nfs netfs nf_log_syslog nft_log nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nf_tables snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink snd_sof_intel_hda snd_hda_codec_hdmi snd_sof_pci snd_sof_xtensa_dsp sch_fq_codel snd_sof snd_sof_utils snd_soc_acpi_intel_match snd_soc_acpi_intel_sdca_quirks soundwire_generic_allocation snd_soc_acpi tls soundwire_bus atkbd snd_soc_sdca libps2 serio snd_soc_avs vivaldi_fmap loop snd_soc_hda_codec tun intel_rapl_msr snd_hda_ext_core tap intel_rapl_common macvlan snd_soc_core intel_uncore_frequency intel_uncore_frequency_common bridge intel_tcc_cooling
[108535.925695]  vfio iommufd kvm_intel nfsd kvm auth_rpcgss nfs_acl lockd grace nfs_localio irqbypass fuse sunrpc efi_pstore configfs nfnetlink efivarfs dmi_sysfs autofs4 ext4 crc16 mbcache jbd2 hid_generic sd_mod usbhid hid ahci libahci libata xhci_pci nvme xhci_hcd scsi_mod nvme_core sha256_ssse3 nvme_auth scsi_common video wmi dm_mod dax
[108535.928359] CR2: 0000000000000000
[108535.928775] ---[ end trace 0000000000000000 ]---
[108536.132437] RIP: 0010:btree_key_cache_flush_pos.constprop.0+0x361/0x390 [bcachefs]
[108536.133004] Code: 08 00 00 44 89 95 e0 fe ff ff e8 7a 4c 04 00 44 8b 95 e0 fe ff ff 84 c0 74 0b 44 89 d3 e9 d1 fd ff ff 90 0f 0b 44 89 d7 be a6 <08> 00 00 44 89 95 e0 fe ff ff e8 50 4c 04 00 44 8b 95 e0 fe ff ff
[108536.133517] RSP: 0018:ffffa9594097fdb8 EFLAGS: 00010a97
[108536.134041] RAX: 0000000000000000 RBX: ffff95113a811070 RCX: 00000000ffffffff
[108536.134569] RDX: 000000000ad38616 RSI: ffff95113a811071 RDI: ffff9505a8b67001
[108536.135104] RBP: 000000000ad38616 R08: ffffa95941052290 R09: 000000000ad3961d
[108536.135644] R10: 0000000000007fff R11: ffffa95941000000 R12: 000000000ad38616
[108536.136182] R13: 0000000000000000 R14: ffff9505a8b67000 R15: ffffffffc31f0b80
[108536.136727] FS:  0000000000000000(0000) GS:ffff9514adf00000(0000) knlGS:0000000000000000
[108536.137275] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[108536.137829] CR2: 0000000000000000 CR3: 000000011e15c006 CR4: 00000000003726f0
[108536.138384] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[108536.138944] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[108536.139505] note: bch-reclaim/8f5[963] exited with irqs disabled
[108584.545667] bcachefs (8f552709-24e3-4387-8183-23878c94d00b): Journal stuck? Waited for 10 seconds...
  flags:                     replay_done,running,may_skip_flush,space_low
  dirty journal entries:     24577/32768
  seq:                       181659158
  seq_ondisk:                181659158
  last_seq:                  181634582
  last_seq_ondisk:           181634582
  flushed_seq_ondisk:        181659158
  watermark:                 reclaim
  each entry reserved:       321
  nr flush writes:           147824
  nr noflush writes:         17301135
  average write size:        510 KiB
  nr direct reclaim:         516214
  nr background reclaim:     641531661
  reclaim kicked:            1
  reclaim runs in:           0 ms
  blocked:                   0
  current entry sectors:     1024
  current entry error:       ok
  current entry:             closed
  unwritten entries:
  last buf closed
  space:
    discarded                1024:9865216
    clean ondisk             1024:9865216
    clean                    1024:9865216
    total                    1024:16777216
  dev 0
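A quick sanity check on the journal counters in the dump above: the "dirty journal entries" count appears to equal the number of sequence numbers from last_seq through seq, inclusive. That reading is an assumption based only on the numbers in this log, not something confirmed in the thread. The helper name `dirty_entries` is made up for illustration.

```python
def dirty_entries(seq: int, last_seq: int) -> int:
    """Count sequence numbers in the inclusive range [last_seq, seq].

    Assumption: this is how the 'dirty journal entries' figure relates to
    seq and last_seq; it is inferred from the log, not from bcachefs source.
    """
    return seq - last_seq + 1


# Values copied from the "Journal stuck?" dump.
seq = 181659158
last_seq = 181634582
capacity = 32768  # denominator of "dirty journal entries: 24577/32768"

dirty = dirty_entries(seq, last_seq)
fill = dirty / capacity

print(f"dirty entries: {dirty} of {capacity} ({fill:.1%} full)")
```

If that reading holds, last_seq was not advancing once the reclaim thread died in the oops, which would fit the space_low flag and the repeated "Journal stuck?" message.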

ramonacat avatar Apr 28 '25 06:04 ramonacat

we need a faddr2line for btree_key_cache_flush_pos.constprop.0+0x361/0x390 - that's always tricky with distro kernels, though

If you can join the IRC channel we might be able to figure it out, we've got nixos folks and nixos builds being reproducible helps

koverstreet avatar Apr 28 '25 15:04 koverstreet

I will join IRC when I have more time, for now just leaving notes so I don't forget (and in case someone with the right knowledge stumbles upon this).

The kernel I was running was /boot/kernels/h76kikmwyw3vvcwvwn2jkvpwppmx6f0j-linux-6.14.3-bzImage, which was built from /nix/store/7n33chl4l4n3ii20c0bkp9z7pk1knx9x-linux-6.14.3.tar.xz. I managed to run faddr2line, but it complains about missing debug symbols (ERROR: CONFIG_DEBUG_INFO not enabled), and I'm not sure where to get them from.

ramonacat avatar Apr 29 '25 18:04 ramonacat

yeah, you need the vmlinux (if bcachefs was built in) or some .o files (if built as a module) from the kernel source tree where the build was done.
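As a rough sketch of the above: the snippet below pulls the `symbol+offset/size [module]` part out of a dmesg line and builds the matching `scripts/faddr2line` command line, choosing vmlinux when no module is named and a module object otherwise. The object paths (`vmlinux`, `fs/bcachefs/bcachefs.o`) and the helper names are assumptions for illustration; the real paths depend on the build tree that produced the kernel, and the objects must contain debug info.

```python
import re

# Matches entries such as "some_func.constprop.0+0x361/0x390 [bcachefs]".
SYMBOL_RE = re.compile(
    r"(?P<sym>[A-Za-z_][\w.]*)\+(?P<off>0x[0-9a-fA-F]+)/(?P<size>0x[0-9a-fA-F]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

# Hypothetical mapping from module name to an object file in the build tree.
DEFAULT_MODULE_OBJECTS = {"bcachefs": "fs/bcachefs/bcachefs.o"}


def parse_symbol(line):
    """Return (symbol, offset, size, module or None), or None if no match."""
    m = SYMBOL_RE.search(line)
    if m is None:
        return None
    return (
        m.group("sym"),
        int(m.group("off"), 16),
        int(m.group("size"), 16),
        m.group("module"),
    )


def faddr2line_command(line, vmlinux="vmlinux", module_objects=None):
    """Build the argv for scripts/faddr2line from one dmesg line."""
    parsed = parse_symbol(line)
    if parsed is None:
        raise ValueError("no symbol+offset/size found in line")
    sym, off, size, module = parsed
    objects = DEFAULT_MODULE_OBJECTS if module_objects is None else module_objects
    # Built-in code resolves against vmlinux; module code needs the module object.
    obj = vmlinux if module is None else objects[module]
    return ["scripts/faddr2line", obj, f"{sym}+{off:#x}/{size:#x}"]


rip = "[108535.918128] RIP: 0010:btree_key_cache_flush_pos.constprop.0+0x361/0x390 [bcachefs]"
print(" ".join(faddr2line_command(rip)))
```

The printed command is meant to be run from the top of the kernel source tree where the build was done.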

we really need a standardized way for distros to ship these debug symbols, it's been a real issue

koverstreet avatar May 04 '25 18:05 koverstreet

I saw that the NixOS way to enable kernel debug info is to set boot.kernel.features.debug = true in the system configuration and then reboot into the newly built kernel/configuration. After that, faddr2line should just work. I think for now it would be preferable to leave this enabled whenever you are running bcachefs, in case issues appear and you need faddr2line to work against the running kernel for troubleshooting.

himikof avatar May 04 '25 19:05 himikof

do we know if this bug is still live?

koverstreet avatar Aug 02 '25 01:08 koverstreet

I have only seen it that one time. Never before and never after.

ramonacat avatar Aug 02 '25 06:08 ramonacat