NULL pointer dereference, followed by "Journal stuck? Waited for 10 seconds"
My bcachefs became inaccessible today, dmesg below. I'm on NixOS, kernel 6.14.3. The last section keeps repeating. I have yet to reboot the machine; I will report back if that's not enough to recover.
uname -a:
Linux hallewell 6.14.3 #1-NixOS SMP PREEMPT_DYNAMIC Sun Apr 20 08:23:22 UTC 2025 x86_64 GNU/Linux
[108535.916975] BUG: kernel NULL pointer dereference, address: 0000000000000000
[108535.917117] #PF: supervisor write access in kernel mode
[108535.917252] #PF: error_code(0x0002) - not-present page
[108535.917390] PGD 0 P4D 0
[108535.917529] Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
[108535.917672] CPU: 2 UID: 0 PID: 963 Comm: bch-reclaim/8f5 Tainted: G W 6.14.3 #1-NixOS
[108535.917823] Tainted: [W]=WARN
[108535.917969] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z390 Extreme4, BIOS P4.30 11/29/2019
[108535.918128] RIP: 0010:btree_key_cache_flush_pos.constprop.0+0x361/0x390 [bcachefs]
[108535.918357] Code: 08 00 00 44 89 95 e0 fe ff ff e8 7a 4c 04 00 44 8b 95 e0 fe ff ff 84 c0 74 0b 44 89 d3 e9 d1 fd ff ff 90 0f 0b 44 89 d7 be a6 <08> 00 00 44 89 95 e0 fe ff ff e8 50 4c 04 00 44 8b 95 e0 fe ff ff
[108535.918546] RSP: 0018:ffffa9594097fdb8 EFLAGS: 00010a97
[108535.918749] RAX: 0000000000000000 RBX: ffff95113a811070 RCX: 00000000ffffffff
[108535.918953] RDX: 000000000ad38616 RSI: ffff95113a811071 RDI: ffff9505a8b67001
[108535.919159] RBP: 000000000ad38616 R08: ffffa95941052290 R09: 000000000ad3961d
[108535.919367] R10: 0000000000007fff R11: ffffa95941000000 R12: 000000000ad38616
[108535.919578] R13: 0000000000000000 R14: ffff9505a8b67000 R15: ffffffffc31f0b80
[108535.919792] FS: 0000000000000000(0000) GS:ffff9514adf00000(0000) knlGS:0000000000000000
[108535.920012] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[108535.920233] CR2: 0000000000000000 CR3: 000000044c422001 CR4: 00000000003726f0
[108535.920499] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[108535.920726] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[108535.920955] Call Trace:
[108535.921184] <TASK>
[108535.921413] ? journal_flush_pins.constprop.0+0x191/0x340 [bcachefs]
[108535.921716] ? __bch2_journal_reclaim+0x1e7/0x3a0 [bcachefs]
[108535.922016] ? bch2_journal_reclaim_thread+0x6e/0x150 [bcachefs]
[108535.922318] ? __pfx_bch2_journal_reclaim_thread+0x10/0x10 [bcachefs]
[108535.922620] ? kthread+0xeb/0x240
[108535.922866] ? __pfx_kthread+0x10/0x10
[108535.923111] ? ret_from_fork+0x31/0x50
[108535.923356] ? __pfx_kthread+0x10/0x10
[108535.923602] ? ret_from_fork_asm+0x1a/0x30
[108535.923851] </TASK>
[108535.924098] Modules linked in: bluetooth ecdh_generic ecc msr xt_conntrack xt_MASQUERADE xt_mark nft_compat nft_chain_nat nf_nat af_packet overlay cfg80211 rfkill 8021q bcachefs lz4hc_compress lz4_compress xor raid6_pq nfs netfs nf_log_syslog nft_log nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nf_tables snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink snd_sof_intel_hda snd_hda_codec_hdmi snd_sof_pci snd_sof_xtensa_dsp sch_fq_codel snd_sof snd_sof_utils snd_soc_acpi_intel_match snd_soc_acpi_intel_sdca_quirks soundwire_generic_allocation snd_soc_acpi tls soundwire_bus atkbd snd_soc_sdca libps2 serio snd_soc_avs vivaldi_fmap loop snd_soc_hda_codec tun intel_rapl_msr snd_hda_ext_core tap intel_rapl_common macvlan snd_soc_core intel_uncore_frequency intel_uncore_frequency_common bridge intel_tcc_cooling
[108535.925695] vfio iommufd kvm_intel nfsd kvm auth_rpcgss nfs_acl lockd grace nfs_localio irqbypass fuse sunrpc efi_pstore configfs nfnetlink efivarfs dmi_sysfs autofs4 ext4 crc16 mbcache jbd2 hid_generic sd_mod usbhid hid ahci libahci libata xhci_pci nvme xhci_hcd scsi_mod nvme_core sha256_ssse3 nvme_auth scsi_common video wmi dm_mod dax
[108535.928359] CR2: 0000000000000000
[108535.928775] ---[ end trace 0000000000000000 ]---
[108536.132437] RIP: 0010:btree_key_cache_flush_pos.constprop.0+0x361/0x390 [bcachefs]
[108536.133004] Code: 08 00 00 44 89 95 e0 fe ff ff e8 7a 4c 04 00 44 8b 95 e0 fe ff ff 84 c0 74 0b 44 89 d3 e9 d1 fd ff ff 90 0f 0b 44 89 d7 be a6 <08> 00 00 44 89 95 e0 fe ff ff e8 50 4c 04 00 44 8b 95 e0 fe ff ff
[108536.133517] RSP: 0018:ffffa9594097fdb8 EFLAGS: 00010a97
[108536.134041] RAX: 0000000000000000 RBX: ffff95113a811070 RCX: 00000000ffffffff
[108536.134569] RDX: 000000000ad38616 RSI: ffff95113a811071 RDI: ffff9505a8b67001
[108536.135104] RBP: 000000000ad38616 R08: ffffa95941052290 R09: 000000000ad3961d
[108536.135644] R10: 0000000000007fff R11: ffffa95941000000 R12: 000000000ad38616
[108536.136182] R13: 0000000000000000 R14: ffff9505a8b67000 R15: ffffffffc31f0b80
[108536.136727] FS: 0000000000000000(0000) GS:ffff9514adf00000(0000) knlGS:0000000000000000
[108536.137275] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[108536.137829] CR2: 0000000000000000 CR3: 000000011e15c006 CR4: 00000000003726f0
[108536.138384] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[108536.138944] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[108536.139505] note: bch-reclaim/8f5[963] exited with irqs disabled
[108584.545667] bcachefs (8f552709-24e3-4387-8183-23878c94d00b): Journal stuck? Waited for 10 seconds...
flags: replay_done,running,may_skip_flush,space_low
dirty journal entries: 24577/32768
seq: 181659158
seq_ondisk: 181659158
last_seq: 181634582
last_seq_ondisk: 181634582
flushed_seq_ondisk: 181659158
watermark: reclaim
each entry reserved: 321
nr flush writes: 147824
nr noflush writes: 17301135
average write size: 510 KiB
nr direct reclaim: 516214
nr background reclaim: 641531661
reclaim kicked: 1
reclaim runs in: 0 ms
blocked: 0
current entry sectors: 1024
current entry error: ok
current entry: closed
unwritten entries:
last buf closed
space:
  discarded 1024:9865216
  clean ondisk 1024:9865216
  clean 1024:9865216
  total 1024:16777216
dev 0
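As a sanity check on the journal state above, the dirty entry count looks like the inclusive span from last_seq to seq. A minimal Python sketch, assuming that interpretation: the formula is inferred from the numbers in the log, not taken from the bcachefs source, and the variable name `capacity` is only a label for the denominator.

```python
# Values copied from the "Journal stuck?" message in the dmesg output.
seq = 181659158
last_seq = 181634582
capacity = 32768  # denominator of "dirty journal entries: 24577/32768"

# Assumption: dirty entries cover last_seq..seq inclusive.
dirty = seq - last_seq + 1
fill = dirty / capacity

print(dirty)          # 24577, matching the logged count
print(f"{fill:.1%}")  # 75.0%
```

If that reading is right, last_seq has not advanced while the journal sits about three quarters full, which would be consistent with the bch-reclaim thread having exited after the oops.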
we need a faddr2line for btree_key_cache_flush_pos.constprop.0+0x361/0x390 - that's always tricky with distro kernels, though
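For anyone unfamiliar with the notation, the string wanted here has the form symbol+offset/size, and the bracketed name after it in the trace is the module the code lives in. A small Python sketch that pulls those pieces out of the RIP line; the helper name `parse_location` and the regular expression are illustrative only and are not part of faddr2line.

```python
import re

# Kernel trace location: symbol+0xOFFSET/0xSIZE, optionally followed by
# " [module]". Symbol names may contain dots (e.g. ".constprop.0").
LOCATION_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

def parse_location(text):
    """Return symbol, offset, size and module from a trace line, or None."""
    m = LOCATION_RE.search(text)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),  # bytes into the function
        "size": int(m.group("size"), 16),      # total function size in bytes
        "module": m.group("module"),           # None if no [module] suffix
    }

line = "RIP: 0010:btree_key_cache_flush_pos.constprop.0+0x361/0x390 [bcachefs]"
loc = parse_location(line)
print(loc)
```

Here the faulting instruction is 0x361 bytes into a 0x390-byte function, and the `[bcachefs]` suffix shows the code is in the bcachefs module.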
If you can join the IRC channel we might be able to figure it out, we've got nixos folks and nixos builds being reproducible helps
I will join IRC when I have more time, for now just leaving notes so I don't forget (and in case someone with the right knowledge stumbles upon this).
The kernel I was running was /boot/kernels/h76kikmwyw3vvcwvwn2jkvpwppmx6f0j-linux-6.14.3-bzImage, which was built from /nix/store/7n33chl4l4n3ii20c0bkp9z7pk1knx9x-linux-6.14.3.tar.xz. I managed to run faddr2line, but it complains about missing debug symbols (ERROR: CONFIG_DEBUG_INFO not enabled), and I'm not sure where to get them from.
yeah, you need the vmlinux (if bcachefs was built in) or some .o files (if built as a module) from the kernel source tree where the build was done.
we really need a standardized way for distros to ship these debug symbols, it's been a real issue
I saw that the NixOS way to enable kernel debug info would be to set boot.kernel.features.debug = true in the system configuration, and then to reboot into the newly built kernel/configuration. After that faddr2line should just work.
For now, I think it is preferable to leave this enabled while running bcachefs, so that faddr2line works on the running kernel if an issue appears and you need to troubleshoot it.
do we know if this bug is still live?
I have only seen it that one time. Never before and never after.