BUG: unable to handle page fault for address: 0000000000422a99, looks like invalid pointer to kfree()?
System information
Linux 6.8.0-56-generic #58-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb 14 15:33:28 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04.2 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.2 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo
filename: /lib/modules/6.8.0-56-generic/kernel/zfs/zfs.ko.zst
version: 2.2.2-0ubuntu9.1
| Type | Version/Name |
|---|---|
| Distribution Name | Ubuntu |
| Distribution Version | 24.04 |
| Kernel Version | 6.8.0-56-generic #58-Ubuntu SMP PREEMPT_DYNAMIC |
| Architecture | x86_64 |
| OpenZFS Version | 2.2.2-0ubuntu9.1 |
zfs version
zfs-2.2.2-0ubuntu9.1
zfs-kmod-2.2.2-0ubuntu9.1
Describe the problem you're observing
An application was using ZFS when the following appeared in the kernel log:
May 25 17:24:05 Model-5TB kernel: BUG: unable to handle page fault for address: 0000000000422a99
May 25 17:24:05 Model-5TB kernel: #PF: supervisor write access in kernel mode
May 25 17:24:05 Model-5TB kernel: #PF: error_code(0x0002) - not-present page
May 25 17:24:05 Model-5TB kernel: PGD 0 P4D 0
May 25 17:24:05 Model-5TB kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
May 25 17:24:05 Model-5TB kernel: CPU: 6 PID: 2741119 Comm: z_wr_iss Tainted: P O 6.8.0-56-generic #58-Ubuntu
May 25 17:24:05 Model-5TB kernel: Hardware name: AZW SER8/SER8, BIOS SER8_P5C8V29 08/14/2024
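For anyone reading the `#PF` lines above: the `error_code(0x0002)` value can be decoded bit by bit. The sketch below assumes the standard x86 page-fault error code layout (bit 0 = present/protection, bit 1 = write, bit 2 = user mode, bit 3 = reserved bit, bit 4 = instruction fetch). The function name `decode_pf_error_code` is made up for illustration and is not part of any kernel or ZFS tooling.

```python
def decode_pf_error_code(code: int) -> dict:
    """Decode the low bits of an x86 page-fault error code (illustrative helper)."""
    return {
        # Bit 0: set means a protection violation, clear means the page was not present
        "page": "protection violation" if code & 0x01 else "not-present page",
        # Bit 1: set means the faulting access was a write
        "access": "write" if code & 0x02 else "read",
        # Bit 2: set means the access originated in user mode
        "privilege": "user" if code & 0x04 else "supervisor",
        # Bit 3: a reserved bit was set in a paging-structure entry
        "reserved_bit": bool(code & 0x08),
        # Bit 4: the fault was caused by an instruction fetch
        "instruction_fetch": bool(code & 0x10),
    }


decoded = decode_pf_error_code(0x0002)
print(decoded)
```

For `0x0002` this gives a supervisor write to a not-present page, which agrees with the two `#PF` lines in the log.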
Describe how to reproduce the problem
Sorry, no idea. I was hoping the stack trace would help, so I have included it below.
Include any warning/errors/backtraces from the system logs
May 25 17:24:05 Model-5TB kernel: BUG: unable to handle page fault for address: 0000000000422a99
May 25 17:24:05 Model-5TB kernel: #PF: supervisor write access in kernel mode
May 25 17:24:05 Model-5TB kernel: #PF: error_code(0x0002) - not-present page
May 25 17:24:05 Model-5TB kernel: PGD 0 P4D 0
May 25 17:24:05 Model-5TB kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
May 25 17:24:05 Model-5TB kernel: CPU: 6 PID: 2741119 Comm: z_wr_iss Tainted: P O 6.8.0-56-generic #58-Ubuntu
May 25 17:24:05 Model-5TB kernel: Hardware name: AZW SER8/SER8, BIOS SER8_P5C8V29 08/14/2024
May 25 17:24:05 Model-5TB kernel: RIP: 0010:free_unref_page_commit+0x25/0x370
May 25 17:24:05 Model-5TB kernel: Code: 90 90 90 90 90 0f 1f 44 00 00 55 41 89 c9 44 89 c1 48 89 e5 41 57 41 56 41 55 49 89 fd 41 54 49 89 d4 53 48 89 f3 48 83 ec 18 <d0> 6e 19 41 83 f8 1f 0f 87 f4 58 d8 00 41 bf 01 00 00 00 41 d3 e7
May 25 17:24:05 Model-5TB kernel: RSP: 0018:ffff9b558d373420 EFLAGS: 00010282
May 25 17:24:05 Model-5TB kernel: RAX: 0017ffffc0000000 RBX: 0000000000422a80 RCX: 0000000000000000
May 25 17:24:05 Model-5TB kernel: RDX: 0000000000000084 RSI: 0000000000422a80 RDI: 0000000000000007
May 25 17:24:05 Model-5TB kernel: RBP: ffff9b558d373440 R08: 0000000000000000 R09: 0000000000000008
May 25 17:24:05 Model-5TB kernel: R10: 0000000000000001 R11: 0000000000000008 R12: fffffc5b508aa000
May 25 17:24:05 Model-5TB kernel: R13: 0000000000000003 R14: 0000000000000008 R15: 0000000000000000
May 25 17:24:05 Model-5TB kernel: FS: 0000000000000000(0000) GS:ffff8ed8be500000(0000) knlGS:0000000000000000
May 25 17:24:05 Model-5TB kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 25 17:24:05 Model-5TB kernel: CR2: 0000000000422a99 CR3: 000000041083c000 CR4: 0000000000f50ef0
May 25 17:24:05 Model-5TB kernel: PKRU: 55555554
May 25 17:24:05 Model-5TB kernel: Call Trace:
May 25 17:24:05 Model-5TB kernel: <TASK>
May 25 17:24:05 Model-5TB kernel: ? show_regs+0x6d/0x80
May 25 17:24:05 Model-5TB kernel: ? __die+0x24/0x80
May 25 17:24:05 Model-5TB kernel: ? page_fault_oops+0x99/0x1b0
May 25 17:24:05 Model-5TB kernel: ? do_user_addr_fault+0x2e9/0x670
May 25 17:24:05 Model-5TB kernel: ? exc_page_fault+0x83/0x1b0
May 25 17:24:05 Model-5TB kernel: ? asm_exc_page_fault+0x27/0x30
May 25 17:24:05 Model-5TB kernel: ? free_unref_page_commit+0x25/0x370
May 25 17:24:05 Model-5TB kernel: free_unref_page_prepare+0x235/0x3e0
May 25 17:24:05 Model-5TB kernel: free_unref_page+0x34/0x1c0
May 25 17:24:05 Model-5TB kernel: destroy_large_folio+0x6c/0xa0
May 25 17:24:05 Model-5TB kernel: __folio_put+0x78/0x90
May 25 17:24:05 Model-5TB kernel: free_large_kmalloc+0x6b/0xc0
May 25 17:24:05 Model-5TB kernel: kfree+0x2ab/0x370
May 25 17:24:05 Model-5TB kernel: ? srso_alias_return_thunk+0x5/0xfbef5
May 25 17:24:05 Model-5TB kernel: spl_kmem_free_impl+0x2c/0x40 [spl]
May 25 17:24:05 Model-5TB kernel: spl_vmem_free+0xe/0x20 [spl]
May 25 17:24:05 Model-5TB kernel: gcm_mode_encrypt_contiguous_blocks_avx+0x166/0x550 [zfs]
May 25 17:24:05 Model-5TB kernel: gcm_mode_encrypt_contiguous_blocks+0x360/0x460 [zfs]
May 25 17:24:05 Model-5TB kernel: ? gcm_init_avx+0x17c/0x250 [zfs]
May 25 17:24:05 Model-5TB kernel: ? srso_alias_return_thunk+0x5/0xfbef5
May 25 17:24:05 Model-5TB kernel: ? gcm_init_ctx_impl+0x150/0x310 [zfs]
May 25 17:24:05 Model-5TB kernel: ? __pfx_aes_encrypt_contiguous_blocks+0x10/0x10 [zfs]
May 25 17:24:05 Model-5TB kernel: aes_encrypt_contiguous_blocks+0x109/0x130 [zfs]
May 25 17:24:05 Model-5TB kernel: ? __pfx_aes_copy_block+0x10/0x10 [zfs]
May 25 17:24:05 Model-5TB kernel: ? __pfx_aes_xor_block+0x10/0x10 [zfs]
May 25 17:24:05 Model-5TB kernel: crypto_update_uio+0xcd/0x110 [zfs]
May 25 17:24:05 Model-5TB kernel: aes_encrypt_atomic+0x146/0x340 [zfs]
May 25 17:24:05 Model-5TB kernel: crypto_encrypt+0x75/0x220 [zfs]
May 25 17:24:05 Model-5TB kernel: zio_do_crypt_uio+0x275/0x3c0 [zfs]
May 25 17:24:05 Model-5TB kernel: zio_do_crypt_data+0x28f/0x4e0 [zfs]
May 25 17:24:05 Model-5TB kernel: spa_do_crypt_abd+0x11e/0x2f0 [zfs]
May 25 17:24:05 Model-5TB kernel: zio_encrypt+0x4da/0x750 [zfs]
May 25 17:24:05 Model-5TB kernel: zio_execute+0x92/0xf0 [zfs]
May 25 17:24:05 Model-5TB kernel: taskq_thread+0x1f3/0x3c0 [spl]
May 25 17:24:05 Model-5TB kernel: ? __pfx_default_wake_function+0x10/0x10
May 25 17:24:05 Model-5TB kernel: ? __pfx_zio_execute+0x10/0x10 [zfs]
May 25 17:24:05 Model-5TB kernel: ? __pfx_taskq_thread+0x10/0x10 [spl]
May 25 17:24:05 Model-5TB kernel: kthread+0xef/0x120
May 25 17:24:05 Model-5TB kernel: ? __pfx_kthread+0x10/0x10
May 25 17:24:05 Model-5TB kernel: ret_from_fork+0x44/0x70
May 25 17:24:05 Model-5TB kernel: ? __pfx_kthread+0x10/0x10
May 25 17:24:05 Model-5TB kernel: ret_from_fork_asm+0x1b/0x30
May 25 17:24:05 Model-5TB kernel: </TASK>
May 25 17:24:05 Model-5TB kernel: Modules linked in: netlink_diag vhost_net vhost vhost_iotlb tap xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables libcrc32c wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel sp5100_tco binfmt_misc intel_rapl_msr intel_rapl_common snd_soc_dmic snd_soc_ps_mach snd_ps_pdm_dma snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir iwlmvm snd_sof_amd_acp edac_mce_amd snd_sof_pci snd_sof_xtensa_dsp kvm_amd snd_sof snd_hda_codec_realtek snd_sof_utils snd_hda_codec_generic snd_hda_codec_hdmi kvm snd_soc_core mac80211 snd_compress btusb snd_hda_intel ac97_bus amdxcp snd_intel_dspcfg snd_pcm_dmaengine btrtl drm_exec snd_intel_sdw_acpi irqbypass btintel gpu_sched crct10dif_pclmul snd_pci_ps drm_buddy btbcm snd_rpl_pci_acp6x polyval_clmulni snd_hda_codec polyval_generic drm_suballoc_helper btmtk
May 25 17:24:05 Model-5TB kernel: snd_acp_pci ghash_clmulni_intel drm_ttm_helper libarc4 snd_hda_core snd_acp_legacy_common sha256_ssse3 snd_hwdep bluetooth snd_pci_acp6x ttm sha1_ssse3 bridge snd_pcm aesni_intel drm_display_helper crypto_simd nls_iso8859_1 stp ecdh_generic iwlwifi llc snd_pci_acp5x snd_timer cryptd ecc cec snd_rn_pci_acp3x snd_acp_config snd rc_core zfs(PO) snd_soc_acpi i2c_algo_bit ccp snd_pci_acp3x soundcore rapl serio_raw k10temp i2c_piix4 cfg80211 spl(O) amd_pmc mac_hid sch_fq_codel efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 nvme thunderbolt crc32_pclmul psmouse nvme_core r8169 xhci_pci xhci_pci_renesas nvme_auth realtek video wmi
May 25 17:24:05 Model-5TB kernel: CR2: 0000000000422a99
May 25 17:24:05 Model-5TB kernel: ---[ end trace 0000000000000000 ]---
May 25 17:24:05 Model-5TB kernel: RIP: 0010:free_unref_page_commit+0x25/0x370
May 25 17:24:05 Model-5TB kernel: Code: 90 90 90 90 90 0f 1f 44 00 00 55 41 89 c9 44 89 c1 48 89 e5 41 57 41 56 41 55 49 89 fd 41 54 49 89 d4 53 48 89 f3 48 83 ec 18 <d0> 6e 19 41 83 f8 1f 0f 87 f4 58 d8 00 41 bf 01 00 00 00 41 d3 e7
May 25 17:24:05 Model-5TB kernel: RSP: 0018:ffff9b558d373420 EFLAGS: 00010282
May 25 17:24:05 Model-5TB kernel: RAX: 0017ffffc0000000 RBX: 0000000000422a80 RCX: 0000000000000000
May 25 17:24:05 Model-5TB kernel: RDX: 0000000000000084 RSI: 0000000000422a80 RDI: 0000000000000007
May 25 17:24:05 Model-5TB kernel: RBP: ffff9b558d373440 R08: 0000000000000000 R09: 0000000000000008
May 25 17:24:05 Model-5TB kernel: R10: 0000000000000001 R11: 0000000000000008 R12: fffffc5b508aa000
May 25 17:24:05 Model-5TB kernel: R13: 0000000000000003 R14: 0000000000000008 R15: 0000000000000000
May 25 17:24:05 Model-5TB kernel: FS: 0000000000000000(0000) GS:ffff8ed8be500000(0000) knlGS:0000000000000000
May 25 17:24:05 Model-5TB kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 25 17:24:05 Model-5TB kernel: CR2: 0000000000422a99 CR3: 0000000107884000 CR4: 0000000000f50ef0
May 25 17:24:05 Model-5TB kernel: PKRU: 55555554
May 25 17:24:05 Model-5TB kernel: note: z_wr_iss[2741119] exited with irqs disabled
May 25 17:24:05 Model-5TB systemd[1]: Reloading requested from client PID 2742601 ('systemctl') (unit box.service)...
May 25 17:24:05 Model-5TB systemd[1]: Reloading...
May 25 17:24:05 Model-5TB systemd[1]: Reloading finished in 128 ms.
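As a rough cross-check of the register dump, the faulting address in CR2 can be related to the marked bytes in the `Code:` line. This is only a sketch: it assumes the bytes `d0 6e 19` (starting at the `<d0>` marker) form one instruction with no prefix, and decodes the ModRM byte by hand using the standard x86 encoding. It shows where the bad address came from arithmetically; it does not identify the root cause.

```python
# Values copied from the oops above
RSI = 0x0000000000422A80
CR2 = 0x0000000000422A99
faulting_bytes = bytes.fromhex("d06e19")  # "<d0> 6e 19" from the Code: line

opcode, modrm, disp8 = faulting_bytes
mod = modrm >> 6          # 0b01: an 8-bit displacement follows
reg = (modrm >> 3) & 0x7  # 0b101: /5 selects SHR within the 0xD0 shift group
rm = modrm & 0x7          # 0b110: base register is RSI when there is no REX prefix

# Effective address of the memory operand: base register plus displacement
effective_address = RSI + disp8

# On x86_64, kernel virtual addresses live in the upper canonical half
is_kernel_half = effective_address >= 0xFFFF800000000000

print(hex(effective_address), is_kernel_half)
```

Under those assumptions the instruction is a byte shift on `0x19(%rsi)`, and `RSI + 0x19` equals the CR2 value. The address is far below the kernel half of the address space, so whatever pointer reached this code was not a valid kernel pointer.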
Are you using encryption? I wouldn't do that in production yet, and if I really wanted to do it, I'd use the ZFS code directly from openzfs, not Ubuntu's version.
Why not? Encryption has been in ZFS since v0.8.0 in 2019. That's 6 years. And it's not ready for prime time?
There's been a history of encryption-related bugs. To my knowledge, the only current known issues are with zfs receive of things sent with zfs send -w. At any rate, Ubuntu hasn't updated their ZFS in 24.04 since 2.2.2. Instead they cherry-pick patches, but it's not clear they take all of them. A lot of us think it's hard to be sure that the result is actually equivalent in reliability to the latest OpenZFS release, which would be 2.2.7, and shortly 2.2.8. Particularly if I were going to use a feature with a known history of issues, I'd prefer an OpenZFS released version.
There have been various proposals to put warnings in documentation. Here's the most recent: https://github.com/openzfs/zfs/pull/16745 Note that the situation where problems occur is fairly limited, but it's also a scenario that comes up in production sites, since send | receive is commonly used for backup. We lost a large production file server to an encryption bug triggered by zfs send | receive to a backup server. I suspect the particular problem we ran into is now fixed, but it looks like there may be one remaining.
FYI: ZFS 2.2.8 includes a fix for a ZFS send bug that is triggered when encryption is enabled. I don't think there are any known bugs related to ZFS encryption that cause stability problems.
You mean this? Fix 2 bugs in non-raw send with encryption https://github.com/openzfs/zfs/issues/12014 https://github.com/openzfs/zfs/pull/17340
We're doing raw sends, so maybe that's related, maybe not?
Yes, this is what I meant. It might be unrelated. There was also https://github.com/openzfs/zfs/pull/17353 which fixes another issue that can possibly cause in-memory corruption. There is also https://github.com/openzfs/zfs/pull/16723 which seems to fix folio migration.
In the grand scheme of things, the ZFS devs know much better what happens in the openzfs tree and are most familiar with the master branch, where the main development happens. They know less about the current and previous releases, but they care about them because those are used by their employers. Ubuntu's ZFS is outside the control of the openzfs developers. They have no sense of ownership that would make them care to fix it; it is up to the Ubuntu developers to make sure that everything works fine in their tree.
To make this report actionable, you might want to reproduce it on openzfs {master, 2.2.8, 2.3.2+}. People will not want to spend time debugging something that might have already been fixed.
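A quick way to see where a given system stands relative to those releases is to compare the version string reported by `zfs version`. The sketch below is illustrative only: `parse_zfs_version` and `is_suggested_release` are hypothetical helper names, and the thresholds (2.2.8 on the 2.2 branch, 2.3.2 otherwise) are simply the ones mentioned in this thread. Since distributions cherry-pick patches, a version number alone cannot prove that a particular fix is present or absent.

```python
import re


def parse_zfs_version(text: str) -> tuple:
    """Extract (major, minor, patch) from a string such as 'zfs-kmod-2.2.2-0ubuntu9.1'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_suggested_release(text: str) -> bool:
    """Return True if the version is at least 2.2.8 (2.2 branch) or 2.3.2 (otherwise)."""
    version = parse_zfs_version(text)
    if version[:2] == (2, 2):
        return version >= (2, 2, 8)
    return version >= (2, 3, 2)


# The module version from this report
print(is_suggested_release("zfs-kmod-2.2.2-0ubuntu9.1"))
```

For the module in this report the check fails, which is consistent with the advice to retest on a newer upstream release before digging further.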
So we discussed this and realized the same thing: we're not expecting anybody to dig into Ubuntu's who-knows-what-patches version of 2.2.2 to fix bugs that are probably already fixed. But while everybody says "oh just upgrade to 2.2.8", the problem is that doing so requires manpower and a lot of testing resources we don't have, and there's always a risk we'd be trading one bug for another. This is not easily reproducible. We saw it in the wild, so I opened an issue hoping somebody would know something, and to that end it did work: I hadn't realized some related bugs had been fixed. So thanks for the feedback. I'll close this issue, as I don't realistically expect anybody to spend any more time on it, but at least there's a record of it in case somebody else comes across the same thing.
Thanks for the info.