kmod-zfs fails to load on Rocky Linux 8.7 kernel
System information
| Type | Version/Name |
|---|---|
| Distribution Name | Rocky Linux |
| Distribution Version | 8.7 |
| Kernel Version | 4.18.0-425.3.1.el8 |
| Architecture | x86_64 |
| OpenZFS Version | 2.1.6 |
Describe the problem you're observing
kmod-zfs-2.1.6-1.el8.x86_64 fails to load and generates a CPU soft lockup
Describe how to reproduce the problem
Boot without ZFS installed
```
[root@stashcache ~]# yum install zfs
ZFS on Linux for EL8 KMOD                                          2.9 MB/s | 3.0 kB     00:00
Dependencies resolved.
=============================================================================================================
 Package                  Architecture         Version                  Repository               Size
=============================================================================================================
Installing:
 zfs                      x86_64               2.1.6-1.el8              zfs-kmod                660 k
Installing dependencies:
 kmod-zfs                 x86_64               2.1.6-1.el8              zfs-kmod                1.5 M
 libnvpair3               x86_64               2.1.6-1.el8              zfs-kmod                 37 k
 libuutil3                x86_64               2.1.6-1.el8              zfs-kmod                 32 k
 libzfs5                  x86_64               2.1.6-1.el8              zfs-kmod                230 k
 libzpool5                x86_64               2.1.6-1.el8              zfs-kmod                1.3 M

Transaction Summary
=============================================================================================================
Install  6 Packages

Total size: 3.7 M
Installed size: 14 M
Is this ok [y/N]: y
Downloading Packages:
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                                             1/1
  Installing       : libnvpair3-2.1.6-1.el8.x86_64                                               1/6
  Installing       : libuutil3-2.1.6-1.el8.x86_64                                                2/6
  Installing       : libzfs5-2.1.6-1.el8.x86_64                                                  3/6
  Installing       : libzpool5-2.1.6-1.el8.x86_64                                                4/6
  Installing       : zfs-2.1.6-1.el8.x86_64                                                      5/6
  Running scriptlet: zfs-2.1.6-1.el8.x86_64                                                      5/6
  Installing       : kmod-zfs-2.1.6-1.el8.x86_64                                                 6/6
  Running scriptlet: kmod-zfs-2.1.6-1.el8.x86_64                                                 6/6
  Running scriptlet: zfs-2.1.6-1.el8.x86_64                                                      6/6
  Running scriptlet: kmod-zfs-2.1.6-1.el8.x86_64                                                 6/6
  Verifying        : kmod-zfs-2.1.6-1.el8.x86_64                                                 1/6
  Verifying        : libnvpair3-2.1.6-1.el8.x86_64                                               2/6
  Verifying        : libuutil3-2.1.6-1.el8.x86_64                                                3/6
  Verifying        : libzfs5-2.1.6-1.el8.x86_64                                                  4/6
  Verifying        : libzpool5-2.1.6-1.el8.x86_64                                                5/6
  Verifying        : zfs-2.1.6-1.el8.x86_64                                                      6/6

Installed:
  kmod-zfs-2.1.6-1.el8.x86_64        libnvpair3-2.1.6-1.el8.x86_64      libuutil3-2.1.6-1.el8.x86_64
  libzfs5-2.1.6-1.el8.x86_64         libzpool5-2.1.6-1.el8.x86_64       zfs-2.1.6-1.el8.x86_64

Complete!
```
```
[root@stashcache ~]# modprobe zfs

Message from [email protected] at Nov 16 10:59:12 ...
 kernel:watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:80625]

Message from [email protected] at Nov 16 10:59:40 ...
 kernel:watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:80625]
...
```
Include any warning/errors/backtraces from the system logs
```
[root@stashcache ~]# tail -f /var/log/messages
...
Nov 16 10:59:12 stashcache.ldas.cit kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:80625]
Nov 16 10:59:12 stashcache.ldas.cit kernel: Modules linked in: spl(OE+) nfsv3 nfs_acl nfs lockd grace fscache 8021q garp mrp stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc intel_rapl_msr intel_rapl_common isst_if_common skx_edac iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl ipmi_ssif intel_cstate mlx5_ib ib_uverbs pcspkr ib_core intel_uncore joydev mei_me i2c_i801 lpc_ich mei ioatdma acpi_ipmi ipmi_si dax_pmem_compat ipmi_devintf device_dax ipmi_msghandler dax_pmem_core acpi_pad acpi_power_meter binfmt_misc xfs libcrc32c raid1 nd_pmem nd_btt sd_mod sg mlx5_core ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect mpt3sas sysimgblt fb_sys_fops drm_ttm_helper ttm ahci nvme raid_class libahci mlxfw pci_hyperv_intf
Nov 16 10:59:12 stashcache.ldas.cit kernel: drm ixgbe scsi_transport_sas nvme_core crc32c_intel libata tls t10_pi psample mdio dca nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod [last unloaded: zunicode]
Nov 16 10:59:12 stashcache.ldas.cit kernel: CPU: 0 PID: 80625 Comm: modprobe Kdump: loaded Tainted: P OE --------- - - 4.18.0-425.3.1.el8.x86_64 #1
Nov 16 10:59:12 stashcache.ldas.cit kernel: Hardware name: Supermicro SYS-2029U-TN24R4T/X11DPU, BIOS 3.8 08/19/2022
Nov 16 10:59:12 stashcache.ldas.cit kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x17b/0x1c0
Nov 16 10:59:12 stashcache.ldas.cit kernel: Code: 74 22 48 89 c1 0f 0d 08 eb 20 f3 90 8b 07 85 c0 75 f8 f0 0f b1 17 75 f2 65 ff 0d 6c 5e 0d 75 e9 1b b6 aa 00 31 c9 eb 02 f3 90 07 66 85 c0 75 f7 41 89 c0 66 45 31 c0 44 39 c6 74 20 c6 07 01
Nov 16 10:59:12 stashcache.ldas.cit kernel: RSP: 0018:ffff9b606e40fc30 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
Nov 16 10:59:12 stashcache.ldas.cit kernel: RAX: 0000000000040001 RBX: 0000000000000002 RCX: 0000000000000000
Nov 16 10:59:12 stashcache.ldas.cit kernel: RDX: ffff8ccd8082bcc0 RSI: 0000000000040000 RDI: ffff8c9f079b3cd0
Nov 16 10:59:12 stashcache.ldas.cit kernel: RBP: ffff8c9f079b3cd0 R08: ffff9b606e40fbd8 R09: ffff8c9f079b2000
Nov 16 10:59:12 stashcache.ldas.cit kernel: R10: ffff8c9f410a6b40 R11: 0000000000000246 R12: 0000000000400001
Nov 16 10:59:12 stashcache.ldas.cit kernel: R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000008000
Nov 16 10:59:12 stashcache.ldas.cit kernel: FS:  00007f4272058740(0000) GS:ffff8ccd80800000(0000) knlGS:0000000000000000
Nov 16 10:59:12 stashcache.ldas.cit kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 16 10:59:12 stashcache.ldas.cit kernel: CR2: 00007f4271034f80 CR3: 000000010f4be003 CR4: 00000000007706f0
Nov 16 10:59:12 stashcache.ldas.cit kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 16 10:59:12 stashcache.ldas.cit kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 16 10:59:12 stashcache.ldas.cit kernel: PKRU: 55555554
Nov 16 10:59:12 stashcache.ldas.cit kernel: Call Trace:
Nov 16 10:59:12 stashcache.ldas.cit kernel:  tsd_hash_search.isra.4+0x7e/0x90 [spl]
Nov 16 10:59:12 stashcache.ldas.cit kernel:  tsd_create+0x8b/0x160 [spl]
Nov 16 10:59:12 stashcache.ldas.cit kernel:  ? 0xffffffffc0790000
Nov 16 10:59:12 stashcache.ldas.cit kernel:  spl_taskq_init+0x2d/0x180 [spl]
Nov 16 10:59:12 stashcache.ldas.cit kernel:  spl_init+0x193/0x1000 [spl]
Nov 16 10:59:12 stashcache.ldas.cit kernel:  do_one_initcall+0x46/0x1d0
Nov 16 10:59:12 stashcache.ldas.cit kernel:  ? do_init_module+0x22/0x230
Nov 16 10:59:12 stashcache.ldas.cit kernel:  ? kmem_cache_alloc_trace+0x142/0x280
Nov 16 10:59:12 stashcache.ldas.cit kernel:  do_init_module+0x5a/0x230
Nov 16 10:59:12 stashcache.ldas.cit kernel:  load_module+0x14bf/0x17f0
Nov 16 10:59:12 stashcache.ldas.cit kernel:  ? __do_sys_finit_module+0xb1/0x110
Nov 16 10:59:12 stashcache.ldas.cit kernel:  __do_sys_finit_module+0xb1/0x110
Nov 16 10:59:12 stashcache.ldas.cit kernel:  do_syscall_64+0x5b/0x1b0
Nov 16 10:59:12 stashcache.ldas.cit kernel:  entry_SYSCALL_64_after_hwframe+0x61/0xc6
Nov 16 10:59:12 stashcache.ldas.cit kernel: RIP: 0033:0x7f4270f6b91d
Nov 16 10:59:12 stashcache.ldas.cit kernel: Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 55 38 00 f7 d8 64 89 01 48
Nov 16 10:59:12 stashcache.ldas.cit kernel: RSP: 002b:00007fff8ab90548 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Nov 16 10:59:12 stashcache.ldas.cit kernel: RAX: ffffffffffffffda RBX: 000055a7230c2900 RCX: 00007f4270f6b91d
Nov 16 10:59:12 stashcache.ldas.cit kernel: RDX: 0000000000000000 RSI: 000055a7214a08b6 RDI: 0000000000000003
Nov 16 10:59:12 stashcache.ldas.cit kernel: RBP: 000055a7214a08b6 R08: 0000000000000000 R09: 0000000000000000
Nov 16 10:59:12 stashcache.ldas.cit kernel: R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
Nov 16 10:59:12 stashcache.ldas.cit kernel: R13: 000055a7230c28b0 R14: 0000000000040000 R15: 0000000000000000

Message from [email protected] at Nov 16 10:59:40 ...
 kernel:watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:80625]

Nov 16 10:59:40 stashcache.ldas.cit kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:80625]
Nov 16 10:59:40 stashcache.ldas.cit kernel: Modules linked in: spl(OE+) nfsv3 nfs_acl nfs lockd grace fscache 8021q garp mrp stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc intel_rapl_msr intel_rapl_common isst_if_common skx_edac iTCO_wdt iTCO_vendor_support x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rapl ipmi_ssif intel_cstate mlx5_ib ib_uverbs pcspkr ib_core intel_uncore joydev mei_me i2c_i801 lpc_ich mei ioatdma acpi_ipmi ipmi_si dax_pmem_compat ipmi_devintf device_dax ipmi_msghandler dax_pmem_core acpi_pad acpi_power_meter binfmt_misc xfs libcrc32c raid1 nd_pmem nd_btt sd_mod sg mlx5_core ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect mpt3sas sysimgblt fb_sys_fops drm_ttm_helper ttm ahci nvme raid_class libahci mlxfw pci_hyperv_intf
Nov 16 10:59:40 stashcache.ldas.cit kernel: drm ixgbe scsi_transport_sas nvme_core crc32c_intel libata tls t10_pi psample mdio dca nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod [last unloaded: zunicode]
Nov 16 10:59:40 stashcache.ldas.cit kernel: CPU: 0 PID: 80625 Comm: modprobe Kdump: loaded Tainted: P OEL --------- - - 4.18.0-425.3.1.el8.x86_64 #1
Nov 16 10:59:40 stashcache.ldas.cit kernel: Hardware name: Supermicro SYS-2029U-TN24R4T/X11DPU, BIOS 3.8 08/19/2022
Nov 16 10:59:40 stashcache.ldas.cit kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x17b/0x1c0
Nov 16 10:59:40 stashcache.ldas.cit kernel: Code: 74 22 48 89 c1 0f 0d 08 eb 20 f3 90 8b 07 85 c0 75 f8 f0 0f b1 17 75 f2 65 ff 0d 6c 5e 0d 75 e9 1b b6 aa 00 31 c9 eb 02 f3 90 07 66 85 c0 75 f7 41 89 c0 66 45 31 c0 44 39 c6 74 20 c6 07 01
Nov 16 10:59:40 stashcache.ldas.cit kernel: RSP: 0018:ffff9b606e40fc30 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13
Nov 16 10:59:40 stashcache.ldas.cit kernel: RAX: 0000000000040001 RBX: 0000000000000002 RCX: 0000000000000000
Nov 16 10:59:40 stashcache.ldas.cit kernel: RDX: ffff8ccd8082bcc0 RSI: 0000000000040000 RDI: ffff8c9f079b3cd0
Nov 16 10:59:40 stashcache.ldas.cit kernel: RBP: ffff8c9f079b3cd0 R08: ffff9b606e40fbd8 R09: ffff8c9f079b2000
Nov 16 10:59:40 stashcache.ldas.cit kernel: R10: ffff8c9f410a6b40 R11: 0000000000000246 R12: 0000000000400001
Nov 16 10:59:40 stashcache.ldas.cit kernel: R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000008000
Nov 16 10:59:40 stashcache.ldas.cit kernel: FS:  00007f4272058740(0000) GS:ffff8ccd80800000(0000) knlGS:0000000000000000
Nov 16 10:59:40 stashcache.ldas.cit kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 16 10:59:40 stashcache.ldas.cit kernel: CR2: 00007f4271034f80 CR3: 000000010f4be003 CR4: 00000000007706f0
Nov 16 10:59:40 stashcache.ldas.cit kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 16 10:59:40 stashcache.ldas.cit kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 16 10:59:40 stashcache.ldas.cit kernel: PKRU: 55555554
Nov 16 10:59:40 stashcache.ldas.cit kernel: Call Trace:
Nov 16 10:59:40 stashcache.ldas.cit kernel:  tsd_hash_search.isra.4+0x7e/0x90 [spl]
Nov 16 10:59:40 stashcache.ldas.cit kernel:  tsd_create+0x8b/0x160 [spl]
Nov 16 10:59:40 stashcache.ldas.cit kernel:  ? 0xffffffffc0790000
Nov 16 10:59:40 stashcache.ldas.cit kernel:  spl_taskq_init+0x2d/0x180 [spl]
Nov 16 10:59:40 stashcache.ldas.cit kernel:  spl_init+0x193/0x1000 [spl]
Nov 16 10:59:40 stashcache.ldas.cit kernel:  do_one_initcall+0x46/0x1d0
Nov 16 10:59:40 stashcache.ldas.cit kernel:  ? do_init_module+0x22/0x230
Nov 16 10:59:40 stashcache.ldas.cit kernel:  ? kmem_cache_alloc_trace+0x142/0x280
Nov 16 10:59:40 stashcache.ldas.cit kernel:  do_init_module+0x5a/0x230
Nov 16 10:59:40 stashcache.ldas.cit kernel:  load_module+0x14bf/0x17f0
Nov 16 10:59:40 stashcache.ldas.cit kernel:  ? __do_sys_finit_module+0xb1/0x110
Nov 16 10:59:40 stashcache.ldas.cit kernel:  __do_sys_finit_module+0xb1/0x110
Nov 16 10:59:40 stashcache.ldas.cit kernel:  do_syscall_64+0x5b/0x1b0
Nov 16 10:59:40 stashcache.ldas.cit kernel:  entry_SYSCALL_64_after_hwframe+0x61/0xc6
Nov 16 10:59:40 stashcache.ldas.cit kernel: RIP: 0033:0x7f4270f6b91d
Nov 16 10:59:40 stashcache.ldas.cit kernel: Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3b 55 38 00 f7 d8 64 89 01 48
Nov 16 10:59:40 stashcache.ldas.cit kernel: RSP: 002b:00007fff8ab90548 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Nov 16 10:59:40 stashcache.ldas.cit kernel: RAX: ffffffffffffffda RBX: 000055a7230c2900 RCX: 00007f4270f6b91d
Nov 16 10:59:40 stashcache.ldas.cit kernel: RDX: 0000000000000000 RSI: 000055a7214a08b6 RDI: 0000000000000003
Nov 16 10:59:40 stashcache.ldas.cit kernel: RBP: 000055a7214a08b6 R08: 0000000000000000 R09: 0000000000000000
Nov 16 10:59:40 stashcache.ldas.cit kernel: R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
Nov 16 10:59:40 stashcache.ldas.cit kernel: R13: 000055a7230c28b0 R14: 0000000000040000 R15: 0000000000000000
```
Switching to `zfs-dkms-2.1.6-1.el8` avoids this problem.
Note: this prevents a system from booting (even to single-user mode) if `kmod-zfs` is installed, whether or not there is a zpool.
We saw this too. It's a case of weak-modules' method not detecting an incompatibility, which made me less than hopeful that recompiling would work, so it's good to know that the DKMS package does. I would think the boot hang would happen when udev sees a disk with a ZFS label and therefore tries to load the module, so

> Note, this prevents a system from booting (even to single user mode) if `kmod-zfs` is installed whether or not there is a zpool.

surprises me, unless you have it in modules-load.d (or in your initrd) or something like that. But I never tried it (by then I was behind on maintenance, and whether one could boot without a resource the host is supposed to serve was sort of immaterial anyway).
kmods have been added to the repositories for the Alma/Rocky/RHEL 8.7 kernel. I know it's a bother but it'd be great if you could verify they work correctly on Rocky Linux. They're built on AlmaLinux 8.7 so there is a small chance there's some subtle kernel difference which caused this. cc @tonyhutter
That works on the same system that failed above. Thanks for the quick fix.
I know very little about Linux kernel modules, but is there a general solution to automate compatibility checks to catch this earlier and throw an error rather than creating an un-bootable system?
Please also consider updating the package name when rebuilding, e.g., increase the build number from -1 to -2, or follow what some other packages do and include the OS point release, e.g., sssd-2.7.3-4.el8_7.1.x86_64. For future bug reports it would be useful to have some dnf, rpm, or yum command that can unambiguously tell you which kmod-zfs-2.1.6-1.el8.x86_64 is installed.
In theory, Linux doesn't promise any binary compatibility between kernel versions.
RH decided they wanted to promise some, and thus the whole weak-modules thing exists, but as you see, it's not remotely perfect.
I suppose there could be a fundamentally different kind of test bot that runs nightly or so and just tries loading the latest module packages against updated RHEL/Alma/what-have-you, since RH are the only ones where a premade binary is provided. @behlendorf does that sound like something the project would be interested in doing, or too niche a problem to solve?
Bumping the 8.7 rebuild from kmod-zfs-2.1.6-1.el8.x86_64 to kmod-zfs-2.1.6-2.el8.x86_64 would also have the advantage that users upgrading EL8.6 systems to EL8.7 could be blissfully unaware of this problem and receive an automatic update (assuming they properly update /etc/yum.repos.d/zfs.repo). However, as it stands an update of an EL8.6 system that already has a working kmod-zfs-2.1.6-1.el8.x86_64 installed will not automatically reinstall the working 8.7 kmod and a reboot will hang.
Red Hat does promise kernel binary compatibility within a minor release, as long as the kmod uses only whitelisted symbols. Unfortunately, ZFS needs symbols beyond those on the whitelist, and at least in this case there was a minor version bump, so no guarantees.
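That per-symbol guarantee is what the CRCs in the kernel's symvers file encode, which suggests a rough, unofficial pre-load sanity check: diff the CRCs a binary kmod was linked against with the ones the running kernel exports. A sketch (the helper names are hypothetical and the module path is illustrative; EL kernels ship `/boot/symvers-<release>.gz`, and `modprobe --dump-modversions` prints `0x<crc> <symbol>` pairs for a `.ko`):

```shell
# Turn "0x<crc> <symbol> ..." lines (either modprobe --dump-modversions or
# symvers format) into sorted "<symbol> <crc>" pairs for joining.
normalize_syms() {
    awk '{print $2, $1}' "$1" | sort
}

# Print every symbol whose CRC differs between the module's list ($1)
# and the kernel's list ($2). Any output means a kABI mismatch.
find_mismatches() {
    normalize_syms "$1" > "${TMPDIR:-/tmp}/mod.crc"
    normalize_syms "$2" > "${TMPDIR:-/tmp}/kern.crc"
    join "${TMPDIR:-/tmp}/mod.crc" "${TMPDIR:-/tmp}/kern.crc" \
        | awk '$2 != $3 {print "MISMATCH:", $1}'
}

# Intended use (needs root and the kmod package installed; path illustrative):
#   modprobe --dump-modversions /lib/modules/$(uname -r)/extra/spl.ko > mod.syms
#   zcat /boot/symvers-$(uname -r).gz > kern.syms
#   find_mismatches mod.syms kern.syms
```

This only flags symbols present in both lists with differing CRCs; it is nowhere near as thorough as weak-modules, but it errors out before modprobe rather than after.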
@rincebrain my feeling is this is a little too much of a niche problem. When a new RHEL/Alma/Rocky release is made we build against that kernel and verify the build with a full test suite run. Only if it passes do we post packages with kmods. Using binary kmods built against kernels from other minor releases isn't recommended. They may work, but they won't have been tested.
> would also have the advantage that users upgrading EL8.6 systems to EL8.7 could be blissfully unaware of this problem and receive an automatic update
That's a good point and something we should consider doing in the future.
Can we set the dist part of the RPMs to something like el8_<min_version> instead of el8 (e.g., el8_6, el8_7)? This is a fairly common practice that we can see in Red Hat packages. It would allow upgrading from 8.6 to 8.7 by upgrading packages from the different repositories. Right now, the packages in both the 8.6 and 8.7 repositories have the same name, version, and release.
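For illustration, a hypothetical spec-file sketch of that dist-tag scheme (the real zfs packaging may wire this up differently, e.g. by passing the tag in at build time):

```spec
# Hypothetical sketch: hard-code the target point release into the dist tag
# so an 8.7 rebuild gets Release "1.el8_7" instead of reusing "1.el8".
%global dist .el8_7
Release:        1%{?dist}
```

With distinct Release values, rpm's version comparison sees the 8.7 rebuild as newer and `dnf upgrade` picks it up automatically.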
@behlendorf, is it now safe to upgrade Rocky from 8.6 to 8.7? Because here I don't see any new version, and my update notification doesn't list any new zfs update.
@jb-alvarado I am successfully using the new 8.7 repository, but note that even if you have that repository configured, if you already installed `kmod-zfs-2.1.6` from the 8.6 repository, then because the Release number was not bumped, it will not show as an update, and you need to `dnf reinstall` it.
Thank you @quartsize for the help!
It was a bit more complicated: I thought I could run the reinstall after rebooting, but that did not work. So I had to boot into the older kernel, remove ZFS, boot into the new kernel, run `sed -i "s/8.6/8.7/g" /etc/yum.repos.d/zfs.repo`, and install ZFS again.
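The repo edit in that sequence can be wrapped in a small helper so it is easy to undo (a sketch; the function name is hypothetical, the file path is the one from this thread):

```shell
# Retarget a zfs.repo-style file from the 8.6 to the 8.7 repositories,
# keeping a .bak copy of the original alongside for easy rollback.
retarget_zfs_repo() {
    sed -i.bak 's/8\.6/8.7/g' "$1"
}

# Usage (as root): retarget_zfs_repo /etc/yum.repos.d/zfs.repo
```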
If you look in the https://zfsonlinux.org/epel/zfs-release-2-2$(rpm --eval "%{dist}").noarch.rpm package given on the page you linked, you'll see that the repo file provided therein uses the `$releasever` variable. On my Rocky 8 systems that expands to simply `8`, so I get http://download.zfsonlinux.org/epel/8/x86_64/, whose repomd.xml is the same as for 8.7. Using that version of the release package and/or repo file might save you from needing to edit it, keeping the most recent repository available for any `dnf reinstall`s.