ZPOOL unavailable/gone after upgrade to Fedora 30, ZoL 0.8.2, ZFS modules not loaded, DKMS error, kernel warning
System information
Type | Version/Name |
---|---|
Distribution Name | Fedora |
Distribution Version | 30 |
Linux Kernel | 5.4.12-100 |
Architecture | x86_64 |
ZFS Version | 0.8.2-1 |
SPL Version | 0.8.2-1 |
Describe the problem you're observing
After upgrading a system running Fedora 29 with ZoL 0.8.1 to Fedora 30 with ZoL 0.8.2, the ZPOOL isn't mounted automatically on startup anymore.
Besides other issues, the system was updated in the hope that the update would fix the resilver bug that was preventing a possibly bad drive from being replaced: the resilver process kept restarting and stayed at 0% (while logging a lot of seemingly meaningless messages). However, the update made things worse because the pool is no longer available at all.
# zpool status
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.
# journalctl -u zfs-import-cache.service
-- Reboot --
systemd[1]: Starting Import ZFS pools by cache file...
zpool[1106]: The ZFS modules are not loaded.
zpool[1106]: Try running '/sbin/modprobe zfs' as root to load them.
systemd[1]: zfs-import-cache.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: zfs-import-cache.service: Failed with result 'exit-code'.
systemd[1]: Failed to start Import ZFS pools by cache file.
# /sbin/modprobe zfs
modprobe: FATAL: Module zfs not found in directory /lib/modules/5.4.12-100.fc30.x86_64
# dkms status
Error! Could not locate dkms.conf file.
File: /var/lib/dkms/zfs/0.8.1/source/dkms.conf does not exist.
Why is it looking for a config file for an older version?
# uname -sr
Linux 5.4.12-100.fc30.x86_64
# ls /var/lib/dkms/zfs/
0.8.1
0.8.2
kernel-5.0.16-100.fc28.x86_64-x86_64 -> 0.8.1/5.0.16-100.fc28.x86_64/x86_64
kernel-5.2.7-100.fc29.x86_64-x86_64 -> 0.8.2/5.2.7-100.fc29.x86_64/x86_64
I've found an old bug report where someone recommended deleting the old version. I won't delete the only working ZFS kernel module. Why would I give up the only fallback option of booting the older kernel?
Assuming there is no real issue preventing the ZFS module from being built and/or loaded, I tried:
# dkms install -m zfs -v 0.8.2
Kernel preparation unnecessary for this kernel. Skipping...
Running the pre_build script:
...
After that, I was able to load the module:
# /sbin/modprobe zfs
Some promising dmesg entries appeared:
perf: interrupt took too long (3202 > 3183), lowering kernel.perf_event_max_sample_rate to 62000
spl: loading out-of-tree module taints kernel.
spl: module verification failed: signature and/or required key missing - tainting kernel
znvpair: module license 'CDDL' taints kernel.
Disabling lock debugging due to kernel taint
ZFS: Loaded module v0.8.2-1, ZFS pool version 5000, ZFS filesystem version 5
After trying to import the pool, less promising dmesg entries appeared:
# systemctl start zfs-import-cache.service
------------[ cut here ]------------
General protection fault in user access. Non-canonical address?
WARNING: CPU: 10 PID: 7077 at arch/x86/mm/extable.c:126 ex_handler_uaccess+0x4d/0x60
Modules linked in: zfs(POE) zunicode(POE) zavl(POE) icp(POE) zlua(POE) zcommon(POE) znvpair(POE) spl(OE) bi
...
Call Trace:
fixup_exception+0x45/0x58
do_general_protection+0x49/0x150
general_protection+0x32/0x40
RIP: 0010:strnlen_user+0x47/0x110
Code: 86 de 00 00 00 55 48 29 f8 45 31 c9 53 66 66 90 0f ae e8 48 39 c6 49 89 fa 48 0f 46 c6 41 83 e2 07 48 83 e7 f8 31 c9 4c 01 d0 <4c> 8b 1f 85 c9 0f 85 96 00 00 00 42 8d 0c d5 00 00 00 00 41 b8 01
RSP: 0018:ffffa1d340553de0 EFLAGS: 00010206
RAX: 0000000000020000 RBX: ffff91e223ae2000 RCX: 0000000000000000
RDX: f47da860f9c07400 RSI: 0000000000020000 RDI: f47da860f9c07400
RBP: 00007fffffffefc3 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffffc6293b8eb880 R14: f47da860f9c07400 R15: ffff91e19df6e400
copy_strings.isra.0+0xbd/0x410
__do_execve_file.isra.0+0x4be/0x8b0
? call_usermodehelper+0xa0/0xa0
do_execve+0x21/0x30
call_usermodehelper_exec_async+0x186/0x1b0
? recalc_sigpending+0x17/0x50
ret_from_fork+0x35/0x40
---[ end trace 7188f177469a840f ]---
So the pool is now available but I'm afraid it won't stay.
Also, after mounting the pool, all the zfs mountpoints were just empty directories, there were no files. I was able to mount it using this makeshift script (from bug 9207): https://github.com/zfsonlinux/zfs/issues/9207#issuecomment-525390928
# /root/bin/zpool-mount
ZPOOL NOT MOUNTED - /z... not a mountpoint
ATTEMPTING TO MOUNT ZFS...
ZFS complains about its mountpoint not being empty or something
rmdir: removing directory, '/z.../.../'
...
rmdir: removing directory, '/z.../'
ZFS MOUNTED
Attempting to re-mount bind mounts in /data...
To help put your mind at ease your pool is not in any risk. I'm glad you were able to get it imported and mounted.
There have been some recent changes to dkms in Fedora which might have caused the build to fail. Manually rebuilding was the right thing to do. Alternatively, a clean uninstall and install should also have worked.
As for the dmesg warning, it can be safely ignored and will be resolved in the next zfs point release, 0.8.3, due out this month. I'm not sure why empty directories would have been created.
Thank you for the clarification! That's good to know.
For the record, all I could find in /var/log/messages was this:
...
dracut[3721]: *** Creating initramfs image file '/boot/initramfs-5.4.12-100.fc30.x86_64.img' done ***
dnf[1100]: Running scriptlet: kernel-core-5.4.12-100.fc30.x86_64 2320/2320
dnf[1100]: Error! Could not locate dkms.conf file.
dnf[1100]: File: /var/lib/dkms/zfs/0.8.1/source/dkms.conf does not exist.
It just seems like a bug if dkms tries to rebuild the old ZFS module 0.8.1 after upgrading to 0.8.2 - and even if that fails for some reason, I don't understand why it doesn't build the ZFS module 0.8.2 which would be loaded as soon as the upgrade process has finished. If that's because it wants to build both ZFS modules and it decides to rebuild the old one first and doesn't continue with the new one if it fails to build the old one (seems like that), that would seem like a design flaw in dkms, leading to missing kernel modules after an upgrade. Then again, if that's the case, it's not a ZoL issue and we could close this bug (unless it keeps happening).
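Whatever the reason dkms stops early, the result can be checked before rebooting. The following is a minimal sketch, not part of ZFS or dkms: `kernels_missing_module` is a hypothetical helper name, and it assumes kernel modules live under /lib/modules/<kernel-release>/ (possibly compressed, e.g. zfs.ko.xz).

```shell
#!/bin/sh
# Sketch: list installed kernels that have no copy of a given module.
# kernels_missing_module is a hypothetical helper, not a dkms or ZFS command.
# Usage: kernels_missing_module <module> [modules-root]
kernels_missing_module() {
    module=$1
    root=${2:-/lib/modules}   # assumption: the usual module tree location
    for kdir in "$root"/*/; do
        [ -d "$kdir" ] || continue
        kdir=${kdir%/}
        # Match zfs.ko as well as compressed variants such as zfs.ko.xz.
        if [ -z "$(find "$kdir" -name "$module.ko*" 2>/dev/null | head -n 1)" ]; then
            # Print the kernel release (directory name) that lacks the module.
            echo "${kdir##*/}"
        fi
    done
}
```

Running `kernels_missing_module zfs` after a dnf upgrade and before the reboot should print nothing if every installed kernel has a ZFS module. Any kernel release it prints is one for which the dkms build would need to be repeated by hand.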
As for those empty directories: This keeps happening every time, so without manual intervention or a script, the pool is never mounted on boot. It looks mounted if you just check the contents of /datapool, but it never is, and what you'll find in there are empty directories, not mountpoints. I have written a script that does just that (mounts the pool on reboot), see bug 9207.
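Because an empty directory and a real mountpoint look the same at first glance, a guard like the one below can tell them apart. This is only a sketch: `require_mountpoint` is a hypothetical helper name, it relies on the `mountpoint` utility from util-linux, and it is not the script from bug 9207.

```shell
#!/bin/sh
# Sketch: fail loudly when a path is a plain directory instead of a mountpoint.
# require_mountpoint is a hypothetical helper name.
# Usage: require_mountpoint <path>
require_mountpoint() {
    path=$1
    # mountpoint -q is silent and returns 0 only if <path> is a mountpoint.
    if mountpoint -q "$path"; then
        return 0
    fi
    echo "$path is not a mountpoint; refusing to continue" >&2
    return 1
}
```

A service wrapper could call `require_mountpoint /datapool || exit 1` before starting anything that writes there. For systemd units, the `ConditionPathIsMountPoint=` directive may be an alternative worth trying.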
I reached this bug report while searching for dkms / module problems.
My problem is: I installed ZFS 0.8.3 dkms on both Fedora 31 and CentOS 8. On both systems, after every boot I have to manually run modprobe zfs because the zfs module is not loaded automatically.
It's not that my system did not try to load the zfs module. The file /etc/modules-load.d/zfs.conf tells it to load it (it's there but it "is not owned by any package").
Unfortunately, this wasn't a one-time thing. After upgrading to Fedora 31, nothing that requires zfs was working, and at least one service wrote to the root filesystem instead because zfs wasn't mounted, leading to some minor loss that had to be fixed. It almost seems better not to have all services start automatically on Fedora, at least not after upgrades.
# modprobe zfs
modprobe: FATAL: Module zfs not found in directory /lib/modules/5.8.18-100.fc31.x86_64
# uname -sr
Linux 5.8.18-100.fc31.x86_64
# dnf install zfs
...
Package zfs-2.0.0-1.fc31.x86_64 is already installed.
...
I'm not sure what it is, but something is not working when it comes to upgrades on Fedora. It's not as much work as on Debian though (which keeps asking to confirm config diffs in the middle of the process plus many other questions and at some point, the zfs repo had to be changed some time ago).
I don't remember if or how dkms failed, but I tried the alternative this time: I reinstalled zfs. Let me tell you that it wasn't fun, and maybe there should even be warnings not to do that unless you've tried everything else, including the dkms command. When uninstalling zfs on Fedora (dnf remove zfs zfs-dkms), you also lose a bunch of system components like make, perl or even libvirt-daemon-kvm, to name just a few. I'm actually not sure where these dependencies are coming from at this point, and I may have overlooked a few details in the hurry to fix the system after the upgrade, but I had to reinstall a few components (during which a special fs configuration was reverted, which wasn't very good either because it led to some data loss, but that's another story). Another thing is that the bind mounts were empty, so I've now added the mount option "x-systemd.after=zfs-mount.service". I still think that release upgrades are generally much easier on Fedora than on Debian or others because they're non-interactive. (I have a script that runs the commands because I forget to update the dnf module; it's in my repo "fedora-release-upgrade-gui", although it's not really a gui and it's not really pretty, but it makes the process easier.) If only zfs would keep working after upgrading, that would be great...
Long story short: Don't reinstall everything if you can identify and fix or reinstall the part that's actually missing, which appears to be the dkms module.
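For the bind mounts mentioned above, the mount option goes into the options field of the /etc/fstab entry. The sketch below only prints such a line. `bind_mount_entry` is a hypothetical helper, and the example paths are placeholders, not the actual configuration used here.

```shell
#!/bin/sh
# Sketch: print an fstab line for a bind mount that is ordered after
# zfs-mount.service. bind_mount_entry is a hypothetical helper; it writes
# nothing to disk and only prints the line.
# Usage: bind_mount_entry <source-dir-on-zfs> <target-dir>
bind_mount_entry() {
    # Fields: source, target, fs type (none for bind), options, dump, pass.
    printf '%s %s none bind,x-systemd.after=zfs-mount.service 0 0\n' "$1" "$2"
}
```

For example, `bind_mount_entry /datapool/share /data/share` (placeholder paths) prints a line that can be reviewed and then added to /etc/fstab. Note that x-systemd.after= only orders the mount after the service. It does not make the mount fail if the service failed, so x-systemd.requires= may be worth considering as well.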
Since ZFS got lost again, I thought I'd add a comment: How to fix "The ZFS modules are not loaded." after a system update on Fedora 32
How to reproduce the issue: Have ZFS on Fedora 32, run dnf upgrade, reboot.
Maybe don't make any other plans for the rest of the day.
# zpool status
The ZFS modules are not loaded.
Try running '/sbin/modprobe zfs' as root to load them.
# /sbin/modprobe zfs
modprobe: FATAL: Module zfs not found in directory /lib/modules/5.11.22-100.fc32.x86_64
Check version
# uname -sr; rpm -q zfs
Linux 5.11.22-100.fc32.x86_64
zfs-2.1.0-1.fc32.x86_64
Run dkms, fingers crossed
# dkms install -m zfs -v 2.1.0
...
DKMS: install completed.
If it worked, the ZFS module can now be loaded:
# /sbin/modprobe zfs
# zpool status
no pools available
Re-import your pool(s) ... again:
# zpool import
...
# zpool import <Z_POOL>
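The rebuild and load steps above can be wrapped in a small function. This is a sketch under stated assumptions, not an official recovery procedure. `rebuild_zfs_module` and the `DRY_RUN` variable are hypothetical names. The ZFS version must be passed in explicitly (for example the one reported by rpm -q zfs), and the pool import is left as a manual step.

```shell
#!/bin/sh
# Sketch: rebuild the ZFS DKMS module for one kernel, then load it.
# rebuild_zfs_module and DRY_RUN are hypothetical names, not part of dkms or ZFS.
# Usage: rebuild_zfs_module <zfs-version> [kernel-release]
# With DRY_RUN=1 the commands are printed instead of executed.
rebuild_zfs_module() {
    version=$1
    kernel=${2:-$(uname -r)}
    if [ -z "$version" ]; then
        echo "usage: rebuild_zfs_module <zfs-version> [kernel-release]" >&2
        return 2
    fi
    run=
    [ "${DRY_RUN:-0}" = 1 ] && run=echo
    # Same dkms command as used in this thread, with -k to name the kernel.
    $run dkms install -m zfs -v "$version" -k "$kernel" || return 1
    # Loading the module only makes sense for the currently running kernel.
    if [ "$kernel" = "$(uname -r)" ]; then
        $run modprobe zfs || return 1
    fi
}
```

Running it first with DRY_RUN=1 shows which commands would be executed. After a successful real run as root, zpool import should list the pools again.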
I just experienced something similar.
Yesterday, dnf update upgraded zfs to version 2.1.1, followed by a reboot: the pool mounted successfully.
Today, dnf update upgraded the kernel to 5.13.16-200.fc34.x86_64, but dkms failed during the update with:
dkms: running auto installation service for kernel 5.13.16-200.fc34.x86_64
Error! Could not locate dkms.conf file.
File: /var/lib/dkms/zfs/2.1.0/source/dkms.conf does not exist.
I also had the same issue in fc33, after an update of zfs to 2.1.1, reboot (zpool still worked at this point), and update kernel to 5.13.16-100.fc33.x86_64.
The module installation issue was solved by:
dkms install -m zfs -v 2.1.1 -k 5.13.16-100.fc33.x86_64
(The kernel version is not strictly needed if you are installing the module for the currently running kernel.)
However, the problem with the dkms.conf file remained:
[...] dkms status
Error! Could not locate dkms.conf file.
File: /var/lib/dkms/zfs/2.1.0/source/dkms.conf does not exist.
I solved that by removing the folder /var/lib/dkms/zfs/2.1.0 and the link /var/lib/dkms/zfs/kernel-5.12.14-200.fc33.x86_64-x86_64 -> 2.1.0/5.12.14-200.fc33.x86_64/x86_64.
Thanks to David Fraser, who gives some explanation why the latter error happens and how to find what to delete here.
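To see which leftovers trigger the dkms.conf error before deleting anything, the tree can be scanned read-only. The sketch below assumes the layout shown in this thread: version directories under /var/lib/dkms/zfs, each with a source/dkms.conf, plus kernel-* symlinks pointing into them. `list_stale_dkms_versions` is a hypothetical helper, and it removes nothing.

```shell
#!/bin/sh
# Sketch: report DKMS version directories without a reachable source/dkms.conf
# and the kernel-* symlinks that still point into them. Read-only.
# list_stale_dkms_versions is a hypothetical helper name.
# Usage: list_stale_dkms_versions [module-tree]
list_stale_dkms_versions() {
    tree=${1:-/var/lib/dkms/zfs}   # assumption: layout as shown in this thread
    for dir in "$tree"/*/; do
        [ -d "$dir" ] || continue
        dir=${dir%/}
        # kernel-* entries are symlinks; they are handled in the second loop.
        [ -L "$dir" ] && continue
        # -e follows symlinks, so a dangling "source" link counts as missing.
        if [ ! -e "$dir/source/dkms.conf" ]; then
            echo "stale version: ${dir##*/}"
        fi
    done
    for link in "$tree"/kernel-*; do
        [ -L "$link" ] || continue
        target=$(readlink "$link")
        ver=${target%%/*}          # first path component is the version
        if [ ! -e "$tree/$ver/source/dkms.conf" ]; then
            echo "stale link: ${link##*/} -> $target"
        fi
    done
}
```

Anything it reports is a candidate for the manual removal described above. Whether an old version should really be deleted depends on whether you still need to boot the older kernel it was built for.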
It would be good to find out why the module was not properly installed in the first place.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.