Permanent errors have been detected in the following files with clean scrub and no other errors
System information
| Type | Version/Name |
|---|---|
| Distribution Name | Ubuntu |
| Distribution Version | 22.04 |
| Kernel Version | 5.15.0-47-generic |
| Architecture | Intel x64 |
| OpenZFS Version | 2.1.5-1~22.04.york0 (`zfs version` also lists zfs-kmod-2.1.4-0ubuntu0.1!?!?) |
Describe the problem you're observing
After a power loss event I am unable to mount most of my ZFS filesystems. I have performed at least three scrubs now. The output of `zpool status` claims that a device experienced an error, but I can find no information about which device or when. I'm wondering if this is a bug, given the seemingly inconsistent information from `zpool status`: after each scrub, `zpool status -v` again shows the pool as entirely clean, yet when I try `zfs mount -a` I get the following:
```
rkeiii@ate:~$ sudo zfs mount -a
cannot mount 'bits/enc/ghd': Input/output error
cannot mount 'bits/enc/vmware': Input/output error
cannot mount 'bits/enc/downloads': Input/output error
cannot mount 'bits/enc/home': Input/output error
cannot mount 'bits/enc/backups': Input/output error
cannot mount 'bits/enc/personal': Input/output error
```
`zpool status -v` output:
```
rkeiii@ate:~$ sudo zpool status -v
  pool: bits
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 02:58:52 with 0 errors on Fri Sep  9 02:34:04 2022
config:

        NAME                                    STATE     READ WRITE CKSUM
        bits                                    ONLINE       0     0     0
          raidz2-0                              ONLINE       0     0     0
            ata-WDC_WD100EMAZ-00WJTA0_JEG7NHAN  ONLINE       0     0     0
            ata-WDC_WD100EMAZ-00WJTA0_2YJ0RXPD  ONLINE       0     0     0
            ata-WDC_WD100EMAZ-00WJTA0_2YHZ6URD  ONLINE       0     0     0
            ata-WDC_WD100EMAZ-00WJTA0_2YJ0K1MD  ONLINE       0     0     0
            ata-WDC_WD100EMAZ-00WJTA0_2YHZH64D  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        bits/enc/backups:<0x0>
        bits/enc/vmware:<0x0>
        bits/enc/personal:<0x0>
        bits/enc/downloads:<0x0>
        bits/enc/home:<0x0>
        bits/enc/ghd:<0x0>
rkeiii@ate:~$
```
I was able to track down my original pool setup commands and the commands I used to transfer the datasets from unencrypted to encrypted ZFS filesystems.

`zpool create` command used originally (4-5 years ago):

```
sudo zpool create -f bits raidz2 sda sdb sdc sdd sde
```

`zfs create` command for the encrypted fs:

```
sudo zfs create -o compression=lz4 -o encryption=on -o keyformat=passphrase bits/enc
```

`zfs send`/`recv` command used to transfer the data from the unencrypted ZFS FS to the encrypted ZFS FS:

```
sudo zfs send -Rw bits/downloads@zfs-auto-snap_frequent-2019-08-25-0215 | mbuffer -s 128k -m 4G | sudo zfs recv bits/enc/downloads
```
Describe how to reproduce the problem
I am unsure what led to this. Possibilities include:
- I originally migrated non-encrypted ZFS datasets from within the same pool to encrypted ZFS datasets
- The power loss event (but scrub and status are willing to report no issues?)
Include any warning/errors/backtraces from the system logs
Native encryption strikes again.
(The versions in `zfs version` differ because you're using the kernel module that shipped with your Ubuntu install and the userland from, I'm going to not-really-guess, jonathonf's PPA - you need the zfs-dkms package from that PPA to run the newer kernel module too...)
I'd bet at least a nickel that the problem is the same as #13521 and #13709, so the terrible workaround I suggested there will probably work here too.
@rincebrain Thank you for the workaround! That worked like a charm. I'm including the exact commands I used below for others' reference in case they run into this:
```
root@ate:~# zfs snapshot bits/enc/downloads/tv@recover1
root@ate:~# zfs snapshot bits/enc/downloads/tv@recover2
root@ate:~# zfs send --raw -i bits/enc/downloads/tv@recover1 bits/enc/downloads/tv@recover2 > /bits/recover_downloads_tv
root@ate:~# zfs rollback -r bits/enc/downloads/tv@recover1
root@ate:~# zfs receive -F -v bits/enc/downloads/tv < /bits/recover_downloads_tv
receiving incremental stream of bits/enc/downloads/tv@recover2 into bits/enc/downloads/tv@recover2
received 1.31K stream in 1 seconds (1.31K/sec)
root@ate:~# sudo zfs mount -a
```
Also here's a gist with a convenient little script I cobbled together to do this (because I had 15+ afflicted filesystems): https://gist.github.com/rkeiii/0fe05fdcee6f520c208280acbf2b49ea
The script is intended to be invoked as "./recover $zfs_fs_name"
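For readers who don't want to open the gist, the steps above can be sketched as a small shell function. This is my own reconstruction from the transcript, not the gist itself; `recover_fs` and the stream path under `/tmp` are names I made up, and it assumes the pool is imported and the encryption key is already loaded:

```shell
#!/bin/sh
# recover_fs: re-receive a raw incremental stream over an encrypted
# dataset to rewrite its damaged dataset-level metadata (hypothetical
# reconstruction of the workaround transcript above).
recover_fs() {
    fs="$1"
    stream="/tmp/recover_$(echo "$fs" | tr '/' '_')"

    # Two back-to-back snapshots give a tiny raw incremental stream.
    zfs snapshot "${fs}@recover1"
    zfs snapshot "${fs}@recover2"

    # Save the raw (still-encrypted) incremental, roll back to the first
    # snapshot, then receive the increment again.
    zfs send --raw -i "${fs}@recover1" "${fs}@recover2" > "$stream"
    zfs rollback -r "${fs}@recover1"
    zfs receive -F -v "$fs" < "$stream"

    # Clean up and retry the mount that previously failed.
    rm -f "$stream"
    zfs destroy "${fs}@recover2"
    zfs destroy "${fs}@recover1"
    zfs mount "$fs"
}
```

Invoked as, e.g., `recover_fs bits/enc/downloads/tv`.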
@rkeiii & @rincebrain: you made my day/night. Awesome! Thank you very much! I also ran into this (and this https://github.com/openzfs/zfs/issues/13709) and can confirm that I was able to mount my datasets again!
I'm unsure how to tell with 100% certainty which datasets are affected. Probably just try to mount them all? Or is it only the ones reported by `zpool status`:

```
errors: Permanent errors have been detected in the following files:

        tank/encrptd/Flo_Data:<0x0>
        tank/encrptd/micro_boot_backup:<0x0>
```
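One way to answer the "which datasets are affected" question is simply to attempt the mounts and record the failures, rather than trusting the status output. A sketch of that idea (the function name is mine, and note it will also list filesystems that fail to mount for unrelated reasons, such as an unloaded key or `canmount=off`):

```shell
#!/bin/sh
# find_unmountable: try to mount every currently unmounted ZFS
# filesystem and print the names of the ones that fail (hypothetical
# helper, not part of OpenZFS).
find_unmountable() {
    tab="$(printf '\t')"
    # zfs list -H emits tab-separated, script-friendly output.
    zfs list -H -o name,mounted -t filesystem |
    while IFS="$tab" read -r name mounted; do
        [ "$mounted" = "yes" ] && continue
        if ! zfs mount "$name" 2>/dev/null; then
            echo "$name"
        fi
    done
}
```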
Best regards, Flo.