Permanent errors have been detected in the following files with clean scrub and no other errors

Open rkeiii opened this issue 1 year ago • 2 comments

System information

Type                  Version/Name
Distribution Name     Ubuntu
Distribution Version  22.04
Kernel Version        5.15.0-47-generic
Architecture          Intel x64
OpenZFS Version       2.1.5-1~22.04.york0 (zfs version also lists zfs-kmod-2.1.4-0ubuntu0.1!?!?)

Describe the problem you're observing

After a power loss event I am unable to mount most of my ZFS filesystems. I have performed at least three scrubs now. The output of zpool status claims that a device experienced an error, but I can find no information about which device or when. I'm wondering if this is a bug, because the information from zpool status seems inconsistent: after each scrub, zpool status -v reports the scrub completed with zero errors, yet it still lists permanent errors. When I try zfs mount -a I get the following:

rkeiii@ate:~$ sudo zfs mount -a
cannot mount 'bits/enc/ghd': Input/output error
cannot mount 'bits/enc/vmware': Input/output error
cannot mount 'bits/enc/downloads': Input/output error
cannot mount 'bits/enc/home': Input/output error
cannot mount 'bits/enc/backups': Input/output error
cannot mount 'bits/enc/personal': Input/output error

zpool status -v output

rkeiii@ate:~$ sudo zpool status -v
  pool: bits
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 02:58:52 with 0 errors on Fri Sep  9 02:34:04 2022
config:

	NAME                                    STATE     READ WRITE CKSUM
	bits                                    ONLINE       0     0     0
	  raidz2-0                              ONLINE       0     0     0
	    ata-WDC_WD100EMAZ-00WJTA0_JEG7NHAN  ONLINE       0     0     0
	    ata-WDC_WD100EMAZ-00WJTA0_2YJ0RXPD  ONLINE       0     0     0
	    ata-WDC_WD100EMAZ-00WJTA0_2YHZ6URD  ONLINE       0     0     0
	    ata-WDC_WD100EMAZ-00WJTA0_2YJ0K1MD  ONLINE       0     0     0
	    ata-WDC_WD100EMAZ-00WJTA0_2YHZH64D  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        bits/enc/backups:<0x0>
        bits/enc/vmware:<0x0>
        bits/enc/personal:<0x0>
        bits/enc/downloads:<0x0>
        bits/enc/home:<0x0>
        bits/enc/ghd:<0x0>
rkeiii@ate:~$

I was able to track down my original pool setup commands and the commands I used to transfer the volumes from unencrypted to encrypted ZFS filesystems:

pool create command used originally (4-5 years ago)

sudo zpool create -f bits raidz2 sda sdb sdc sdd sde

zfs create command for the encrypted fs

sudo zfs create -o compression=lz4 -o encryption=on -o keyformat=passphrase bits/enc

zfs send/recv command used to transfer the data from unencrypted ZFS FS to encrypted ZFS FS

sudo zfs send -Rw bits/downloads@zfs-auto-snap_frequent-2019-08-25-0215 | mbuffer -s 128k -m 4G | sudo zfs recv bits/enc/downloads
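
For reference, the encryption-related properties of the received copy can be checked with something like the following (untested suggestion; the dataset name is just the one from the command above):

zfs get -r encryption,encryptionroot,keystatus bits/enc/downloads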

Describe how to reproduce the problem

I am unsure what led to this. Possibilities include:

  • I originally migrated non-encrypted ZFS datasets from within the same pool to encrypted ZFS datasets
  • The power loss event (but scrub and status are willing to report no issues?)

Include any warning/errors/backtraces from the system logs

rkeiii avatar Sep 09 '22 15:09 rkeiii

Native encryption strikes again.

(The versions in zfs version differ because you're using the kernel module that shipped with your Ubuntu install and the userland from, I'm going to not-really-guess, jonathonf's PPA. You need the zfs-dkms package from that PPA to run the newer kernel module too...)
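
If that PPA is already set up, this roughly amounts to installing the DKMS package and then rebooting (or reloading the zfs module); treat this as a sketch rather than exact instructions:

sudo apt install zfs-dkms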

I'd bet at least a nickel that the problem is the same as #13521 and #13709, so the terrible workaround I suggested there will probably work here too.

rincebrain avatar Sep 10 '22 21:09 rincebrain

@rincebrain Thank you for the workaround! That worked like a charm. I'm including the exact commands I used below for others' reference in case they run into this:

root@ate:~# zfs snapshot bits/enc/downloads/tv@recover1
root@ate:~# zfs snapshot bits/enc/downloads/tv@recover2
root@ate:~# zfs send --raw -i bits/enc/downloads/tv@recover1 bits/enc/downloads/tv@recover2 > /bits/recover_downloads_tv
root@ate:~# zfs rollback -r bits/enc/downloads/tv@recover1
root@ate:~# zfs receive -F -v bits/enc/downloads/tv < /bits/recover_downloads_tv
receiving incremental stream of bits/enc/downloads/tv@recover2 into bits/enc/downloads/tv@recover2
received 1.31K stream in 1 seconds (1.31K/sec)
root@ate:~# sudo zfs mount -a

Also here's a gist with a convenient little script I cobbled together to do this (because I had 15+ afflicted filesystems): https://gist.github.com/rkeiii/0fe05fdcee6f520c208280acbf2b49ea

The script is intended to be invoked as "./recover $zfs_fs_name"
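
For context, here is roughly what that script does per dataset (an untested sketch mirroring the commands above, not the exact gist contents):

#!/bin/sh
# Usage: ./recover <dataset>   e.g. ./recover bits/enc/downloads/tv
set -eu

fs="$1"
stream="/bits/recover_$(echo "$fs" | tr '/' '_')"

# Two throwaway snapshots plus a raw incremental send/rollback/receive,
# i.e. the workaround from #13521/#13709.
zfs snapshot "${fs}@recover1"
zfs snapshot "${fs}@recover2"
zfs send --raw -i "${fs}@recover1" "${fs}@recover2" > "$stream"
zfs rollback -r "${fs}@recover1"
zfs receive -F -v "$fs" < "$stream"

# Optional cleanup once the dataset mounts again.
rm -f "$stream"
zfs destroy "${fs}@recover1"
zfs destroy "${fs}@recover2"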

rkeiii avatar Sep 10 '22 23:09 rkeiii

@rkeiii & @rincebrain: you made my day/night. Awesome! Thank you very much! I also ran into this (and https://github.com/openzfs/zfs/issues/13709) and can confirm that I was able to mount my datasets again!

I'm unsure how to tell with 100% certainty which datasets are affected.

Probably just try to mount them all (one possible loop for that is sketched below). Or is it only the ones reported by zpool status:
errors: Permanent errors have been detected in the following files:

        tank/encrptd/Flo_Data:<0x0>
        tank/encrptd/micro_boot_backup:<0x0>
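
For what it's worth, an untested way to check every dataset instead of guessing: walk the filesystems that are not yet mounted and see which ones refuse to mount, along these lines:

zfs list -H -o name,mounted,canmount -t filesystem | while read -r fs mounted canmount; do
        [ "$mounted" = "no" ] || continue
        [ "$canmount" = "on" ] || continue
        zfs mount "$fs" 2>/dev/null || echo "failed to mount: $fs"
done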

Best regards, Flo.

florian-obradovic avatar Oct 19 '22 21:10 florian-obradovic