Pool reports ZFS-8000-8A after zfs send but then clears
System information
Type | Version/Name
---|---
Distribution Name | NixOS
Distribution Version | 23.11
Kernel Version | 6.7.10
Architecture | x86_64
OpenZFS Version | 2.2.3-1
Describe the problem you're observing
When trying to send particular snapshots (as determined by syncoid), the sending pool starts reporting ZFS-8000-8A with an empty file list. The error then clears on its own after 5-30 seconds and multiple runs of `zpool status -v`. The output of `zfs send` is also invalid.
Describe how to reproduce the problem
This happens every time; I don't know how the pool got into this state.
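A minimal sketch of what I run to trigger it (the snapshot names are the same ones syncoid was sending; the stream is discarded here just to exercise the send path):

```sh
# Incremental send of the affected snapshot range; the stream itself is discarded
zfs send -I 'data/postgres@autosnap_2024-04-30_00:00:02_daily' \
            'data/postgres@autosnap_2024-05-01_01:00:13_hourly' > /dev/null

# Immediately afterwards the pool reports ZFS-8000-8A with an empty file list
zpool status -v data

# 5-30 seconds (and a few more status runs) later the error has cleared
sleep 30
zpool status -v data
```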
Include any warning/errors/backtraces from the system logs
`zpool status -v` output after the `zfs send` (note the empty file list):

```
  pool: data
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:03:14 with 0 errors on Tue Apr 30 21:01:52 2024
config:

        NAME                          STATE     READ WRITE CKSUM
        data                          ONLINE       0     0     0
          mirror-0                    ONLINE       0     0     0
            scsi-0HC_Volume_30264525  ONLINE       0     0     0
            scsi-0HC_Volume_31118548  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:
```
`zpool status` output after 5-30 seconds and multiple runs of `zpool status -v`:

```
  pool: data
 state: ONLINE
  scan: scrub repaired 0B in 00:03:14 with 0 errors on Tue Apr 30 21:01:52 2024
config:

        NAME                          STATE     READ WRITE CKSUM
        data                          ONLINE       0     0     0
          mirror-0                    ONLINE       0     0     0
            scsi-0HC_Volume_30264525  ONLINE       0     0     0
            scsi-0HC_Volume_31118548  ONLINE       0     0     0

errors: No known data errors
```
Output of `zfs send -I 'data/postgres@autosnap_2024-04-30_00:00:02_daily' 'data/postgres@autosnap_2024-05-01_01:00:13_hourly' | zstreamdump`:
```
BEGIN record
        hdrtype = 2
        features = 0
        magic = 2f5bacbac
        creation_time = 0
        type = 0
        flags = 0x0
        toguid = 0
        fromguid = 0
        toname = data/postgres@autosnap_2024-05-01_01:00:13_hourly
        payloadlen = 0
END checksum = 4e3d8c4e3/13074827678/2629d7a733f5/34974e936a868
BEGIN record
        hdrtype = 1
        features = 4
        magic = 2f5bacbac
        creation_time = 66304295
        type = 2
        flags = 0xc
        toguid = 952f620822cc13b3
        fromguid = 18d0c7688a7b650d
        toname = data/postgres@autosnap_2024-04-30_01:00:05_hourly
        payloadlen = 0
warning: cannot send 'data/postgres@autosnap_2024-04-30_01:00:05_hourly': Input/output error
BEGIN record
        hdrtype = 1
        features = 4
        magic = 2f5bacbac
        creation_time = 663050a2
        type = 2
        flags = 0xc
        toguid = 8e478c81e645c5c2
        fromguid = 952f620822cc13b3
        toname = data/postgres@autosnap_2024-04-30_02:00:02_hourly
        payloadlen = 0
invalid checksum
Incorrect checksum in record header.
Expected checksum = f9770daf2/9c3131f152a/35dc04be10236/d38c2d9e9ccd71
SUMMARY:
        Total DRR_BEGIN records = 3 (0 bytes)
        Total DRR_END records = 1 (0 bytes)
        Total DRR_OBJECT records = 0 (0 bytes)
        Total DRR_FREEOBJECTS records = 0 (0 bytes)
        Total DRR_WRITE records = 0 (0 bytes)
        Total DRR_WRITE_BYREF records = 0 (0 bytes)
        Total DRR_WRITE_EMBEDDED records = 0 (0 bytes)
        Total DRR_FREE records = 0 (0 bytes)
        Total DRR_SPILL records = 0 (0 bytes)
        Total records = 4
        Total payload size = 0 (0x0)
        Total header overhead = 1248 (0x4e0)
        Total stream length = 1560 (0x618)
```
Can you share the output of `zfs get all` on `data/postgres`? Otherwise it's going to be rather difficult to try and reproduce what's going on here.
@rincebrain here ya go:
```
NAME PROPERTY VALUE SOURCE
data/postgres type filesystem -
data/postgres creation Mon Apr 10 23:15 2023 -
data/postgres used 221M -
data/postgres available 25.5G -
data/postgres referenced 117M -
data/postgres compressratio 1.71x -
data/postgres mounted yes -
data/postgres quota none default
data/postgres reservation none default
data/postgres recordsize 8K local
data/postgres mountpoint /var/lib/postgresql/ local
data/postgres sharenfs off default
data/postgres checksum on default
data/postgres compression on default
data/postgres atime on default
data/postgres devices on default
data/postgres exec on default
data/postgres setuid on default
data/postgres readonly off default
data/postgres zoned off default
data/postgres snapdir hidden default
data/postgres aclmode discard default
data/postgres aclinherit restricted default
data/postgres createtxg 174 -
data/postgres canmount on local
data/postgres xattr on default
data/postgres copies 1 default
data/postgres version 5 -
data/postgres utf8only off -
data/postgres normalization none -
data/postgres casesensitivity sensitive -
data/postgres vscan off default
data/postgres nbmand off default
data/postgres sharesmb off default
data/postgres refquota none default
data/postgres refreservation none default
data/postgres guid 970173609522556085 -
data/postgres primarycache metadata local
data/postgres secondarycache all default
data/postgres usedbysnapshots 103M -
data/postgres usedbydataset 117M -
data/postgres usedbychildren 0B -
data/postgres usedbyrefreservation 0B -
data/postgres logbias throughput local
data/postgres objsetid 274 -
data/postgres dedup off default
data/postgres mlslabel none default
data/postgres sync standard default
data/postgres dnodesize legacy default
data/postgres refcompressratio 2.02x -
data/postgres written 1.64M -
data/postgres logicalused 351M -
data/postgres logicalreferenced 225M -
data/postgres volmode default default
data/postgres filesystem_limit none default
data/postgres snapshot_limit none default
data/postgres filesystem_count none default
data/postgres snapshot_count none default
data/postgres snapdev hidden default
data/postgres acltype off default
data/postgres context none default
data/postgres fscontext none default
data/postgres defcontext none default
data/postgres rootcontext none default
data/postgres relatime on default
data/postgres redundant_metadata all default
data/postgres overlay on default
data/postgres encryption aes-256-gcm -
data/postgres keylocation file:///var/secrets/zfs.key local
data/postgres keyformat raw -
data/postgres pbkdf2iters 0 default
data/postgres encryptionroot data/postgres -
data/postgres keystatus available -
data/postgres special_small_blocks 0 default
data/postgres snapshots_changed Wed May 1 15:00:06 2024 -
data/postgres nixos:shutdown-time Mon Mar 18 12:20:20 AM EDT 2024 inherited from data
```
My assumption would be that something is erroring on decryption during send, since it's not using `-w`, and somehow because it's transient it's vanishing almost immediately, but the error is still breaking your send stream.

`zpool events -v` might know what it was that it burped on, though probably not why. (My guess would be something strange like it's hitting some metadata object where `DVA[0]` fails to decrypt but `DVA[1]` does, and somehow native encryption is screwing up error handling so trying `DVA[1]` doesn't stop it from returning an error? Not sure, I've not seen this reproduce locally, but people have occasionally reported strange hiccups from native encryption that made me wonder if something strange was going on even after 2163cde450d0898b5f7bac16afb4e238485411ff.)
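Something like this, run right after a failing send and before the pool clears itself (a sketch; which ereport classes actually show up is a guess on my part):

```sh
# Reproduce the failure, discarding the stream
zfs send -I 'data/postgres@autosnap_2024-04-30_00:00:02_daily' \
            'data/postgres@autosnap_2024-05-01_01:00:13_hourly' > /dev/null

# Dump the kernel's event ring for the pool before the error state clears;
# a checksum or authentication ereport here should name the object and vdev
# the send tripped over
zpool events -v data
```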
I was just poking at my system and noticed that a NixOS update had downgraded the kernel to 6.6.29 at some point without my noticing, and I hadn't rebooted since. So I rebooted. After the reboot my pool showed errors like before. After two scrubs the errors cleared, and the `zfs send` from before no longer produces the bug.
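For the record, the sequence that cleared it here, expressed as commands (a sketch of what I did; `zpool wait -t scrub` just blocks until the running scrub finishes):

```sh
# Two scrubs, run back to back, cleared the reported errors
zpool scrub data && zpool wait -t scrub data
zpool scrub data && zpool wait -t scrub data
zpool status -v data   # now reports: errors: No known data errors

# The incremental send that previously broke now produces a valid stream
zfs send -I 'data/postgres@autosnap_2024-04-30_00:00:02_daily' \
            'data/postgres@autosnap_2024-05-01_01:00:13_hourly' | zstreamdump > /dev/null
```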
I'm getting the same kind of issue with an empty file list, though in my case it isn't clearing, and it survives an export and import. Doing a scrub now.