crash doing zfs send encrypted
System information
Type | Version/Name
--- | ---
Distribution Name | Ubuntu
Distribution Version | 20.03
Kernel Version | 5.4.0
Architecture | Intel 64
OpenZFS Version | 0.8.3
Describe the problem you're observing
We had ZFS hang while doing a zfs send. I reported it with 11679, but was asked to report it separately.
You asked for a backtrace of our send-side failure. Here it is. The next line in kern.log is from a reboot an hour later. There are no further backtraces. At that point our problems were with ZFS at the user level. I don't have a full narrative of everything we saw. I apologize.

The system has 512 GB of memory, a number of disks arranged as RAIDZ1 vdevs, a mirror of SSDs as a special vdev for metadata, a mirror of SSDs as SLOG, and an L2ARC (SSD mirror, as an experiment; it wasn't worth it). Quotas are in use, but I don't know whether the file system that was being sent used quotas.
Recovery options were limited: a scrub takes a few days, and downtime of the main file systems on this machine is a real problem. So I chose a recovery mode I was confident would work: rebuilding the whole thing from backup, with the most commonly used file systems first, and no encryption. The backup system was also encrypted (that's where the send was going), but the restore was unencrypted. I believe at the time the system was around 500 TB, with 200 TB in use.
I believe the file system was being sent and simultaneously used by NFS.
This is Ubuntu 20, with the ZFS that comes with it.
Describe how to reproduce the problem
Can't reproduce.
Include any warning/errors/backtraces from the system logs
According to the stack trace there's some unexplained damage to one of the ZFS block pointers which is causing the crash. Specifically, there's no valid checksum algorithm set:

PANIC: blkptr at 00000000f9099df0 has invalid CHECKSUM 0

I can't explain how that would have happened, but as of ZFS 2.0.7 this kind of damage will be handled gracefully and an error returned rather than a system crash.
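To make the difference between the two behaviours concrete, here is a minimal toy sketch in Python. It is not OpenZFS code: the `BlockPointer` class, the `KNOWN_CHECKSUMS` table with its ID values, the `Panic` exception, and the choice of `EIO` as the returned error are all assumptions made for illustration. It only shows the idea of validating the checksum field and either panicking or handing an error back to the caller.

```python
import errno
from dataclasses import dataclass

# Hypothetical checksum IDs for illustration; not the real OpenZFS enum values.
KNOWN_CHECKSUMS = {1: "fletcher4", 2: "sha256"}


class Panic(Exception):
    """Stands in for a kernel panic in this toy model."""


@dataclass
class BlockPointer:
    address: int
    checksum_id: int


def verify_blkptr(bp: BlockPointer, panic_on_damage: bool) -> int:
    """Return 0 if the pointer looks sane, or an errno value if it does not."""
    if bp.checksum_id in KNOWN_CHECKSUMS:
        return 0
    message = f"blkptr at {bp.address:016x} has invalid CHECKSUM {bp.checksum_id}"
    if panic_on_damage:
        # Behaviour seen in this report: the damage takes the whole system down.
        raise Panic(message)
    # Graceful behaviour described above: the caller gets an error instead.
    # EIO is an assumption here; the real code may use a different error.
    return errno.EIO
```

With `panic_on_damage=False`, a pointer whose checksum field is 0 yields an error code that only the affected read has to deal with, while the rest of the system keeps running.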
That's good news. In the current version how much would we lose? A file? The file system?
It would depend on exactly which block is damaged and if it's the only one. If it's just this block pointer then most likely a file, or a portion of the file.
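As a rough illustration of why the loss depends on which pointer is damaged, the toy model below treats a file as a small tree of blocks and counts the data blocks that become unreachable. This is a simplification under stated assumptions: the `Node` class, the two-level layout, and `build_file` are hypothetical, and the model ignores redundancy and everything else about the real on-disk format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A block in a toy pointer tree; leaves carry file data."""
    name: str
    children: List["Node"] = field(default_factory=list)
    pointer_damaged: bool = False  # True if the pointer leading to this block is unusable


def unreachable_leaves(node: Node, cut_off: bool = False) -> List[str]:
    """List data blocks that can no longer be reached from the root."""
    cut_off = cut_off or node.pointer_damaged
    if not node.children:
        return [node.name] if cut_off else []
    lost: List[str] = []
    for child in node.children:
        lost.extend(unreachable_leaves(child, cut_off))
    return lost


def build_file():
    """Build a hypothetical file: one root, two indirect blocks, four data blocks."""
    blocks = [Node(f"data{i}") for i in range(4)]
    left = Node("indirect0", blocks[:2])
    right = Node("indirect1", blocks[2:])
    return Node("file", [left, right]), blocks, left
```

In this model, damaging the pointer to a single data block loses only that block (a portion of the file), whereas damaging the pointer to an indirect block loses every data block beneath it.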
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.