PANIC at dsl_crypt.c:2441:dsl_crypto_populate_key_nvlist()
### System information
| Type | Version/Name |
|---|---|
| Distribution Name | Debian |
| Distribution Version | 10.10 |
| Kernel Version | 4.19.0-17-amd64 |
| Architecture | x86_64 |
| OpenZFS Version | 2.0.3-8~bpo10+1 |
### Describe the problem you're observing
Someone said to me, "hey, I tried `zfs send -w pool/encrypted/filesystem | zfs receive some/where` and it works, but `zfs send -v -w pool/encrypted/filesystem | zfs receive some/where` errors out with `Invalid argument`, what gives?" I went "I don't think I expect that to work", tried `zfs send -w mypool/encroot > /dev/null` myself, and the magic smoke came out.
(Since I couldn't find any prior report of this, I figured it was worth filing for the sake of search results, even if it turns out to have already been fixed when I try reproducing it on git master.)
### Describe how to reproduce the problem
Above.
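Concretely, the minimal trigger from the description above is a raw send of the encryption root itself:

```sh
# Raw (-w) send of the encrypted filesystem head (not a snapshot);
# "mypool/encroot" is the placeholder dataset name from the description.
zfs send -w mypool/encroot > /dev/null
```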
### Include any warning/errors/backtraces from the system logs
`zfs get all phantasm/encroot`:
```
NAME PROPERTY VALUE SOURCE
phantasm/encroot type filesystem -
phantasm/encroot creation Fri May 28 1:15 2021 -
phantasm/encroot used 707G -
phantasm/encroot available 2.82T -
phantasm/encroot referenced 760K -
phantasm/encroot compressratio 1.08x -
phantasm/encroot mounted no -
phantasm/encroot quota none default
phantasm/encroot reservation none default
phantasm/encroot recordsize 128K default
phantasm/encroot mountpoint /phantasm/encroot default
phantasm/encroot sharenfs off default
phantasm/encroot checksum on default
phantasm/encroot compression gzip-9 local
phantasm/encroot atime off inherited from phantasm
phantasm/encroot devices on default
phantasm/encroot exec on default
phantasm/encroot setuid on default
phantasm/encroot readonly off default
phantasm/encroot zoned off default
phantasm/encroot snapdir hidden default
phantasm/encroot aclmode discard default
phantasm/encroot aclinherit restricted default
phantasm/encroot createtxg 15965940 -
phantasm/encroot canmount on default
phantasm/encroot xattr sa inherited from phantasm
phantasm/encroot copies 1 default
phantasm/encroot version 5 -
phantasm/encroot utf8only off -
phantasm/encroot normalization none -
phantasm/encroot casesensitivity sensitive -
phantasm/encroot vscan off default
phantasm/encroot nbmand off default
phantasm/encroot sharesmb off default
phantasm/encroot refquota none default
phantasm/encroot refreservation none default
phantasm/encroot guid 8760648724915161698 -
phantasm/encroot primarycache all default
phantasm/encroot secondarycache all default
phantasm/encroot usedbysnapshots 0B -
phantasm/encroot usedbydataset 760K -
phantasm/encroot usedbychildren 707G -
phantasm/encroot usedbyrefreservation 0B -
phantasm/encroot logbias latency default
phantasm/encroot objsetid 914 -
phantasm/encroot dedup off default
phantasm/encroot mlslabel none default
phantasm/encroot sync standard default
phantasm/encroot dnodesize auto local
phantasm/encroot refcompressratio 1.00x -
phantasm/encroot written 760K -
phantasm/encroot logicalused 760G -
phantasm/encroot logicalreferenced 302K -
phantasm/encroot volmode default default
phantasm/encroot filesystem_limit none default
phantasm/encroot snapshot_limit none default
phantasm/encroot filesystem_count none default
phantasm/encroot snapshot_count none default
phantasm/encroot snapdev hidden default
phantasm/encroot acltype off default
phantasm/encroot context none default
phantasm/encroot fscontext none default
phantasm/encroot defcontext none default
phantasm/encroot rootcontext none default
phantasm/encroot relatime off default
phantasm/encroot redundant_metadata all default
phantasm/encroot overlay on default
phantasm/encroot encryption aes-256-gcm -
phantasm/encroot keylocation file:///workspace/highchurn/enc.key local
phantasm/encroot keyformat raw -
phantasm/encroot pbkdf2iters 0 default
phantasm/encroot encryptionroot phantasm/encroot -
phantasm/encroot keystatus available -
phantasm/encroot special_small_blocks 0 default
phantasm/encroot com.sun:auto-snapshot false inherited from phantasm
```
`dmesg | tail -n mumble`:
```
[1284794.092468] VERIFY3(dp->dp_spa->spa_errata != 0) failed (0 != 0)
[1284794.092493] PANIC at dsl_crypt.c:2441:dsl_crypto_populate_key_nvlist()
[1284794.092513] Showing stack for process 35452
[1284794.092515] CPU: 15 PID: 35452 Comm: zfs Kdump: loaded Tainted: P OE 4.19.0-17-amd64 #1 Debian 4.19.194-2
[1284794.092516] Hardware name: Supermicro Super Server/X10SDV-TLN4F, BIOS 2.1 11/22/2019
[1284794.092516] Call Trace:
[1284794.092525] dump_stack+0x66/0x81
[1284794.092536] spl_panic+0xd3/0xfb [spl]
[1284794.092623] ? __raw_spin_unlock+0x5/0x10 [zfs]
[1284794.092692] ? zap_hashbits+0xa/0x20 [zfs]
[1284794.092759] ? zap_hash+0x36/0x210 [zfs]
[1284794.092826] ? zap_lookup_norm+0x9a/0xd0 [zfs]
[1284794.092878] dsl_crypto_populate_key_nvlist+0x5df/0x730 [zfs]
[1284794.092930] dmu_send_impl+0x5e3/0xbe0 [zfs]
[1284794.092938] ? tsd_hash_search+0x75/0xa0 [spl]
[1284794.092988] dmu_send+0x4ba/0x7e0 [zfs]
[1284794.092991] ? __switch_to+0x115/0x440
[1284794.092995] ? __switch_to_asm+0x35/0x70
[1284794.092998] ? _cond_resched+0x15/0x30
[1284794.093002] ? __kmalloc_node+0x1ea/0x2c0
[1284794.093006] ? spl_kmem_alloc_impl+0xd6/0xe0 [spl]
[1284794.093008] ? _cond_resched+0x15/0x30
[1284794.093015] ? nvt_nvpair_match+0x53/0x90 [znvpair]
[1284794.093019] ? nvt_remove_nvpair+0x88/0x120 [znvpair]
[1284794.093022] ? nvt_add_nvpair+0x54/0x110 [znvpair]
[1284794.093026] ? nvs_native_nvp_op+0xd0/0xd0 [znvpair]
[1284794.093029] ? nvs_decode_pairs+0x9d/0x120 [znvpair]
[1284794.093097] zfs_ioc_send_new+0x17d/0x1c0 [zfs]
[1284794.093168] ? dump_bytes_cb+0x20/0x20 [zfs]
[1284794.093237] zfsdev_ioctl_common+0x1f1/0x620 [zfs]
[1284794.093309] zfsdev_ioctl+0x4f/0xe0 [zfs]
[1284794.093312] do_vfs_ioctl+0xa4/0x630
[1284794.093316] ? do_munmap+0x33c/0x430
[1284794.093317] ksys_ioctl+0x60/0x90
[1284794.093319] __x64_sys_ioctl+0x16/0x20
[1284794.093323] do_syscall_64+0x53/0x110
[1284794.093326] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1284794.093327] RIP: 0033:0x7f9cc7861427
[1284794.093329] Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 f7 d8 64 89 01 48
[1284794.093330] RSP: 002b:00007ffe117e1298 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[1284794.093332] RAX: ffffffffffffffda RBX: 00007ffe117e12c0 RCX: 00007f9cc7861427
[1284794.093333] RDX: 00007ffe117e12c0 RSI: 0000000000005a40 RDI: 0000000000000005
[1284794.093333] RBP: 00007ffe117e48b0 R08: 0000000000000002 R09: 00005613abc60340
[1284794.093334] R10: 00005613abc56010 R11: 0000000000000246 R12: 0000000000000000
[1284794.093335] R13: 0000000000005a40 R14: 0000000000005a40 R15: 00005613abc60340
[1284926.573319] INFO: task zfs:35452 blocked for more than 120 seconds.
[1284926.573345] Tainted: P OE 4.19.0-17-amd64 #1 Debian 4.19.194-2
[1284926.573368] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1284926.573391] zfs D 0 35452 35451 0x80000004
[1284926.573394] Call Trace:
[1284926.573400] __schedule+0x29f/0x840
[1284926.573404] schedule+0x28/0x80
[1284926.573414] spl_panic+0xf9/0xfb [spl]
[1284926.573504] ? zap_hashbits+0xa/0x20 [zfs]
[1284926.573569] ? zap_hash+0x36/0x210 [zfs]
[1284926.573635] ? zap_lookup_norm+0x9a/0xd0 [zfs]
[1284926.573686] dsl_crypto_populate_key_nvlist+0x5df/0x730 [zfs]
[1284926.573737] dmu_send_impl+0x5e3/0xbe0 [zfs]
[1284926.573745] ? tsd_hash_search+0x75/0xa0 [spl]
[1284926.573794] dmu_send+0x4ba/0x7e0 [zfs]
[1284926.573799] ? __switch_to+0x115/0x440
[1284926.573802] ? __switch_to_asm+0x35/0x70
[1284926.573805] ? _cond_resched+0x15/0x30
[1284926.573808] ? __kmalloc_node+0x1ea/0x2c0
[1284926.573813] ? spl_kmem_alloc_impl+0xd6/0xe0 [spl]
[1284926.573814] ? _cond_resched+0x15/0x30
[1284926.573821] ? nvt_nvpair_match+0x53/0x90 [znvpair]
[1284926.573825] ? nvt_remove_nvpair+0x88/0x120 [znvpair]
[1284926.573828] ? nvt_add_nvpair+0x54/0x110 [znvpair]
[1284926.573831] ? nvs_native_nvp_op+0xd0/0xd0 [znvpair]
[1284926.573834] ? nvs_decode_pairs+0x9d/0x120 [znvpair]
[1284926.573899] zfs_ioc_send_new+0x17d/0x1c0 [zfs]
[1284926.573967] ? dump_bytes_cb+0x20/0x20 [zfs]
[1284926.574033] zfsdev_ioctl_common+0x1f1/0x620 [zfs]
[1284926.574101] zfsdev_ioctl+0x4f/0xe0 [zfs]
[1284926.574105] do_vfs_ioctl+0xa4/0x630
[1284926.574108] ? do_munmap+0x33c/0x430
[1284926.574110] ksys_ioctl+0x60/0x90
[1284926.574112] __x64_sys_ioctl+0x16/0x20
[1284926.574115] do_syscall_64+0x53/0x110
[1284926.574118] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1284926.574120] RIP: 0033:0x7f9cc7861427
[1284926.574125] Code: Bad RIP value.
[1284926.574126] RSP: 002b:00007ffe117e1298 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[1284926.574127] RAX: ffffffffffffffda RBX: 00007ffe117e12c0 RCX: 00007f9cc7861427
[1284926.574128] RDX: 00007ffe117e12c0 RSI: 0000000000005a40 RDI: 0000000000000005
[1284926.574129] RBP: 00007ffe117e48b0 R08: 0000000000000002 R09: 00005613abc60340
[1284926.574129] R10: 00005613abc56010 R11: 0000000000000246 R12: 0000000000000000
[1284926.574130] R13: 0000000000005a40 R14: 0000000000005a40 R15: 00005613abc60340
```
A few data points:
- The pool was created on 2018-01-01, presumably on a stable 0.7.X (I don't have the relevant logs that far back, and the only features that were explicitly enabled after creation were spacemap_v2 and encryption...), with no features disabled
- The encrypted filesystem was created on 2021-05-28 with `zfs create phantasm/encroot -o encryption=on -o keylocation=file:///workspace/highchurn/enc.key -o keyformat=raw` and version 2.0.3-1~bpo10+1 (DEBUG mode) on the pool, not by receiving or anything weird like that
- `feature@encryption` had been flipped to enabled a month prior, when the system was running v0.8.6-1~bpo10+1 (DEBUG mode), then a toy filesystem was created and shortly destroyed with `-o encryption=on`
- It does the same thing on 1b50749ce (unsurprising, since I didn't think anything fundamentally had changed, but just for completeness)
- It happily functions normally if you do send/recv from a non-DEBUG build to a new pool created on 2.x...
- ...and still triggers this if you then try sending the received one around on a DEBUG build
If you'd like a pool in a file you can examine that will do this, here you go.
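For anyone who grabs it: a pool kept in a plain file can be imported by pointing `zpool import -d` at the directory containing the file (the path and pool name below are hypothetical placeholders, not the actual artifact's):

```sh
# Import a file-backed pool by searching the directory that holds the
# backing file; "/path/to/dir" and "poolname" are placeholders.
zpool import -d /path/to/dir poolname
```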
I hit this yesterday, doing some checks in preparation for a restore from backup.
I've been testing today, and it's still present on master. However, I'm almost certain it's a spurious ASSERT with `--enable-debug`, and that it works fine otherwise (my backups are on zfs-kmod-2.1.99-FreeBSD_gad0a55461, a FreeBSD 14 prerelease, which has all the debug compiled in as a matter of course).
Reproduction from scratch is as follows: create a pool and an encrypted dataset, put some data in, create a backup stream, and burn it all down. To be clear, it doesn't have to be a filesystem; a proper snapshot does the same thing.
```sh
zpool create tank loop0
zfs create \
    -o encryption=aes-256-gcm \
    -o keylocation=file:///tmp/zfskey \
    -o keyformat=passphrase \
    tank/enc
dd if=/dev/random of=/tank/enc/file bs=128k count=8
zfs unmount -a
zfs send -wecL tank/enc > /tmp/stream
zpool destroy tank
rm -f /tmp/zfskey
```
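The steps above assume `loop0` is already backed by a file and `/tmp/zfskey` already holds a passphrase; on Linux, a sketch of that setup (sizes, paths, and the passphrase are arbitrary choices, not part of the original repro) would be:

```sh
# Hypothetical prerequisites for the repro: a file-backed loop device
# and a passphrase file. All values here are arbitrary.
truncate -s 512M /tmp/tank.img
losetup /dev/loop0 /tmp/tank.img
echo 'testpassphrase' > /tmp/zfskey   # passphrase keyformat requires >= 8 chars
```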
Then, create a new pool and try to receive the stream:
```sh
zpool create tank loop0
zfs recv tank/enc < /tmp/stream
zfs list -t all
zfs send -wecL tank/enc > /tmp/stream2
```
If compiled with `--enable-debug`, it will crash:
```
+ zfs send -wecL tank/enc
[    3.788982] VERIFY3(dp->dp_spa->spa_errata != 0) failed (0 != 0)
[    3.789085] PANIC at dsl_crypt.c:2442:dsl_crypto_populate_key_nvlist()
...
```
If not, the stream will be accepted. Sending it back out appears to yield effectively the same stream:
```sh
cat /tmp/stream | zstream dump > /tmp/dump
cat /tmp/stream2 | zstream dump > /tmp/dump2
diff -u /tmp/dump /tmp/dump2
```
```diff
--- /tmp/dump	2023-08-09 00:19:41.714426834 +0000
+++ /tmp/dump2	2023-08-09 00:19:41.722426834 +0000
@@ -5,7 +5,7 @@
 	creation_time = 64d2db9d
 	type = 2
 	flags = 0xc
-	toguid = 3d4286753e33efde
+	toguid = 8f983d6834869827
 	fromguid = 0
 	toname = tank/enc@--head--
 	payloadlen = 1028
@@ -34,7 +34,7 @@
 	from_ivset_guid = 0x0
 (end crypt_keydata)
-END checksum = 204c956ee7cd3/1f568d9cad058282/784e1ceb4822b03d/e382b1aa382b10ce
+END checksum = 204d7bed72106/1f61f3165ce72beb/79dfb23bb8ce9bd/790bfb7e8346e1ee
 SUMMARY:
 	Total DRR_BEGIN records = 1 (1028 bytes)
 	Total DRR_END records = 1 (0 bytes)
```
So the answer may simply be that this assertion is erroneous, and should be removed. I haven't done the study to have any confidence though, and won't have time to do so for a while.
The workaround appears to be to send a snapshot, not a filesystem. For me at least, this won't be a big deal; this is a backup host carrying snapshots (no key material), so snapshots are a thing I have plenty of.
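In terms of the repro above, the working variant would look something like this (the snapshot name is an arbitrary choice for illustration):

```sh
# Workaround sketch: raw-send an explicit snapshot rather than the
# filesystem head, which is what trips the debug-only assertion.
zfs snapshot tank/enc@backup
zfs send -wecL tank/enc@backup > /tmp/stream
```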