zfs icon indicating copy to clipboard operation
zfs copied to clipboard

PANIC at dsl_crypt.c:2441:dsl_crypto_populate_key_nvlist()

Open rincebrain opened this issue 4 years ago • 2 comments

System information

Type Version/Name
Distribution Name Debian
Distribution Version 10.10
Kernel Version 4.19.0-17-amd64
Architecture x86_64
OpenZFS Version 2.0.3-8~bpo10+1

Describe the problem you're observing

Someone said to me "hey, I tried zfs send -w pool/encrypted/filesystem | zfs receive some/where and it works, but zfs send -v -w pool/encrypted/filesystem | zfs receive some/where errors out with Invalid argument, what gives", I went "I don't think I expect that to work", tried zfs send -w mypool/encroot > /dev/null and the magic smoke came out.

(Since I couldn't find any report of this before, I figured it was worth reporting for search results even if I try reproducing it on git master and it's been corrected.)

Describe how to reproduce the problem

Above.

Include any warning/errors/backtraces from the system logs

zfs get all phantasm/encroot:

NAME              PROPERTY               VALUE                                SOURCE
phantasm/encroot  type                   filesystem                           -
phantasm/encroot  creation               Fri May 28  1:15 2021                -
phantasm/encroot  used                   707G                                 -
phantasm/encroot  available              2.82T                                -
phantasm/encroot  referenced             760K                                 -
phantasm/encroot  compressratio          1.08x                                -
phantasm/encroot  mounted                no                                   -
phantasm/encroot  quota                  none                                 default
phantasm/encroot  reservation            none                                 default
phantasm/encroot  recordsize             128K                                 default
phantasm/encroot  mountpoint             /phantasm/encroot                    default
phantasm/encroot  sharenfs               off                                  default
phantasm/encroot  checksum               on                                   default
phantasm/encroot  compression            gzip-9                               local
phantasm/encroot  atime                  off                                  inherited from phantasm
phantasm/encroot  devices                on                                   default
phantasm/encroot  exec                   on                                   default
phantasm/encroot  setuid                 on                                   default
phantasm/encroot  readonly               off                                  default
phantasm/encroot  zoned                  off                                  default
phantasm/encroot  snapdir                hidden                               default
phantasm/encroot  aclmode                discard                              default
phantasm/encroot  aclinherit             restricted                           default
phantasm/encroot  createtxg              15965940                             -
phantasm/encroot  canmount               on                                   default
phantasm/encroot  xattr                  sa                                   inherited from phantasm
phantasm/encroot  copies                 1                                    default
phantasm/encroot  version                5                                    -
phantasm/encroot  utf8only               off                                  -
phantasm/encroot  normalization          none                                 -
phantasm/encroot  casesensitivity        sensitive                            -
phantasm/encroot  vscan                  off                                  default
phantasm/encroot  nbmand                 off                                  default
phantasm/encroot  sharesmb               off                                  default
phantasm/encroot  refquota               none                                 default
phantasm/encroot  refreservation         none                                 default
phantasm/encroot  guid                   8760648724915161698                  -
phantasm/encroot  primarycache           all                                  default
phantasm/encroot  secondarycache         all                                  default
phantasm/encroot  usedbysnapshots        0B                                   -
phantasm/encroot  usedbydataset          760K                                 -
phantasm/encroot  usedbychildren         707G                                 -
phantasm/encroot  usedbyrefreservation   0B                                   -
phantasm/encroot  logbias                latency                              default
phantasm/encroot  objsetid               914                                  -
phantasm/encroot  dedup                  off                                  default
phantasm/encroot  mlslabel               none                                 default
phantasm/encroot  sync                   standard                             default
phantasm/encroot  dnodesize              auto                                 local
phantasm/encroot  refcompressratio       1.00x                                -
phantasm/encroot  written                760K                                 -
phantasm/encroot  logicalused            760G                                 -
phantasm/encroot  logicalreferenced      302K                                 -
phantasm/encroot  volmode                default                              default
phantasm/encroot  filesystem_limit       none                                 default
phantasm/encroot  snapshot_limit         none                                 default
phantasm/encroot  filesystem_count       none                                 default
phantasm/encroot  snapshot_count         none                                 default
phantasm/encroot  snapdev                hidden                               default
phantasm/encroot  acltype                off                                  default
phantasm/encroot  context                none                                 default
phantasm/encroot  fscontext              none                                 default
phantasm/encroot  defcontext             none                                 default
phantasm/encroot  rootcontext            none                                 default
phantasm/encroot  relatime               off                                  default
phantasm/encroot  redundant_metadata     all                                  default
phantasm/encroot  overlay                on                                   default
phantasm/encroot  encryption             aes-256-gcm                          -
phantasm/encroot  keylocation            file:///workspace/highchurn/enc.key  local
phantasm/encroot  keyformat              raw                                  -
phantasm/encroot  pbkdf2iters            0                                    default
phantasm/encroot  encryptionroot         phantasm/encroot                     -
phantasm/encroot  keystatus              available                            -
phantasm/encroot  special_small_blocks   0                                    default
phantasm/encroot  com.sun:auto-snapshot  false                                inherited from phantasm

dmesg | tail -n mumble

[1284794.092468] VERIFY3(dp->dp_spa->spa_errata != 0) failed (0 != 0)
[1284794.092493] PANIC at dsl_crypt.c:2441:dsl_crypto_populate_key_nvlist()
[1284794.092513] Showing stack for process 35452
[1284794.092515] CPU: 15 PID: 35452 Comm: zfs Kdump: loaded Tainted: P           OE     4.19.0-17-amd64 #1 Debian 4.19.194-2
[1284794.092516] Hardware name: Supermicro Super Server/X10SDV-TLN4F, BIOS 2.1 11/22/2019
[1284794.092516] Call Trace:
[1284794.092525]  dump_stack+0x66/0x81
[1284794.092536]  spl_panic+0xd3/0xfb [spl]
[1284794.092623]  ? __raw_spin_unlock+0x5/0x10 [zfs]
[1284794.092692]  ? zap_hashbits+0xa/0x20 [zfs]
[1284794.092759]  ? zap_hash+0x36/0x210 [zfs]
[1284794.092826]  ? zap_lookup_norm+0x9a/0xd0 [zfs]
[1284794.092878]  dsl_crypto_populate_key_nvlist+0x5df/0x730 [zfs]
[1284794.092930]  dmu_send_impl+0x5e3/0xbe0 [zfs]
[1284794.092938]  ? tsd_hash_search+0x75/0xa0 [spl]
[1284794.092988]  dmu_send+0x4ba/0x7e0 [zfs]
[1284794.092991]  ? __switch_to+0x115/0x440
[1284794.092995]  ? __switch_to_asm+0x35/0x70
[1284794.092998]  ? _cond_resched+0x15/0x30
[1284794.093002]  ? __kmalloc_node+0x1ea/0x2c0
[1284794.093006]  ? spl_kmem_alloc_impl+0xd6/0xe0 [spl]
[1284794.093008]  ? _cond_resched+0x15/0x30
[1284794.093015]  ? nvt_nvpair_match+0x53/0x90 [znvpair]
[1284794.093019]  ? nvt_remove_nvpair+0x88/0x120 [znvpair]
[1284794.093022]  ? nvt_add_nvpair+0x54/0x110 [znvpair]
[1284794.093026]  ? nvs_native_nvp_op+0xd0/0xd0 [znvpair]
[1284794.093029]  ? nvs_decode_pairs+0x9d/0x120 [znvpair]
[1284794.093097]  zfs_ioc_send_new+0x17d/0x1c0 [zfs]
[1284794.093168]  ? dump_bytes_cb+0x20/0x20 [zfs]
[1284794.093237]  zfsdev_ioctl_common+0x1f1/0x620 [zfs]
[1284794.093309]  zfsdev_ioctl+0x4f/0xe0 [zfs]
[1284794.093312]  do_vfs_ioctl+0xa4/0x630
[1284794.093316]  ? do_munmap+0x33c/0x430
[1284794.093317]  ksys_ioctl+0x60/0x90
[1284794.093319]  __x64_sys_ioctl+0x16/0x20
[1284794.093323]  do_syscall_64+0x53/0x110
[1284794.093326]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1284794.093327] RIP: 0033:0x7f9cc7861427
[1284794.093329] Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 f7 d8 64 89 01 48
[1284794.093330] RSP: 002b:00007ffe117e1298 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[1284794.093332] RAX: ffffffffffffffda RBX: 00007ffe117e12c0 RCX: 00007f9cc7861427
[1284794.093333] RDX: 00007ffe117e12c0 RSI: 0000000000005a40 RDI: 0000000000000005
[1284794.093333] RBP: 00007ffe117e48b0 R08: 0000000000000002 R09: 00005613abc60340
[1284794.093334] R10: 00005613abc56010 R11: 0000000000000246 R12: 0000000000000000
[1284794.093335] R13: 0000000000005a40 R14: 0000000000005a40 R15: 00005613abc60340
[1284926.573319] INFO: task zfs:35452 blocked for more than 120 seconds.
[1284926.573345]       Tainted: P           OE     4.19.0-17-amd64 #1 Debian 4.19.194-2
[1284926.573368] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1284926.573391] zfs             D    0 35452  35451 0x80000004
[1284926.573394] Call Trace:
[1284926.573400]  __schedule+0x29f/0x840
[1284926.573404]  schedule+0x28/0x80
[1284926.573414]  spl_panic+0xf9/0xfb [spl]
[1284926.573504]  ? zap_hashbits+0xa/0x20 [zfs]
[1284926.573569]  ? zap_hash+0x36/0x210 [zfs]
[1284926.573635]  ? zap_lookup_norm+0x9a/0xd0 [zfs]
[1284926.573686]  dsl_crypto_populate_key_nvlist+0x5df/0x730 [zfs]
[1284926.573737]  dmu_send_impl+0x5e3/0xbe0 [zfs]
[1284926.573745]  ? tsd_hash_search+0x75/0xa0 [spl]
[1284926.573794]  dmu_send+0x4ba/0x7e0 [zfs]
[1284926.573799]  ? __switch_to+0x115/0x440
[1284926.573802]  ? __switch_to_asm+0x35/0x70
[1284926.573805]  ? _cond_resched+0x15/0x30
[1284926.573808]  ? __kmalloc_node+0x1ea/0x2c0
[1284926.573813]  ? spl_kmem_alloc_impl+0xd6/0xe0 [spl]
[1284926.573814]  ? _cond_resched+0x15/0x30
[1284926.573821]  ? nvt_nvpair_match+0x53/0x90 [znvpair]
[1284926.573825]  ? nvt_remove_nvpair+0x88/0x120 [znvpair]
[1284926.573828]  ? nvt_add_nvpair+0x54/0x110 [znvpair]
[1284926.573831]  ? nvs_native_nvp_op+0xd0/0xd0 [znvpair]
[1284926.573834]  ? nvs_decode_pairs+0x9d/0x120 [znvpair]
[1284926.573899]  zfs_ioc_send_new+0x17d/0x1c0 [zfs]
[1284926.573967]  ? dump_bytes_cb+0x20/0x20 [zfs]
[1284926.574033]  zfsdev_ioctl_common+0x1f1/0x620 [zfs]
[1284926.574101]  zfsdev_ioctl+0x4f/0xe0 [zfs]
[1284926.574105]  do_vfs_ioctl+0xa4/0x630
[1284926.574108]  ? do_munmap+0x33c/0x430
[1284926.574110]  ksys_ioctl+0x60/0x90
[1284926.574112]  __x64_sys_ioctl+0x16/0x20
[1284926.574115]  do_syscall_64+0x53/0x110
[1284926.574118]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1284926.574120] RIP: 0033:0x7f9cc7861427
[1284926.574125] Code: Bad RIP value.
[1284926.574126] RSP: 002b:00007ffe117e1298 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[1284926.574127] RAX: ffffffffffffffda RBX: 00007ffe117e12c0 RCX: 00007f9cc7861427
[1284926.574128] RDX: 00007ffe117e12c0 RSI: 0000000000005a40 RDI: 0000000000000005
[1284926.574129] RBP: 00007ffe117e48b0 R08: 0000000000000002 R09: 00005613abc60340
[1284926.574129] R10: 00005613abc56010 R11: 0000000000000246 R12: 0000000000000000
[1284926.574130] R13: 0000000000005a40 R14: 0000000000005a40 R15: 00005613abc60340

rincebrain avatar Jul 27 '21 18:07 rincebrain

A few data points:

  • The pool was created on 2018-01-01, presumably on a stable 0.7.X (I don't have the relevant logs that far back, and the only features that were explicitly enabled after creation were spacemap_v2 and encryption...), with no features disabled
  • The encrypted filesystem was created on 2021-05-28 with zfs create phantasm/encroot -o encryption=on -o keylocation=file:///workspace/highchurn/enc.key -o keyformat=raw and version 2.0.3-1~bpo10+1 (DEBUG mode) on the pool, not by receiving or anything weird like that
  • feature@encryption had been flipped to enabled a month prior, when the system was running v0.8.6-1~bpo10+1 (DEBUG mode), then a toy filesystem was created and shortly destroyed with -o encryption=on
  • It does the same thing on 1b50749ce (unsurprising, since I didn't think anything fundamentally had changed, but just for completeness)
  • It happily functions normally if you do send/recv from a non-DEBUG build to a new pool created on 2.x...
  • ...and still triggers this if you then try sending the received one around on a DEBUG build

If you'd like a pool in a file you can examine that will do this, here you go.

rincebrain avatar Jul 31 '21 01:07 rincebrain

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jul 31 '22 01:07 stale[bot]

I hit this yesterday, doing some checks in preparation for a restore from backup.

I've been testing today, and its still present on master. However, I'm almost certain its a spurious ASSERT with --enable-debug, and works fine otherwise (my backups are on zfs-kmod-2.1.99-FreeBSD_gad0a55461, a FreeBSD 14 prerelease, which has all the debug compiled in as a matter of course).

Reproduction from scratch as follows. Create pool, encrypted dataset, put some data in, create a backup stream, and burn it all down. To be clear, it doesn't have to be a filesystem, a proper snapshot does the same thing.

zpool create tank loop0

zfs create \
  -o encryption=aes-256-gcm \
  -o keylocation=file:///tmp/zfskey \
  -o keyformat=passphrase \
    tank/enc

dd if=/dev/random of=/tank/enc/file bs=128k count=8

zfs unmount -a

zfs send -wecL tank/enc > /tmp/stream

zpool destroy tank
rm -f /tmp/zfskey

Then, create a new pool, and try to recieve the stream:

zpool create tank loop0
zfs recv tank/enc < /tmp/stream

zfs list -t all

zfs send -wecL tank/enc > /tmp/stream2

If compiled with --enable-debug, it will crash:

+ zfs send -wecL tank/enc
[    3.788982] VERIFY3(dp->dp_spa->spa_errata != 0) failed (0 != 0)
[    3.789085] PANIC at dsl_crypt.c:2442:dsl_crypto_populate_key_nvlist()
...

If not, the stream will be accepted. Sending it back out appears to yield effectively the same stream:

cat /tmp/stream | zstream dump > /tmp/dump
cat /tmp/stream2 | zstream dump > /tmp/dump2

diff -u /tmp/dump /tmp/dump2
--- /tmp/dump	2023-08-09 00:19:41.714426834 +0000
+++ /tmp/dump2	2023-08-09 00:19:41.722426834 +0000
@@ -5,7 +5,7 @@
 	creation_time = 64d2db9d
 	type = 2
 	flags = 0xc
-	toguid = 3d4286753e33efde
+	toguid = 8f983d6834869827
 	fromguid = 0
 	toname = tank/enc@--head--
 	payloadlen = 1028
@@ -34,7 +34,7 @@
 		from_ivset_guid = 0x0
 	(end crypt_keydata)

-END checksum = 204c956ee7cd3/1f568d9cad058282/784e1ceb4822b03d/e382b1aa382b10ce
+END checksum = 204d7bed72106/1f61f3165ce72beb/79dfb23bb8ce9bd/790bfb7e8346e1ee
 SUMMARY:
 	Total DRR_BEGIN records = 1 (1028 bytes)
 	Total DRR_END records = 1 (0 bytes)

So the answer may simply be that this assertion is erroneous, and should be removed. I haven't done the study to have any confidence though, and won't have time to do so for a while.

The workaround appears to be to send a snapshot, not a filesystem. For me at least, this won't be a big deal - this is a backup host carrying snapshots (no key material), so snapshots are a thing I have plenty of.

robn avatar Aug 09 '23 00:08 robn