zfs icon indicating copy to clipboard operation
zfs copied to clipboard

ZFS on Linux null pointer dereference

Open segdy opened this issue 3 years ago • 157 comments

System information

Type Version/Name
Distribution Name Debian
Distribution Version 10 buster
Linux Kernel 4.19.0-14 (SMP Debian 4.19.171-2)
Architecture amd64
ZFS Version 2.0.3-1~bpo10+1
SPL Version 2.0.3-1~bpo10+1

Describe the problem you're observing

When I start sending raw ZFS snapshots to a different system, my Linux systen (4.19.0-14-amd64) starts to hang completely. I can ping it, I can start a very commands (such as dmesg) but most commands hang (incl zfs, zpool, htop, ps, ...). The entire systems hangs completely.

Dmesg shows the following entries at the time of the occurance:

[ 2293.134071] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 2293.149707] PGD 0 P4D 0
[ 2293.154752] Oops: 0000 [#1] SMP PTI
[ 2293.161701] CPU: 1 PID: 12576 Comm: receive_writer Tainted: P           OE     4.19.0-14-amd64 #1 Debian 4.19.171-2
[ 2293.182517] Hardware name: Supermicro X10SLL-F/X10SLL-F, BIOS 3.0a 12/21/2015
[ 2293.196819] RIP: 0010:abd_verify+0x5/0x60 [zfs]
[ 2293.205865] Code: 0f 1f 44 00 00 0f 1f 44 00 00 8b 07 c1 e8 05 83 e0 01 c3 66 90 0f 1f 44 00 00 8b 07 c1 e8 06 83 e0 01 c3 66 90 0f 1f 44 00 00 <8b> 07 a8 01 74 01 c3 a8 40 74 43 41 54 4c 8d 67 68 55 53 48 8b 47
[ 2293.243325] RSP: 0018:ffffb12e4b6d7a28 EFLAGS: 00010246
[ 2293.253741] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 2293.267974] RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
[ 2293.282205] RBP: 0000000000004000 R08: ffff935ec10b70b0 R09: 0000000000000000
[ 2293.296434] R10: 0000000000007130 R11: ffff935d75f984e0 R12: 0000000000004000
[ 2293.310664] R13: 0000000000000000 R14: ffffffffc0fea550 R15: 0000000000000020
[ 2293.324900] FS:  0000000000000000(0000) GS:ffff935ecfb00000(0000) knlGS:0000000000000000
[ 2293.341053] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2293.352510] CR2: 0000000000000000 CR3: 000000001340a001 CR4: 00000000000606e0
[ 2293.366743] Call Trace:
[ 2293.371704]  abd_borrow_buf+0x12/0x40 [zfs]
[ 2293.380104]  abd_borrow_buf_copy+0x28/0x70 [zfs]
[ 2293.389377]  zio_crypt_copy_dnode_bonus+0x36/0x130 [zfs]
[ 2293.400041]  arc_buf_fill+0x3ff/0xb60 [zfs]
[ 2293.408449]  ? zfs_btree_add_idx+0xd0/0x200 [zfs]
[ 2293.417889]  arc_untransform+0x1c/0x70 [zfs]
[ 2293.426461]  dbuf_read_verify_dnode_crypt+0xec/0x160 [zfs]
[ 2293.437466]  dbuf_read_impl.constprop.29+0x4ad/0x6b0 [zfs]
[ 2293.448423]  ? kmem_cache_alloc+0x167/0x1d0
[ 2293.456776]  ? __cv_init+0x3d/0x60 [spl]
[ 2293.464671]  ? dbuf_cons+0xa7/0xc0 [zfs]
[ 2293.472497]  ? spl_kmem_cache_alloc+0x108/0x7a0 [spl]
[ 2293.482583]  ? _cond_resched+0x15/0x30
[ 2293.490071]  ? _cond_resched+0x15/0x30
[ 2293.497542]  ? mutex_lock+0xe/0x30
[ 2293.504402]  ? aggsum_add+0x17a/0x190 [zfs]
[ 2293.512810]  dbuf_read+0x1b2/0x520 [zfs]
[ 2293.520672]  ? dnode_hold_impl+0x350/0xc20 [zfs]
[ 2293.529904]  dmu_bonus_hold_by_dnode+0x126/0x1a0 [zfs]
[ 2293.540186]  receive_object+0x403/0xc70 [zfs]
[ 2293.548906]  ? receive_freeobjects.isra.10+0x9d/0x120 [zfs]
[ 2293.560049]  receive_writer_thread+0x279/0xa00 [zfs]
[ 2293.569962]  ? set_curr_task_fair+0x26/0x50
[ 2293.578319]  ? receive_process_write_record+0x190/0x190 [zfs]
[ 2293.589793]  ? __thread_exit+0x20/0x20 [spl]
[ 2293.598317]  ? thread_generic_wrapper+0x6f/0x80 [spl]
[ 2293.608410]  ? receive_process_write_record+0x190/0x190 [zfs]
[ 2293.619882]  thread_generic_wrapper+0x6f/0x80 [spl]
[ 2293.629609]  kthread+0x112/0x130
[ 2293.636053]  ? kthread_bind+0x30/0x30
[ 2293.643351]  ret_from_fork+0x35/0x40
[ 2293.650473] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter veth pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) nf_tables nfnetlink vboxdrv(OE) bridge binfmt_misc zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) intel_rapl x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm irqbypass crct10dif_pclmul ib_iser joydev crc32_pclmul rdma_cm ghash_clmulni_intel iw_cm intel_cstate ib_cm intel_uncore ib_core intel_rapl_perf configfs ipmi_si sg ipmi_devintf iTCO_wdt iTCO_vendor_support pcc_cpufreq intel_pch_thermal iscsi_tcp ipmi_msghandler libiscsi_tcp libiscsi evdev scsi_transport_iscsi pcspkr tun nfsd auth_rpcgss nfs_acl lockd grace sunrpc lm85 dme1737 hwmon_vid iptable_nat ipt_MASQUERADE nf_nat_ipv4 nf_nat nf_conntrack
[ 2293.793008]  nf_defrag_ipv6 nf_defrag_ipv4 fuse loop 8021q garp stp mrp llc ecryptfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb crypto_simd cryptd glue_helper aes_x86_64 raid10 uas usb_storage raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 multipath linear hid_generic raid1 usbhid hid md_mod sd_mod ast ahci ttm libahci libata drm_kms_helper drm crc32c_intel igb i2c_i801 dca i2c_algo_bit scsi_mod lpc_ich mfd_core e1000e xhci_pci ehci_pci xhci_hcd ehci_hcd usbcore usb_common thermal fan video button
[ 2293.895677] CR2: 0000000000000000
[ 2293.902280] ---[ end trace 164c64ca87be80af ]---
[ 2294.020926] RIP: 0010:abd_verify+0x5/0x60 [zfs]
[ 2294.029975] Code: 0f 1f 44 00 00 0f 1f 44 00 00 8b 07 c1 e8 05 83 e0 01 c3 66 90 0f 1f 44 00 00 8b 07 c1 e8 06 83 e0 01 c3 66 90 0f 1f 44 00 00 <8b> 07 a8 01 74 01 c3 a8 40 74 43 41 54 4c 8d 67 68 55 53 48 8b 47
[ 2294.067433] RSP: 0018:ffffb12e4b6d7a28 EFLAGS: 00010246
[ 2294.077850] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 2294.092082] RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
[ 2294.106312] RBP: 0000000000004000 R08: ffff935ec10b70b0 R09: 0000000000000000
[ 2294.120542] R10: 0000000000007130 R11: ffff935d75f984e0 R12: 0000000000004000
[ 2294.134774] R13: 0000000000000000 R14: ffffffffc0fea550 R15: 0000000000000020
[ 2294.149006] FS:  0000000000000000(0000) GS:ffff935ecfb00000(0000) knlGS:0000000000000000
[ 2294.165144] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2294.176600] CR2: 0000000000000000 CR3: 000000001340a001 CR4: 00000000000606e0

Interestingly, the transfer continues happily but just everything else in the system hangs. The only way to recover is resetting the machine (since not even reboot works).

Describe how to reproduce the problem

It's a tough one. It seems to me that the issue might be load related in some sense since it only occurs if I have two zfs send's (via syncoid) running in parallel that have to do with encrypted datasets.

Transfer 1

The first one sends datasets from an unecrypted dataset into an encrypted one (I migrate to encryption).

I use syncoid and use the command:

syncoid -r --skip-parent --no-sync-snap zpradix1imain/sys/vz zpradix1imain/sys/vz_enc

This translates into

zfs send -I 'zpradix1imain/sys/vz/main'@'zfs-auto-snap_hourly-2021-03-02-1917' 'zpradix1imain/sys/vz/main'@'zfs-auto-snap_frequent-2021-03-02-1932' | mbuffer -q -s 128k -m 16M 2>/dev/null | pv -s 16392592 | zfs receive -s -F 'zpradix1imain/sys/vz_enc/main'

Transfer 2

I transfer data from an encrypted dataset raw to a secondary server. The syncoid command is:

syncoid -r --skip-parent --no-sync-snap --sendoptions=w --exclude=zfs-auto-snap_hourly --exclude=zfs-auto-snap_frequent zpradix1imain/data [email protected]:zpzetta/radix/data

This translates into:

zfs send -w 'zpradix1imain/data/home'@'vicari-prev' | pv -s 179222507064 | lzop | mbuffer -q -s 128k -m 16M 2>/dev/null | ssh ...

In summary:

  • Both Transfer 1 and Transfer 2 have to be running in parallel
  • The issue only appears a few minutes/seconds after I started the transfer

segdy avatar Mar 02 '21 19:03 segdy

I am experiencing the same or similar issue. A hang-up during zfs send/receive of encrypted filesystem. The two filesystems are encrypted with different keys. I'm also using syncoid. My stack trace is slightly different, but the null pointer dereference is in the same region of abd_borrow_buf_copy

My system is a standard proxmox install 6.3

Os: Debian Buster 10 Kernel: 5.4.78-2-pve ZFS Module Version: v0.8.5-pve1 Pool: 4x2 two-drive mirrors of 2TB and 4TB vdev devices. zpool scrub returns no errors.

[54105.343179] BUG: kernel NULL pointer dereference, address: 0000000000000000
[54105.343185] #PF: supervisor read access in kernel mode
[54105.343187] #PF: error_code(0x0000) - not-present page
[54105.343188] PGD 0 P4D 0
[54105.343190] Oops: 0000 [#1] SMP PTI
[54105.343193] CPU: 1 PID: 13831 Comm: receive_writer Tainted: P           O      5.4.78-2-pve #1
[54105.343195] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./H81 Pro BTC R2.0, BIOS P1.40 03/09/2018
[54105.343253] RIP: 0010:abd_borrow_buf+0x13/0x70 [zfs]
[54105.343255] Code: 49 8b 44 24 18 41 5c 41 5d 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 49 89 f5 41 54 53 <44> 8b 27 48 89 fb 41 83 e4 01 75 30 44 8b 77 1c 48 8b 47 20 4d 85
[54105.343258] RSP: 0018:ffff96ae063a39d8 EFLAGS: 00010246
[54105.343260] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[54105.343261] RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
[54105.343263] RBP: ffff96ae063a39f8 R08: 00000000000000a5 R09: 0000000000000000
[54105.343264] R10: 0000000000000706 R11: 0000000000000000 R12: 0000000000000000
[54105.343265] R13: 0000000000004000 R14: 0000000000000000 R15: ffffffffc07ee470
[54105.343267] FS:  0000000000000000(0000) GS:ffff88c9af500000(0000) knlGS:0000000000000000
[54105.343269] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[54105.343271] CR2: 0000000000000000 CR3: 000000001980a004 CR4: 00000000000606e0
[54105.343272] Call Trace:
[54105.343292]  abd_borrow_buf_copy+0x2c/0x80 [zfs]
[54105.343323]  zio_crypt_copy_dnode_bonus+0x31/0x140 [zfs]
[54105.343327]  ? preempt_schedule_common+0xe/0x20
[54105.343345]  arc_buf_fill+0x402/0xb60 [zfs]
[54105.343364]  arc_untransform+0x21/0x90 [zfs]
[54105.343383]  dbuf_read_verify_dnode_crypt+0xee/0x160 [zfs]
[54105.343402]  dbuf_read+0x19d/0xb80 [zfs]
[54105.343425]  dmu_tx_check_ioerr+0x68/0xd0 [zfs]
[54105.343448]  dmu_tx_hold_free_impl+0x132/0x250 [zfs]
[54105.343470]  dmu_tx_hold_free+0x3c/0x50 [zfs]
[54105.343491]  dmu_free_long_range+0x234/0x4b0 [zfs]
[54105.343511]  dmu_free_long_object+0x26/0xc0 [zfs]
[54105.343533]  receive_freeobjects.isra.10+0x7a/0x100 [zfs]
[54105.343555]  receive_writer_thread+0x43c/0xb80 [zfs]
[54105.343558]  ? set_next_task_fair+0x73/0xb0
[54105.343565]  ? spl_kmem_free+0x33/0x40 [spl]
[54105.343586]  ? receive_read_prefetch+0x140/0x140 [zfs]
[54105.343591]  thread_generic_wrapper+0x74/0x90 [spl]
[54105.343613]  ? receive_read_prefetch+0x140/0x140 [zfs]
[54105.343617]  ? thread_generic_wrapper+0x74/0x90 [spl]
[54105.343620]  kthread+0x120/0x140
[54105.343624]  ? __thread_exit+0x20/0x20 [spl]
[54105.343626]  ? kthread_park+0x90/0x90
[54105.343629]  ret_from_fork+0x35/0x40
[54105.343631] Modules linked in: ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables sctp sch_ingress iptable_filter bpfilter bonding 8021q garp mrp openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 softdog nfnetlink_log nfnetlink intel_rapl_msr mei_hdcp ppdev intel_rapl_common x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd rapl intel_cstate snd_hda_codec_hdmi pcspkr i915 drm_kms_helper snd_hda_codec_realtek mei_me snd_hda_codec_generic ledtrig_audio mei drm snd_hda_intel snd_intel_dspcfg snd_hda_codec i2c_algo_bit snd_hda_core fb_sys_fops syscopyarea sysfillrect snd_hwdep sysimgblt snd_pcm snd_timer snd soundcore parport_pc parport mac_hid vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nct6775 hwmon_vid coretemp sunrpc ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO)
[54105.343657]  zcommon(PO) znvpair(PO) spl(O) btrfs xor zstd_compress raid6_pq libcrc32c r8169 realtek e1000e i2c_i801 lpc_ich ahci libahci ehci_pci ehci_hcd xhci_pci xhci_hcd mpt3sas raid_class scsi_transport_sas video
[54105.343675] CR2: 0000000000000000
[54105.343677] ---[ end trace 61f4d52a93ce31c9 ]---
[54105.343695] RIP: 0010:abd_borrow_buf+0x13/0x70 [zfs]
[54105.343696] Code: 49 8b 44 24 18 41 5c 41 5d 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 49 89 f5 41 54 53 <44> 8b 27 48 89 fb 41 83 e4 01 75 30 44 8b 77 1c 48 8b 47 20 4d 85
[54105.343699] RSP: 0018:ffff96ae063a39d8 EFLAGS: 00010246
[54105.343700] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[54105.343702] RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
[54105.343703] RBP: ffff96ae063a39f8 R08: 00000000000000a5 R09: 0000000000000000
[54105.343705] R10: 0000000000000706 R11: 0000000000000000 R12: 0000000000000000
[54105.343706] R13: 0000000000004000 R14: 0000000000000000 R15: ffffffffc07ee470
[54105.343708] FS:  0000000000000000(0000) GS:ffff88c9af500000(0000) knlGS:0000000000000000
[54105.343709] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[54105.343710] CR2: 0000000000000000 CR3: 000000001980a004 CR4: 00000000000606e0
[89614.646434] hrtimer: interrupt took 14015 ns

pquan avatar Mar 25 '21 10:03 pquan

Similar hung system, on the same distribution. The problem has been around ever since reinstalling the system and enabling encryption with ZFS 0.8.5, and persists with 5.4.114-1-pve / 2.0.4-pve1

dmesg
 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
 kernel: #PF: supervisor read access in kernel mode
 kernel: #PF: error_code(0x0000) - not-present page
 kernel: PGD 0 P4D 0
 kernel: Oops: 0000 [#1] SMP PTI
 kernel: CPU: 1 PID: 23147 Comm: receive_writer Tainted: P           O      5.4.114-1-pve #1
 kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Q2900-ITX, BIOS P1.60 03/01/2018
 kernel: RIP: 0010:abd_verify+0x5/0x70 [zfs]
 kernel: Code: 00 00 00 66 90 0f 1f 44 00 00 55 8b 07 c1 e8 06 48 89 e5 83 e0 01 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 <8b> 07 a8 01 75 4f 55 48 89 e5 41 55 41 54 49 89 fc 53 a8 40 74 40
 kernel: RSP: 0018:ffffbd418d5b7930 EFLAGS: 00010246
 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
 kernel: RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
 kernel: RBP: ffffbd418d5b7948 R08: 0000000000000037 R09: 0000000000000000
 kernel: R10: 000000000000b1e5 R11: 0000000000000000 R12: 0000000000004000
 kernel: R13: 0000000000004000 R14: 0000000000000000 R15: ffffffffc06b19d0
 kernel: FS:  0000000000000000(0000) GS:ffff97e2efe80000(0000) knlGS:0000000000000000
 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 kernel: CR2: 0000000000000000 CR3: 00000003150e4000 CR4: 00000000001026e0
 kernel: Call Trace:
 kernel:  ? abd_borrow_buf+0x17/0x40 [zfs]
 kernel:  abd_borrow_buf_copy+0x2c/0x80 [zfs]
 kernel:  zio_crypt_copy_dnode_bonus+0x31/0x140 [zfs]
 kernel:  ? preempt_schedule_common+0xe/0x20
 kernel:  arc_buf_fill+0x402/0xb40 [zfs]
 kernel:  arc_untransform+0x21/0x80 [zfs]
 kernel:  dbuf_read_verify_dnode_crypt+0xee/0x160 [zfs]
 kernel:  ? __cv_init+0x42/0x60 [spl]
 kernel:  dbuf_read_impl.constprop.33+0x10c/0x6e0 [zfs]
 kernel:  ? arc_space_consume+0x4f/0x120 [zfs]
 kernel:  ? dbuf_create+0x3e8/0x580 [zfs]
 kernel:  ? mutex_lock+0x12/0x30
 kernel:  ? dbuf_find+0x111/0x1d0 [zfs]
 kernel:  ? _cond_resched+0x19/0x30
 kernel:  ? down_read+0x12/0xa0
 kernel:  dbuf_read+0x1b2/0x510 [zfs]
 kernel:  dmu_tx_check_ioerr+0x68/0xd0 [zfs]
 kernel:  dmu_tx_hold_free_impl+0x132/0x250 [zfs]
 kernel:  dmu_tx_hold_free+0x3c/0x50 [zfs]
 kernel:  dmu_free_long_range+0x22a/0x4d0 [zfs]
 kernel:  dmu_free_long_object+0x26/0xc0 [zfs]
 kernel:  receive_freeobjects.isra.15+0x8d/0x120 [zfs]
 kernel:  receive_writer_thread+0x46a/0xa00 [zfs]
 kernel:  ? spl_kmem_free+0x33/0x40 [spl]
 kernel:  ? receive_process_write_record+0x180/0x180 [zfs]
 kernel:  thread_generic_wrapper+0x74/0x90 [spl]
 kernel:  ? receive_process_write_record+0x180/0x180 [zfs]
 kernel:  ? thread_generic_wrapper+0x74/0x90 [spl]
 kernel:  kthread+0x120/0x140
 kernel:  ? __thread_exit+0x20/0x20 [spl]
 kernel:  ? kthread_park+0x90/0x90
 kernel:  ret_from_fork+0x35/0x40
 kernel: Modules linked in: xt_multiport binfmt_misc veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter softdog nfnetlink_log nfnetlink snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio intel_rapl_msr intel_rapl_common intel_soc_dts_thermal intel_soc_dts_iosf intel_powerclamp i915 kvm_intel snd_hda_intel drm_kms_helper snd_intel_dspcfg kvm snd_hda_codec hci_uart irqbypass snd_hda_core punit_atom_debug btqca btrtl btbcm drm btintel crct10dif_pclmul i2c_algo_bit bluetooth crc32_pclmul fb_sys_fops syscopyarea sysfillrect ghash_clmulni_intel mei_hdcp snd_hwdep snd_pcm sysimgblt mei_txe ppdev ecdh_generic cryptd snd_timer ecc mei snd pcspkr parport_pc soundcore intel_cstate parport rfkill_gpio mac_hid vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nct6775 hwmon_vid sunrpc coretemp ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O)
 kernel:  zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs xor zstd_compress hid_generic raid6_pq libcrc32c usbhid r8169 ahci xhci_pci libahci realtek i2c_i801 lpc_ich xhci_hcd video i2c_hid hid
 kernel: CR2: 0000000000000000
 kernel: ---[ end trace 737a4d6fa99905a7 ]---
 kernel: RIP: 0010:abd_verify+0x5/0x70 [zfs]
 kernel: Code: 00 00 00 66 90 0f 1f 44 00 00 55 8b 07 c1 e8 06 48 89 e5 83 e0 01 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 <8b> 07 a8 01 75 4f 55 48 89 e5 41 55 41 54 49 89 fc 53 a8 40 74 40
:
 kernel: RSP: 0018:ffffbd418d5b7930 EFLAGS: 00010246
 kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
 kernel: RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
 kernel: RBP: ffffbd418d5b7948 R08: 0000000000000037 R09: 0000000000000000
 kernel: R10: 000000000000b1e5 R11: 0000000000000000 R12: 0000000000004000
 kernel: R13: 0000000000004000 R14: 0000000000000000 R15: ffffffffc06b19d0
 kernel: FS:  0000000000000000(0000) GS:ffff97e2efe80000(0000) knlGS:0000000000000000
 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 kernel: CR2: 0000000000000000 CR3: 00000003150e4000 CR4: 00000000001026e0
I too was suspecting that paralell execution of receives was causing the problem, but it turned out, that with serialized execution the problem still occurs (less often i think, but kind of hard to match a pattern here)

My problem is triggered by receiving snapshots into an encrypted dataset, i am not sending data anywhere from this node.

This machine is a consumer of snapshots for multiple machines (all different versions of ZFS). The kernel NULL pointer dereference always happens when a zfs receive runs, never when idle (most of the time there is a running receive, so this might not be valuable information). All sources are non-encrypted, the destination dataset is encrypted and opened with a key.

The commands being run are something along: zfs send -I dataset1@from dataset1@to |compression|buffering|ssh| zfs receive -s -F 'encrypted-dataset/backups/dataset1'

I have tried checking on /proc/spl/kstat/zfs/dbgmsg (zfs_flags = 256, turning it higher, and i can't manage the amout of data it produces) but nothing interesting showed up.

Perhaps someone could tell us, how to diagnose it further?

On a side note, it seems all three of us are using relatively old arcitectures, from the same era:

  • segdy: Supermicro X10SLL-F/X10SLL-F, BIOS 3.0a 12/21/2015 - C220 released Q2'13
  • pquan: H81 Pro BTC R2.0 - H81 released Q3'13
  • me: Pentium J2900, Bay Trail-D released Q4'13

cserem avatar May 20 '21 18:05 cserem

Same problem here, on NixOS, Linux 5.4.114, ZFS 2.0.4, also (probably) when receiving into an encrypted dataset.

Kernel log: https://gist.github.com/lheckemann/1c11ab6865c44a192c9d2597d17ba72b

This is the second time these symptoms have shown up, though I didn't think to get logs the first time around.

I'm on a Celeron G3900 (Skylake, Q4'15).

EDIT: I'm receiving non-raw sends.

lheckemann avatar May 22 '21 09:05 lheckemann

  • segdy: Supermicro X10SLL-F/X10SLL-F, BIOS 3.0a 12/21/2015 - C220 released Q2'13
  • pquan: H81 Pro BTC R2.0 - H81 released Q3'13
  • me: Pentium J2900, Bay Trail-D released Q4'13

I'm not sure if this is related to the age of the architecture. But add another of my systems: Intel(R) Xeon(R) CPU E3-1231 running on ASUS P9D-M board. This is relatively "new" when compared to the others.

Incidentally, I also have a X8SIL running an X3440 cpu and 32GB ram. That has not experienced the issue (yet). All these systems are, as you define them, consumers for ZFS snapshots (i.e. backup servers receiving the snapshots from production). The lockups seem related to load too. The more you receive, the more likely the issue occurs. Parallel streams seem trigger the issue more easily .

pquan avatar May 24 '21 10:05 pquan

Today I had a corruption on the receiving system. The load average was over 3000 when I had to hard-reset it. On boot, the zfs import took an unusual long time. zfs status is reporting a permanent data error

errors: Permanent errors have been detected in the following files:

        <0x1e39d>:<0x0>

But I'm not sure what that file is. If data corruption is involved, this would be a serious issue.

pquan avatar May 24 '21 10:05 pquan

Could you share the output of these two commands? cat /sys/module/icp/parameters/icp_aes_impl cat /sys/module/icp/parameters/icp_gcm_impl

I have a vague suspicion that all the CPUs mentioned here don't believe in aesni, and am wondering if there's something awry in the codepath they're taking.

(Note that I am not remotely an expert on this code, just a random person musing about what could be going so awry for you and yet not causing much more screaming by encryption users.)

(If you'd like to more aggressively experiment with this theory, I suspect the speed order for this goes aesni > x86_64 > generic, so if you are indeed using x86_64, you could try using the "generic" implementation and see if the problem still reproduces.)

rincebrain avatar May 24 '21 11:05 rincebrain

# cat /sys/module/icp/parameters/icp_aes_impl
cycle [fastest] generic x86_64 
# cat /sys/module/icp/parameters/icp_gcm_impl
cycle [fastest] generic pclmulqdq

I have switched both parameters to generic now. Will report back as things advance.

cserem avatar May 24 '21 11:05 cserem

One further thought - I don't see a way to validate which implementation it was currently using on "fastest", so it's also possible you were already using generic for one of both of those. So if it still NULL derefs, you could try explicitly setting x86_64 and pclmulqdq.

(I thought of this because if #12106 is indeed related to this, it definitely can't be using x86_64, so...)

rincebrain avatar May 24 '21 13:05 rincebrain

# cat /sys/module/icp/parameters/icp_aes_impl
cycle [fastest] generic x86_64 
# cat /sys/module/icp/parameters/icp_gcm_impl
cycle [fastest] generic pclmulqdq

I have switched both parameters to generic now. Will report back as things advance.

Same output here, on all CPU I use (aes-ni aware and in use not only for zfs but for many other things).

# cat /sys/module/icp/parameters/icp_aes_impl
cycle [fastest] generic x86_64 
# cat /sys/module/icp/parameters/icp_gcm_impl
cycle [fastest] generic pclmulqdq

Are we suggesting that the aes-ni implementation in the kernel is "buggy"? That is most certainly not the case.

Not sure if I pointed this out, but, in my case, I'm using different encryption keys on the sending and on the receiving side. Also, some of my senders are not encrypted, but the receiver is (via a parent fs which is encrypted).

pquan avatar May 24 '21 13:05 pquan

No, I was suggesting precisely the opposite - that the reason people here might be seeing an issue and others aren't is the absence of use of aesni.

edit: Just for reference, on my (Broadwell) Xeon:

$ for i in aes gcm;do cat /sys/module/icp/parameters/icp_${i}_impl;echo "";done;
cycle [fastest] generic x86_64 aesni
cycle [fastest] avx generic pclmulqdq

So I was hypothesizing the absence of use of one or both of those could be related to this, and was more suspicious of aesni than avx.

rincebrain avatar May 24 '21 13:05 rincebrain

Understood. Not sure how to test if zfs is using aes-ni, but a quick check on openssl with evp and non-evp api show a triple-fold increase with using evp, suggesting that aes-ni hardware is being exercised.

I'm going to turn explicitly on aesni and avx on one of the servers now. Just to test the opposite case of generic.

cycle fastest generic x86_64 [aesni]
cycle fastest [avx] generic pclmulqdq

pquan avatar May 24 '21 14:05 pquan

AFAICS from quickly skimming the code, there's no equivalent of vdev_raidz_bench for the crypto bits. Maybe I'll see how hard implementing one would be after this.

rincebrain avatar May 24 '21 14:05 rincebrain

Just to be clear I'm trying to reproduce the right thing, could somebody share which values of "encryption/keyformat" they're using on the dataset that things are being received under? (And of course, if they turn out to all be the same/different, that's another data point...)

rincebrain avatar May 24 '21 15:05 rincebrain

My consumer is:

rpool/backups  encryption                  aes-256-gcm                                    -
rpool/backups  keylocation                 file:///rpool/other-encrypted/zfs.KEY              local
rpool/backups  keyformat                   raw     

(After changing icp_[aes|gcm]_impl to "generic" a couple of hours ago, the next backup iteration caused the same null pointer/stack trace.

After restarting and re-applying the "generic" settings, has yet crashed again)

cserem avatar May 24 '21 16:05 cserem

Well, unless generic was mysteriously calculated to be faster before and you were always using that, there goes that theory. Rats.

edit: Well, I guess it's technically not impossible for both the non-aesni/avx implementations to cause similar misbehavior, but I'm going to assume for now that it's not specific to those...somehow.

rincebrain avatar May 24 '21 16:05 rincebrain

@rincebrain The fastest implementation is statically selected according to the available hardware. See here and here. To summarize:

GCM: generic < pclmulqdq < aes AES: generic < x86_64 < aesni

I'm still working on supporting more instruction set extensions for GCM and am thinking of adding a bench as part of that work. Currently the static selection should always select the fastest one.

Further it's quite unlikely that the implementations differ in behavior since ztest uses the cycle implementation, which cycles through all avialable implementations per encryption call/block, and should've catched any discrepancies. I think it's more likely that the load encryption puts on the system triggers some race condition. Just a guess though.

AttilaFueloep avatar May 24 '21 19:05 AttilaFueloep

@rincebrain The fastest implementation is statically selected according to the available hardware. See here and here. To summarize:

GCM: generic < pclmulqdq < aes AES: generic < x86_64 < aesni

I'm still working on supporting more instruction set extensions for GCM and am thinking of adding a bench as part of that work. Currently the static selection should always select the fastest one.

Further it's quite unlikely that the implementations differ in behavior since ztest uses the cycle implementation, which cycles through all avialable implementations per encryption call/block, and should've catched any discrepancies. I think it's more likely that the load encryption puts on the system triggers some race condition. Just a guess though.

Yeah, I'd buy that, the only reason I proposed the selection as an idea before was, as I said, that I noticed all of them happened to be using CPUs that did not believe in aesni, and the assumption that most people are not running into this regularly, so something about their setups is different.

I also, annoyingly, have not been able to reproduce this in a VM no matter how I've varied the amount of computational power or cores, so far. :/

rincebrain avatar May 25 '21 15:05 rincebrain

I must say, this is the first time someone talks about cpu beliefs. This is a serious issue. It happens on multiple cpu generations, mainboard etc. All my stuff is server grade using ecc memory and enterprise sata disks. No errors are logged on the disks, memory, cpu anywhere. Yet, now I'm starting to develop data errors on multiple servers. These seem related to errors in snapshots, even on "source" snapshots: i.e. those that were not transferred to a backup server, but that are native to a server. It started when I started using encrypted storage and OpenZFS v2.0.4 (specifially the proxmox kernel pve-kernel-5.4.114-1-pve)

It might be related to proxmox. I tought I've never seen it with the 0.8.5 zfs module, but turns out it was there. Now I have the same issue with 2.0.4. The thing that changed was encryption.

For now the data corruption regards only snapshots. I hope it won't start affecting live files.

It also seems there's some silent corruption goin on. On one of the servers with data errors, I never had this null pointer dereference, yet, there are corrupted snapshots, and may of them:

errors: Permanent errors have been detected in the following files:

        <0x16c01>:<0x0>
        <0x17506>:<0x1>
        <0x16220>:<0x0>
        <0x1742a>:<0x0>
        <0x1742a>:<0x1>
        <0x1742c>:<0x1>
        <0x1612f>:<0x0>
        <0x17435>:<0x0>
        <0x17439>:<0x0>
        <0x16151>:<0x0>
        <0x17451>:<0x0>
        <0x17451>:<0x1>
        <0x16d66>:<0x0>
        rpool/vm/vm-1234-disk-0@autosnap_2021-05-22_20:00:03_hourly:<0x0>
        <0x16280>:<0x0>
        <0x1678c>:<0x0>
        rpool/vm/vm-1234-disk-0@autosnap_2021-05-22_22:00:01_hourly:<0x0>
        <0x165ad>:<0x0>
        <0x165ad>:<0x1>
        <0x171bd>:<0x0>
        <0x15fcb>:<0x0>
        <0x165d8>:<0x0>
        <0x169f9>:<0x0>

Those hex ids are deleted snapshots (most likely). It can happen that a snapshot is created/deleted during the send/receive. They are asynchronous processes.

At the moment I'm running a scrub now on all servers. Will see if those errors go away or not (deleted snapshots). But the "vm-1234" snapshots above cannot be deleted. They're not visible when issuing zfs list -t snapshot, yet, they do have a name in the error list.

# zfs destroy rpool/vm/vm-1234-disk-0@autosnap_2021-05-22_22:00:01_hourly
could not find any snapshots to destroy; check snapshot names.

There's also more:

# zfs list -t snapshot rpool/vm/vm-1234-disk-0|grep 05-22
cannot iterate filesystems: I/O error
#

pquan avatar May 26 '21 09:05 pquan

Ok, I was waken up this morning by a (perhaps related) issue on one of the "sending" servers. The stack trace does not show the encryption part and seems more related to the arc. I haven't seen one of those for years, so being a very rare thing, I decided to post it here in hope it would help.

I was doing some things on this server tonight, namely:

  • transferring all filesystems from an encrypted parent to an unencrypted parent (i.e. from rpool/encrypted/filesystem{0..n} to rpool/non-encrypted/filesystem{0..n}
  • making hourly snapshots, some of which must have happened during the transfer, and deleting older expired snapshots.
  • making (very likely) other send transfers from the encrypted filesystems to a remote backup server.
  • scrubbing the rpool pool to verify for errors and hopefully remove those older snapshots which are now identified via a hex-id in the error table.
  • all this while running around 10 lxc containers and a couple of kvm virtual machines.

The server was completely locked up, load average around 6800 and the following stack trace:

[1214732.919868] VERIFY3(0 == remove_reference(hdr, NULL, tag)) failed (0 == 1)
[1214732.919930] PANIC at arc.c:3791:arc_buf_destroy()
[1214732.919985] Showing stack for process 1114
[1214732.920040] CPU: 2 PID: 1114 Comm: z_rd_int Tainted: P           O      5.4.114-1-pve #1
[1214732.920114] Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.3.V1 02/04/2020
[1214732.920193] Call Trace:
[1214732.920251]  dump_stack+0x6d/0x8b
[1214732.920309]  spl_dumpstack+0x29/0x2b [spl]
[1214732.920365]  spl_panic+0xd3/0xfb [spl]
[1214732.920420]  ? __wake_up_common+0x78/0x140
[1214732.920521]  ? zei_add_range+0x140/0x140 [zfs]
[1214732.920576]  ? __wake_up+0x13/0x20
[1214732.920630]  ? __cv_broadcast+0x2a/0x40 [spl]
[1214732.920715]  ? zfs_zevent_post+0x223/0x280 [zfs]
[1214732.920792]  arc_buf_destroy+0x101/0x110 [zfs]
[1214732.920869]  arc_read_done+0x243/0x480 [zfs]
[1214732.920957]  zio_done+0x43f/0x1020 [zfs]
[1214732.921045]  zio_execute+0x99/0xf0 [zfs]
[1214732.921101]  taskq_thread+0x2f7/0x4e0 [spl]
[1214732.921157]  ? wake_up_q+0x80/0x80
[1214732.921243]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[1214732.921302]  kthread+0x120/0x140
[1214732.921357]  ? task_done+0xb0/0xb0 [spl]
[1214732.921410]  ? kthread_park+0x90/0x90
[1214732.921465]  ret_from_fork+0x35/0x40
[1214871.316671] INFO: task z_rd_int:1114 blocked for more than 120 seconds.
[1214871.316733]       Tainted: P           O      5.4.114-1-pve #1
[1214871.316790] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1214871.316864] z_rd_int        D    0  1114      2 0x80004000
[1214871.316921] Call Trace:
[1214871.316978]  __schedule+0x2e6/0x700
[1214871.317040]  schedule+0x33/0xa0
[1214871.317096]  spl_panic+0xf9/0xfb [spl]
[1214871.317195]  ? zei_add_range+0x140/0x140 [zfs]
[1214871.317252]  ? __wake_up+0x13/0x20
[1214871.317307]  ? __cv_broadcast+0x2a/0x40 [spl]
[1214871.317389]  ? zfs_zevent_post+0x223/0x280 [zfs]
[1214871.317466]  arc_buf_destroy+0x101/0x110 [zfs]
[1214871.317542]  arc_read_done+0x243/0x480 [zfs]
[1214871.317628]  zio_done+0x43f/0x1020 [zfs]
[1214871.317727]  zio_execute+0x99/0xf0 [zfs]
[1214871.317783]  taskq_thread+0x2f7/0x4e0 [spl]
[1214871.317839]  ? wake_up_q+0x80/0x80
[1214871.317925]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[1214871.317984]  kthread+0x120/0x140
[1214871.318050]  ? task_done+0xb0/0xb0 [spl]
[1214871.318104]  ? kthread_park+0x90/0x90
[1214871.318157]  ret_from_fork+0x35/0x40
[1214871.318215] INFO: task dp_sync_taskq:1157 blocked for more than 120 seconds.
[1214871.318287]       Tainted: P           O      5.4.114-1-pve #1
[1214871.318344] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1214871.318417] dp_sync_taskq   D    0  1157      2 0x80004000
[1214871.318473] Call Trace:
[1214871.318525]  __schedule+0x2e6/0x700
[1214871.318577]  schedule+0x33/0xa0
[1214871.318631]  cv_wait_common+0x104/0x130 [spl]
[1214871.318685]  ? wait_woken+0x80/0x80
[1214871.318751]  __cv_wait+0x15/0x20 [spl]
[1214871.318834]  dsl_scan_prefetch_thread+0xad/0x290 [zfs]
[1214871.318892]  taskq_thread+0x2f7/0x4e0 [spl]
[1214871.318947]  ? wake_up_q+0x80/0x80
[1214871.319001]  kthread+0x120/0x140
[1214871.319066]  ? task_done+0xb0/0xb0 [spl]
[1214871.319119]  ? kthread_park+0x90/0x90
[1214871.319173]  ret_from_fork+0x35/0x40
[1214871.319228] INFO: task txg_sync:1325 blocked for more than 120 seconds.
[1214871.319286]       Tainted: P           O      5.4.114-1-pve #1
[1214871.319342] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1214871.319416] txg_sync        D    0  1325      2 0x80004000
[1214871.319472] Call Trace:
[1214871.319523]  __schedule+0x2e6/0x700
[1214871.319607]  ? vdev_mirror_io_start+0xb0/0x1a0 [zfs]
[1214871.319662]  schedule+0x33/0xa0
[1214871.319715]  schedule_timeout+0x152/0x330
[1214871.319778]  ? __next_timer_interrupt+0xd0/0xd0
[1214871.319833]  io_schedule_timeout+0x1e/0x50
[1214871.319888]  __cv_timedwait_common+0x138/0x170 [spl]
[1214871.319944]  ? wait_woken+0x80/0x80
[1214871.319998]  __cv_timedwait_io+0x19/0x20 [spl]
[1214871.320086]  zio_wait+0x139/0x280 [zfs]
[1214871.320162]  arc_read+0xd68/0x1210 [zfs]
[1214871.320249]  ? arc_hdr_set_compress+0x50/0x50 [zfs]
[1214871.320332]  dsl_scan_visitbp+0x77d/0xcd0 [zfs]
[1214871.320388]  ? __cv_broadcast+0x2a/0x40 [spl]
[1214871.320470]  ? dsl_scan_prefetch+0x222/0x260 [zfs]
[1214871.320552]  dsl_scan_visit_rootbp+0xe8/0x150 [zfs]
[1214871.320648]  dsl_scan_visitds+0x1ae/0x510 [zfs]
[1214871.320790]  ? dnode_rele_and_unlock+0xb6/0xe0 [zfs]
[1214871.320874]  ? dnode_rele+0x3b/0x40 [zfs]
[1214871.320951]  ? dbuf_rele_and_unlock+0x306/0x6a0 [zfs]
[1214871.321039]  ? dsl_dataset_hold_obj+0x68c/0x9e0 [zfs]
[1214871.321130]  ? dbuf_rele+0x3b/0x40 [zfs]
[1214871.321246]  dsl_scan_sync+0x90e/0x1320 [zfs]
[1214871.321337]  spa_sync+0x610/0xfe0 [zfs]
[1214871.321392]  ? mutex_lock+0x12/0x30
[1214871.321478]  ? spa_txg_history_init_io+0x104/0x110 [zfs]
[1214871.321568]  txg_sync_thread+0x2e1/0x4a0 [zfs]
[1214871.321656]  ? txg_thread_exit.isra.13+0x60/0x60 [zfs]
[1214871.321714]  thread_generic_wrapper+0x74/0x90 [spl]
[1214871.321770]  kthread+0x120/0x140
[1214871.321825]  ? __thread_exit+0x20/0x20 [spl]
[1214871.321879]  ? kthread_park+0x90/0x90
[1214871.321933]  ret_from_fork+0x35/0x40
[1214992.151145] INFO: task z_rd_int:1114 blocked for more than 241 seconds.
[1214992.151211]       Tainted: P           O      5.4.114-1-pve #1
[1214992.151268] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1214992.151342] z_rd_int        D    0  1114      2 0x80004000
[1214992.151399] Call Trace:
[1214992.151456]  __schedule+0x2e6/0x700
[1214992.151518]  schedule+0x33/0xa0
[1214992.151584]  spl_panic+0xf9/0xfb [spl]
[1214992.151687]  ? zei_add_range+0x140/0x140 [zfs]
[1214992.151743]  ? __wake_up+0x13/0x20
[1214992.151798]  ? __cv_broadcast+0x2a/0x40 [spl]
[1214992.151882]  ? zfs_zevent_post+0x223/0x280 [zfs]
[1214992.151966]  arc_buf_destroy+0x101/0x110 [zfs]
[1214992.152042]  arc_read_done+0x243/0x480 [zfs]
[1214992.152128]  zio_done+0x43f/0x1020 [zfs]
[1214992.152213]  zio_execute+0x99/0xf0 [zfs]
[1214992.152281]  taskq_thread+0x2f7/0x4e0 [spl]
[1214992.152337]  ? wake_up_q+0x80/0x80
[1214992.152423]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[1214992.152482]  kthread+0x120/0x140
[1214992.152537]  ? task_done+0xb0/0xb0 [spl]
[1214992.152591]  ? kthread_park+0x90/0x90
[1214992.152645]  ret_from_fork+0x35/0x40
[1214992.152702] INFO: task dp_sync_taskq:1157 blocked for more than 241 seconds.
[1214992.152774]       Tainted: P           O      5.4.114-1-pve #1
[1214992.152831] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1214992.152905] dp_sync_taskq   D    0  1157      2 0x80004000
[1214992.152968] Call Trace:
[1214992.153019]  __schedule+0x2e6/0x700
[1214992.153072]  schedule+0x33/0xa0
[1214992.153126]  cv_wait_common+0x104/0x130 [spl]
[1214992.153180]  ? wait_woken+0x80/0x80
[1214992.153235]  __cv_wait+0x15/0x20 [spl]
[1214992.153315]  dsl_scan_prefetch_thread+0xad/0x290 [zfs]
[1214992.153374]  taskq_thread+0x2f7/0x4e0 [spl]
[1214992.153428]  ? wake_up_q+0x80/0x80
[1214992.153482]  kthread+0x120/0x140
[1214992.153536]  ? task_done+0xb0/0xb0 [spl]
[1214992.153589]  ? kthread_park+0x90/0x90
[1214992.153654]  ret_from_fork+0x35/0x40
[1214992.153709] INFO: task txg_sync:1325 blocked for more than 241 seconds.
[1214992.153767]       Tainted: P           O      5.4.114-1-pve #1
[1214992.153823] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1214992.153897] txg_sync        D    0  1325      2 0x80004000
[1214992.153953] Call Trace:
[1214992.154016]  __schedule+0x2e6/0x700
[1214992.154099]  ? vdev_mirror_io_start+0xb0/0x1a0 [zfs]
[1214992.159404]  schedule+0x33/0xa0
[1214992.159463]  schedule_timeout+0x152/0x330
[1214992.159521]  ? __next_timer_interrupt+0xd0/0xd0
[1214992.159579]  io_schedule_timeout+0x1e/0x50
[1214992.159640]  __cv_timedwait_common+0x138/0x170 [spl]
[1214992.159699]  ? wait_woken+0x80/0x80
[1214992.159754]  __cv_timedwait_io+0x19/0x20 [spl]
[1214992.159842]  zio_wait+0x139/0x280 [zfs]
[1214992.159918]  arc_read+0xd68/0x1210 [zfs]
[1214992.159994]  ? arc_hdr_set_compress+0x50/0x50 [zfs]
[1214992.160079]  dsl_scan_visitbp+0x77d/0xcd0 [zfs]
[1214992.160136]  ? __cv_broadcast+0x2a/0x40 [spl]
[1214992.160219]  ? dsl_scan_prefetch+0x222/0x260 [zfs]
[1214992.160304]  dsl_scan_visit_rootbp+0xe8/0x150 [zfs]
[1214992.160388]  dsl_scan_visitds+0x1ae/0x510 [zfs]
[1214992.160469]  ? dnode_rele_and_unlock+0xb6/0xe0 [zfs]
[1214992.160551]  ? dnode_rele+0x3b/0x40 [zfs]
[1214992.160628]  ? dbuf_rele_and_unlock+0x306/0x6a0 [zfs]
[1214992.160711]  ? dsl_dataset_hold_obj+0x68c/0x9e0 [zfs]
[1214992.160791]  ? dbuf_rele+0x3b/0x40 [zfs]
[1214992.160874]  dsl_scan_sync+0x90e/0x1320 [zfs]
[1214992.160959]  spa_sync+0x610/0xfe0 [zfs]
[1214992.161013]  ? mutex_lock+0x12/0x30
[1214992.161098]  ? spa_txg_history_init_io+0x104/0x110 [zfs]
[1214992.161186]  txg_sync_thread+0x2e1/0x4a0 [zfs]
[1214992.161273]  ? txg_thread_exit.isra.13+0x60/0x60 [zfs]
[1214992.161332]  thread_generic_wrapper+0x74/0x90 [spl]
[1214992.161388]  kthread+0x120/0x140
[1214992.161442]  ? __thread_exit+0x20/0x20 [spl]
[1214992.161497]  ? kthread_park+0x90/0x90
[1214992.161550]  ret_from_fork+0x35/0x40
[1215112.985599] INFO: task z_rd_int:1114 blocked for more than 362 seconds.
[1215112.985661]       Tainted: P           O      5.4.114-1-pve #1
[1215112.985718] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1215112.985792] z_rd_int        D    0  1114      2 0x80004000
[1215112.985849] Call Trace:
[1215112.985905]  __schedule+0x2e6/0x700
[1215112.985960]  schedule+0x33/0xa0
[1215112.986017]  spl_panic+0xf9/0xfb [spl]
[1215112.986119]  ? zei_add_range+0x140/0x140 [zfs]
[1215112.986175]  ? __wake_up+0x13/0x20
[1215112.986229]  ? __cv_broadcast+0x2a/0x40 [spl]
[1215112.986314]  ? zfs_zevent_post+0x223/0x280 [zfs]
[1215112.986395]  arc_buf_destroy+0x101/0x110 [zfs]
[1215112.986473]  arc_read_done+0x243/0x480 [zfs]
[1215112.986561]  zio_done+0x43f/0x1020 [zfs]
[1215112.986648]  zio_execute+0x99/0xf0 [zfs]
[1215112.986705]  taskq_thread+0x2f7/0x4e0 [spl]
[1215112.986761]  ? wake_up_q+0x80/0x80
[1215112.986847]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[1215112.986906]  kthread+0x120/0x140
[1215112.986961]  ? task_done+0xb0/0xb0 [spl]
[1215112.987015]  ? kthread_park+0x90/0x90
[1215112.987069]  ret_from_fork+0x35/0x40
[1215112.987126] INFO: task dp_sync_taskq:1157 blocked for more than 362 seconds.
[1215112.987199]       Tainted: P           O      5.4.114-1-pve #1
[1215112.987255] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1215112.987329] dp_sync_taskq   D    0  1157      2 0x80004000
[1215112.987385] Call Trace:
[1215112.987436]  __schedule+0x2e6/0x700
[1215112.987490]  schedule+0x33/0xa0
[1215112.987543]  cv_wait_common+0x104/0x130 [spl]
[1215112.987598]  ? wait_woken+0x80/0x80
[1215112.987653]  __cv_wait+0x15/0x20 [spl]
[1215112.987735]  dsl_scan_prefetch_thread+0xad/0x290 [zfs]
[1215112.987794]  taskq_thread+0x2f7/0x4e0 [spl]
[1215112.987849]  ? wake_up_q+0x80/0x80
[1215112.987902]  kthread+0x120/0x140
[1215112.987957]  ? task_done+0xb0/0xb0 [spl]
[1215112.988010]  ? kthread_park+0x90/0x90
[1215112.988064]  ret_from_fork+0x35/0x40
[1215233.820082] INFO: task z_rd_int:1114 blocked for more than 483 seconds.
[1215233.820144]       Tainted: P           O      5.4.114-1-pve #1
[1215233.820201] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1215233.820275] z_rd_int        D    0  1114      2 0x80004000
[1215233.820332] Call Trace:
[1215233.820389]  __schedule+0x2e6/0x700
[1215233.820443]  schedule+0x33/0xa0
[1215233.820501]  spl_panic+0xf9/0xfb [spl]
[1215233.820603]  ? zei_add_range+0x140/0x140 [zfs]
[1215233.820660]  ? __wake_up+0x13/0x20
[1215233.820714]  ? __cv_broadcast+0x2a/0x40 [spl]
[1215233.820799]  ? zfs_zevent_post+0x223/0x280 [zfs]
[1215233.820876]  arc_buf_destroy+0x101/0x110 [zfs]
[1215233.820953]  arc_read_done+0x243/0x480 [zfs]
[1215233.821041]  zio_done+0x43f/0x1020 [zfs]
[1215233.821129]  zio_execute+0x99/0xf0 [zfs]
[1215233.821185]  taskq_thread+0x2f7/0x4e0 [spl]
[1215233.821241]  ? wake_up_q+0x80/0x80
[1215233.821327]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[1215233.821386]  kthread+0x120/0x140
[1215233.821441]  ? task_done+0xb0/0xb0 [spl]
[1215233.821494]  ? kthread_park+0x90/0x90
[1215233.821549]  ret_from_fork+0x35/0x40
[1215233.821606] INFO: task dp_sync_taskq:1157 blocked for more than 483 seconds.
[1215233.821679]       Tainted: P           O      5.4.114-1-pve #1
[1215233.821735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1215233.821809] dp_sync_taskq   D    0  1157      2 0x80004000
[1215233.821865] Call Trace:
[1215233.821917]  __schedule+0x2e6/0x700
[1215233.821970]  schedule+0x33/0xa0
[1215233.822024]  cv_wait_common+0x104/0x130 [spl]
[1215233.822078]  ? wait_woken+0x80/0x80
[1215233.822133]  __cv_wait+0x15/0x20 [spl]
[1215233.822216]  dsl_scan_prefetch_thread+0xad/0x290 [zfs]
[1215233.822274]  taskq_thread+0x2f7/0x4e0 [spl]
[1215233.822329]  ? wake_up_q+0x80/0x80
[1215233.822383]  kthread+0x120/0x140
[1215233.822437]  ? task_done+0xb0/0xb0 [spl]
[1215233.822491]  ? kthread_park+0x90/0x90
[1215233.822545]  ret_from_fork+0x35/0x40

This is an Intel(R) Xeon(R) CPU E5-1620 v2 (4 core, 8 threads) cpu, 32GB of ECC ram running on a Supermicro X9SRE-3F motherboard.

No other errors logged. The system was responsive due to a separated ext4 root unrelated to zfs. I was unable to unmount, mount, touch anything zfs-related. I had to issue a unmount/sync/reboot magic sysreq to reboot it.

Maybe the reason we're able to trigger this issue is because these are server systems doing heavy lifting on real parallel threads as opposed to "simulated" under a VM?

I hope this helps. Let me know if I can do anything to help diagnose further.

Edit: I'm reading he metrics on the server and can give some more details. The issue started around 2 o'clock in the morning, The total lockup of the server arrived around 4 o'clock, 2 hours later. I had the following symptoms:

  • iowait climbed from 10% to 100% steady, then at 4 o'clock iowait went to 50%, system time went from 0-5% to 50%
  • Disk io request went from 200/sec to 0 at 2 o'clock and staid like this until the reset.
  • swap usage was around 4GB the whole time (100%), with used swap climbing from 0 to ~512Mb starting at 2 o'clock
  • Swap io bytes were high. There was active swap in/out going on. This is not normal. Usually this server has 0% swapping going around.

All other parameters are ok. This last lock-up seems unrelated to the crypt path above. It looks more like the issues of ram starvation like we used to have back a few years ago.

pquan avatar May 27 '21 09:05 pquan

A data point: I left my Xeon D-1540 (8c16t, Broadwell) (Debian buster, 4.19.0-16-amd64, 2.0.3-1~bpo10+1) running a loop for 24 hours of running 16 zfs receives into an encrypted parent of an unencrypted source send (no flags) of a ~39GiB dataset I have, with no VM, just in case it was indeed something that couldn't be triggered for that reason, then destroying them, and so on. No excitement ensued.

Can you share the output of zfs get all [one of these source datasets] and zpool get all [source pool], and the same for the destination pool and dataset? I'd like to eliminate as many possible differences as I can.

rincebrain avatar May 29 '21 07:05 rincebrain

I have randomly picked two sources with two datasets, and the destination: zfs get all output It might be worth noting:

  • I have yet to encounter data corruption (scrubs are monthly, and the issue has been persistent around for about a year now (see target pool creation date)

  • The sends are incremental

  • Example command: /opt/sanoid-2.0.3/syncoid root@source-machine:rpool-ssd/data rpool/backups/source-machine/rpool-ssd/data --skip-parent -r --compress pigz-fast -o ServerAliveCountMax=3 -o ServerAliveInterval=30 --no-sync-snap --debug (there are machines where i make the sync-snap so the --no-sync-snap is omitted)

  • I have summarized the available crash data:

single thread, icp_aes_impl icp_gcm_impl on fastest Apr 02 08:16:26 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 Apr 14 09:05:41 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 Apr 17 04:33:22 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 May 01 16:02:09 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 May 06 11:46:07 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 May 12 05:49:43 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 May 12 21:07:13 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000

(Since the last two crashes where so close to eachother, I've implemented a watchdog and automatic unlocking, and went full berserk, re-enabling concurrency) 4 thread, icp_aes_impl icp_gcm_impl on fastest: May 21 23:25:21 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 Uptime: 39:46:02 May 24 03:31:06 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 Uptime: 52:05:45

4 threads icp_aes_impl icp_gcm_impl on generic: May 24 14:00:40 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 Uptime: 10:29:34 May 26 03:29:24 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 Uptime: 37:28:44

4 threads icp_aes_impl icp_gcm_impl on x86_64 / pclmulqdq No crash yet, uptime is 3 days, 7:47. (Not even close to the ~2 week best uptime with single thread, but with 4 threads it is the best so far)

cserem avatar May 29 '21 09:05 cserem

This time, I left a loop of 4 sends and rollbacks running, after creating an incremental by (on the same source I used for my prior send loop:

  • truncating half the files to 0b and deleting the other half
  • setting src and dst to compression=gzip-9 since the examples posted by cserem all had that
  • rsync a --inplace all the data back over
  • take a new snapshot

Then zfs send -I snap1 snap2 > incsend Then a loop of while true; for i in $(seq 1 4); do dd if=incsend bs=1M | zfs recv encroot/$i & done; wait; for i in $(seq 1 4);do zfs rollback encroot/$i@snap1;done; done; (boy, my server loved 4 gzip-9 streams at once)

Unfortunately, as you might guess from my not leading with a success statement, it did not reproduce after 24 hours of looping.

I'll probably explore including a zvol next, unless someone else comes up with a brighter idea.

rincebrain avatar May 30 '21 20:05 rincebrain

I think I stated before, my setup, but I'll try to explain my setup, in a deeper fashion:

  • I'm running encrypted to encrypted, recursive, incremental transfer. The keys on the two servers differ.
  • I'm using resumable transfers because of the size of each of them
  • The size of the dataset is easily 200-1TB each. Some of them are mail pools with millions of small files (in case this matters).
  • While doing the transfers, I'm also snapshotting these same datasets every hour, possibly having several snapshots occur during the transfer.
  • And, I'm deleting expired snapshots on both the source and destination server. Those expired snapshots are not interested in the transfer.
  • There are around 60 to 100 different snapshots in each dataset: 48 hourly, 30 daily, 12 monthly and an yearly snapshot, so some of the data is quite old.
  • Both servers use a 2x2 mirrors (2 stripes of two mirrors), spinning disks (2TB each, , and thus awfully slow, 512bytes physical but ashift is 12).

I do not lock easily. Once in a week/10 days.

Now, some meat. In the past days, I detected some corruption on one of the source servers. ZFS could mount the dataset but the data in it was inconsistent (missing most directories under the root). Also I could not unmount it once mounted. I had this issue on two related datasets: Same LXC root and data (1TB mail server data).

Weirdly enough, sending this corrupted dataset to another dataset on the same machine seemed to fix the issue on one of them. The other (the mail spool), I had to roll back to a known good state, then I had to resync all missing mails form an off-site server before putting it back in production (this is a multisite, geo-replicated mail spool setup, so I had some live data to sync in addition).

I was able to delete the corrupted datasets. A full scrub fixed the data errors in the already deleted snapshots.

I know all the above does not give any hints, other than anecdotal evidence of a possible problem. I'm only sure there was corruption to start with and it was silent. It probably got to break some algo somewhere. I'd start with a full scrub on all affected pools.

Here's some get all statistics. Please note, despite the name of the rpool (proxmox root pool) this is not an actual root pool containing the boot files. I just use the same name for historic reasons, but the pool is mounted inside an "ext4" root.

First: sending server pool:

# zpool get all rpool
NAME   PROPERTY                       VALUE                          SOURCE
rpool  size                           3.38T                          -
rpool  capacity                       49%                            -
rpool  altroot                        -                              default
rpool  health                         ONLINE                         -
rpool  guid                           10861474846250979260           -
rpool  version                        -                              default
rpool  bootfs                         -                              default
rpool  delegation                     on                             default
rpool  autoreplace                    off                            default
rpool  cachefile                      -                              default
rpool  failmode                       wait                           default
rpool  listsnapshots                  off                            default
rpool  autoexpand                     off                            default
rpool  dedupratio                     1.00x                          -
rpool  free                           1.71T                          -
rpool  allocated                      1.67T                          -
rpool  readonly                       off                            -
rpool  ashift                         12                             local
rpool  comment                        -                              default
rpool  expandsize                     -                              -
rpool  freeing                        0                              -
rpool  fragmentation                  35%                            -
rpool  leaked                         0                              -
rpool  multihost                      off                            default
rpool  checkpoint                     -                              -
rpool  load_guid                      8733562770359337761            -
rpool  autotrim                       off                            default
rpool  feature@async_destroy          enabled                        local
rpool  feature@empty_bpobj            active                         local
rpool  feature@lz4_compress           active                         local
rpool  feature@multi_vdev_crash_dump  enabled                        local
rpool  feature@spacemap_histogram     active                         local
rpool  feature@enabled_txg            active                         local
rpool  feature@hole_birth             active                         local
rpool  feature@extensible_dataset     active                         local
rpool  feature@embedded_data          active                         local
rpool  feature@bookmarks              enabled                        local
rpool  feature@filesystem_limits      enabled                        local
rpool  feature@large_blocks           enabled                        local
rpool  feature@large_dnode            active                         local
rpool  feature@sha512                 enabled                        local
rpool  feature@skein                  enabled                        local
rpool  feature@edonr                  enabled                        local
rpool  feature@userobj_accounting     active                         local
rpool  feature@encryption             active                         local
rpool  feature@project_quota          active                         local
rpool  feature@device_removal         enabled                        local
rpool  feature@obsolete_counts        enabled                        local
rpool  feature@zpool_checkpoint       enabled                        local
rpool  feature@spacemap_v2            active                         local
rpool  feature@allocation_classes     enabled                        local
rpool  feature@resilver_defer         enabled                        local
rpool  feature@bookmark_v2            enabled                        local
rpool  feature@redaction_bookmarks    disabled                       local
rpool  feature@redacted_datasets      disabled                       local
rpool  feature@bookmark_written       disabled                       local
rpool  feature@log_spacemap           disabled                       local
rpool  feature@livelist               disabled                       local
rpool  feature@device_rebuild         disabled                       local
rpool  feature@zstd_compress          disabled                       local

Sending server dataset. This is an example. All my datasets are configured the same way (lz4, xattr=sa etc).

# zfs get all rpool/vm/subvol-4242-disk-0
NAME                         PROPERTY              VALUE                         SOURCE
rpool/vm/subvol-4242-disk-0  type                  filesystem                    -
rpool/vm/subvol-4242-disk-0  creation              Wed Mar 10  8:45 2021         -
rpool/vm/subvol-4242-disk-0  used                  11.7G                         -
rpool/vm/subvol-4242-disk-0  available             1.60T                         -
rpool/vm/subvol-4242-disk-0  referenced            5.54G                         -
rpool/vm/subvol-4242-disk-0  compressratio         2.30x                         -
rpool/vm/subvol-4242-disk-0  mounted               yes                           -
rpool/vm/subvol-4242-disk-0  quota                 none                          default
rpool/vm/subvol-4242-disk-0  reservation           none                          default
rpool/vm/subvol-4242-disk-0  recordsize            128K                          default
rpool/vm/subvol-4242-disk-0  mountpoint            /rpool/vm/subvol-4242-disk-0  default
rpool/vm/subvol-4242-disk-0  sharenfs              off                           default
rpool/vm/subvol-4242-disk-0  checksum              on                            default
rpool/vm/subvol-4242-disk-0  compression           lz4                           inherited from rpool
rpool/vm/subvol-4242-disk-0  atime                 off                           inherited from rpool
rpool/vm/subvol-4242-disk-0  devices               on                            default
rpool/vm/subvol-4242-disk-0  exec                  on                            default
rpool/vm/subvol-4242-disk-0  setuid                on                            default
rpool/vm/subvol-4242-disk-0  readonly              off                           default
rpool/vm/subvol-4242-disk-0  zoned                 off                           default
rpool/vm/subvol-4242-disk-0  snapdir               hidden                        default
rpool/vm/subvol-4242-disk-0  aclmode               discard                       default
rpool/vm/subvol-4242-disk-0  aclinherit            restricted                    default
rpool/vm/subvol-4242-disk-0  createtxg             3675833                       -
rpool/vm/subvol-4242-disk-0  canmount              on                            default
rpool/vm/subvol-4242-disk-0  xattr                 sa                            inherited from rpool
rpool/vm/subvol-4242-disk-0  copies                1                             default
rpool/vm/subvol-4242-disk-0  version               5                             -
rpool/vm/subvol-4242-disk-0  utf8only              off                           -
rpool/vm/subvol-4242-disk-0  normalization         none                          -
rpool/vm/subvol-4242-disk-0  casesensitivity       sensitive                     -
rpool/vm/subvol-4242-disk-0  vscan                 off                           default
rpool/vm/subvol-4242-disk-0  nbmand                off                           default
rpool/vm/subvol-4242-disk-0  sharesmb              off                           default
rpool/vm/subvol-4242-disk-0  refquota              none                          default
rpool/vm/subvol-4242-disk-0  refreservation        none                          default
rpool/vm/subvol-4242-disk-0  guid                  13076753325802082524          -
rpool/vm/subvol-4242-disk-0  primarycache          all                           default
rpool/vm/subvol-4242-disk-0  secondarycache        all                           default
rpool/vm/subvol-4242-disk-0  usedbysnapshots       6.13G                         -
rpool/vm/subvol-4242-disk-0  usedbydataset         5.54G                         -
rpool/vm/subvol-4242-disk-0  usedbychildren        0B                            -
rpool/vm/subvol-4242-disk-0  usedbyrefreservation  0B                            -
rpool/vm/subvol-4242-disk-0  logbias               latency                       default
rpool/vm/subvol-4242-disk-0  objsetid              53904                         -
rpool/vm/subvol-4242-disk-0  dedup                 off                           default
rpool/vm/subvol-4242-disk-0  mlslabel              none                          default
rpool/vm/subvol-4242-disk-0  sync                  standard                      inherited from rpool/vm
rpool/vm/subvol-4242-disk-0  dnodesize             auto                          inherited from rpool
rpool/vm/subvol-4242-disk-0  refcompressratio      2.06x                         -
rpool/vm/subvol-4242-disk-0  written               8K                            -
rpool/vm/subvol-4242-disk-0  logicalused           24.6G                         -
rpool/vm/subvol-4242-disk-0  logicalreferenced     10.8G                         -
rpool/vm/subvol-4242-disk-0  volmode               default                       default
rpool/vm/subvol-4242-disk-0  filesystem_limit      none                          default
rpool/vm/subvol-4242-disk-0  snapshot_limit        none                          default
rpool/vm/subvol-4242-disk-0  filesystem_count      none                          default
rpool/vm/subvol-4242-disk-0  snapshot_count        none                          default
rpool/vm/subvol-4242-disk-0  snapdev               hidden                        default
rpool/vm/subvol-4242-disk-0  acltype               posix                         inherited from rpool
rpool/vm/subvol-4242-disk-0  context               none                          default
rpool/vm/subvol-4242-disk-0  fscontext             none                          default
rpool/vm/subvol-4242-disk-0  defcontext            none                          default
rpool/vm/subvol-4242-disk-0  rootcontext           none                          default
rpool/vm/subvol-4242-disk-0  relatime              off                           default
rpool/vm/subvol-4242-disk-0  redundant_metadata    all                           default
rpool/vm/subvol-4242-disk-0  overlay               on                            default
rpool/vm/subvol-4242-disk-0  encryption            aes-256-gcm                   -
rpool/vm/subvol-4242-disk-0  keylocation           none                          default
rpool/vm/subvol-4242-disk-0  keyformat             passphrase                    -
rpool/vm/subvol-4242-disk-0  pbkdf2iters           350000                        -
rpool/vm/subvol-4242-disk-0  encryptionroot        rpool                         -
rpool/vm/subvol-4242-disk-0  keystatus             available                     -
rpool/vm/subvol-4242-disk-0  special_small_blocks  0                             default

Here's a receiver get all :

# zpool get all rpool
NAME   PROPERTY                       VALUE                          SOURCE
rpool  size                           3.62T                          -
rpool  capacity                       70%                            -
rpool  altroot                        -                              default
rpool  health                         ONLINE                         -
rpool  guid                           16014196203982227614           -
rpool  version                        -                              default
rpool  bootfs                         rpool/ROOT/pve-1               local
rpool  delegation                     on                             default
rpool  autoreplace                    off                            default
rpool  cachefile                      -                              default
rpool  failmode                       wait                           default
rpool  listsnapshots                  off                            default
rpool  autoexpand                     off                            default
rpool  dedupratio                     1.00x                          -
rpool  free                           1.07T                          -
rpool  allocated                      2.56T                          -
rpool  readonly                       off                            -
rpool  ashift                         12                             local
rpool  comment                        -                              default
rpool  expandsize                     -                              -
rpool  freeing                        0                              -
rpool  fragmentation                  29%                            -
rpool  leaked                         0                              -
rpool  multihost                      off                            default
rpool  checkpoint                     -                              -
rpool  load_guid                      9824040886009751730            -
rpool  autotrim                       off                            default
rpool  feature@async_destroy          enabled                        local
rpool  feature@empty_bpobj            active                         local
rpool  feature@lz4_compress           active                         local
rpool  feature@multi_vdev_crash_dump  enabled                        local
rpool  feature@spacemap_histogram     active                         local
rpool  feature@enabled_txg            active                         local
rpool  feature@hole_birth             active                         local
rpool  feature@extensible_dataset     active                         local
rpool  feature@embedded_data          active                         local
rpool  feature@bookmarks              enabled                        local
rpool  feature@filesystem_limits      enabled                        local
rpool  feature@large_blocks           enabled                        local
rpool  feature@large_dnode            active                         local
rpool  feature@sha512                 enabled                        local
rpool  feature@skein                  enabled                        local
rpool  feature@edonr                  enabled                        local
rpool  feature@userobj_accounting     active                         local
rpool  feature@encryption             active                         local
rpool  feature@project_quota          active                         local
rpool  feature@device_removal         enabled                        local
rpool  feature@obsolete_counts        enabled                        local
rpool  feature@zpool_checkpoint       enabled                        local
rpool  feature@spacemap_v2            active                         local
rpool  feature@allocation_classes     enabled                        local
rpool  feature@resilver_defer         enabled                        local
rpool  feature@bookmark_v2            enabled                        local
rpool  feature@redaction_bookmarks    disabled                       local
rpool  feature@redacted_datasets      disabled                       local
rpool  feature@bookmark_written       disabled                       local
rpool  feature@log_spacemap           disabled                       local
rpool  feature@livelist               disabled                       local
rpool  feature@device_rebuild         disabled                       local
rpool  feature@zstd_compress          disabled                       local

And a dataset:

NAME                                       PROPERTY              VALUE                                       SOURCE
rpool/backup/vm/subvol-3140-disk-1  type                  filesystem                                  -
rpool/backup/vm/subvol-3140-disk-1  creation              Thu Apr  8 10:28 2021                       -
rpool/backup/vm/subvol-3140-disk-1  used                  20.6G                                       -
rpool/backup/vm/subvol-3140-disk-1  available             970G                                        -
rpool/backup/vm/subvol-3140-disk-1  referenced            7.66G                                       -
rpool/backup/vm/subvol-3140-disk-1  compressratio         2.13x                                       -
rpool/backup/vm/subvol-3140-disk-1  mounted               no                                          -
rpool/backup/vm/subvol-3140-disk-1  quota                 none                                        default
rpool/backup/vm/subvol-3140-disk-1  reservation           none                                        default
rpool/backup/vm/subvol-3140-disk-1  recordsize            128K                                        default
rpool/backup/vm/subvol-3140-disk-1  mountpoint            /rpool/backup/vm/subvol-3140-disk-1  default
rpool/backup/vm/subvol-3140-disk-1  sharenfs              off                                         default
rpool/backup/vm/subvol-3140-disk-1  checksum              on                                          default
rpool/backup/vm/subvol-3140-disk-1  compression           lz4                                         inherited from rpool
rpool/backup/vm/subvol-3140-disk-1  atime                 off                                         inherited from rpool
rpool/backup/vm/subvol-3140-disk-1  devices               on                                          default
rpool/backup/vm/subvol-3140-disk-1  exec                  on                                          default
rpool/backup/vm/subvol-3140-disk-1  setuid                on                                          default
rpool/backup/vm/subvol-3140-disk-1  readonly              off                                         default
rpool/backup/vm/subvol-3140-disk-1  zoned                 off                                         default
rpool/backup/vm/subvol-3140-disk-1  snapdir               hidden                                      default
rpool/backup/vm/subvol-3140-disk-1  aclmode               discard                                     default
rpool/backup/vm/subvol-3140-disk-1  aclinherit            restricted                                  default
rpool/backup/vm/subvol-3140-disk-1  createtxg             10724                                       -
rpool/backup/vm/subvol-3140-disk-1  canmount              on                                          default
rpool/backup/vm/subvol-3140-disk-1  xattr                 on                                          default
rpool/backup/vm/subvol-3140-disk-1  copies                1                                           default
rpool/backup/vm/subvol-3140-disk-1  version               5                                           -
rpool/backup/vm/subvol-3140-disk-1  utf8only              off                                         -
rpool/backup/vm/subvol-3140-disk-1  normalization         none                                        -
rpool/backup/vm/subvol-3140-disk-1  casesensitivity       sensitive                                   -
rpool/backup/vm/subvol-3140-disk-1  vscan                 off                                         default
rpool/backup/vm/subvol-3140-disk-1  nbmand                off                                         default
rpool/backup/vm/subvol-3140-disk-1  sharesmb              off                                         default
rpool/backup/vm/subvol-3140-disk-1  refquota              none                                        default
rpool/backup/vm/subvol-3140-disk-1  refreservation        none                                        default
rpool/backup/vm/subvol-3140-disk-1  guid                  5702550096622166170                         -
rpool/backup/vm/subvol-3140-disk-1  primarycache          all                                         default
rpool/backup/vm/subvol-3140-disk-1  secondarycache        all                                         default
rpool/backup/vm/subvol-3140-disk-1  usedbysnapshots       12.9G                                       -
rpool/backup/vm/subvol-3140-disk-1  usedbydataset         7.66G                                       -
rpool/backup/vm/subvol-3140-disk-1  usedbychildren        0B                                          -
rpool/backup/vm/subvol-3140-disk-1  usedbyrefreservation  0B                                          -
rpool/backup/vm/subvol-3140-disk-1  logbias               latency                                     default
rpool/backup/vm/subvol-3140-disk-1  objsetid              185                                         -
rpool/backup/vm/subvol-3140-disk-1  dedup                 off                                         default
rpool/backup/vm/subvol-3140-disk-1  mlslabel              none                                        default
rpool/backup/vm/subvol-3140-disk-1  sync                  standard                                    inherited from rpool
rpool/backup/vm/subvol-3140-disk-1  dnodesize             legacy                                      default
rpool/backup/vm/subvol-3140-disk-1  refcompressratio      1.19x                                       -
rpool/backup/vm/subvol-3140-disk-1  written               0                                           -
rpool/backup/vm/subvol-3140-disk-1  logicalused           43.2G                                       -
rpool/backup/vm/subvol-3140-disk-1  logicalreferenced     8.96G                                       -
rpool/backup/vm/subvol-3140-disk-1  volmode               default                                     default
rpool/backup/vm/subvol-3140-disk-1  filesystem_limit      none                                        default
rpool/backup/vm/subvol-3140-disk-1  snapshot_limit        none                                        default
rpool/backup/vm/subvol-3140-disk-1  filesystem_count      none                                        default
rpool/backup/vm/subvol-3140-disk-1  snapshot_count        none                                        default
rpool/backup/vm/subvol-3140-disk-1  snapdev               hidden                                      default
rpool/backup/vm/subvol-3140-disk-1  acltype               off                                         default
rpool/backup/vm/subvol-3140-disk-1  context               none                                        default
rpool/backup/vm/subvol-3140-disk-1  fscontext             none                                        default
rpool/backup/vm/subvol-3140-disk-1  defcontext            none                                        default
rpool/backup/vm/subvol-3140-disk-1  rootcontext           none                                        default
rpool/backup/vm/subvol-3140-disk-1  relatime              off                                         default
rpool/backup/vm/subvol-3140-disk-1  redundant_metadata    all                                         default
rpool/backup/vm/subvol-3140-disk-1  overlay               on                                          default
rpool/backup/vm/subvol-3140-disk-1  encryption            aes-256-gcm                                 -
rpool/backup/vm/subvol-3140-disk-1  keylocation           none                                        default
rpool/backup/vm/subvol-3140-disk-1  keyformat             passphrase                                  -
rpool/backup/vm/subvol-3140-disk-1  pbkdf2iters           350000                                      -
rpool/backup/vm/subvol-3140-disk-1  encryptionroot        rpool/backup                          -
rpool/backup/vm/subvol-3140-disk-1  keystatus             available                                   -
rpool/backup/vm/subvol-3140-disk-1  special_small_blocks  0                                           default

And a last thing I almost forgot: On some of the systems I'm having a misaligned write error. In the past this seemed related to hardware problems with bad sata cables and/or controller issues. I'm getting this issue on servers with onboard sata ports with very short cables. Again all this started after the upgrade to 0.8.x and most prominently I'm finding it on v2.0.4. This happens on both SSD and SATA disks. I'll reduce the speed on the sata disks to 1.5GBit as a check but doing so on the SSDs would not be feasible.

Sorry to drop all this meat on the table. I hope I'm not causing confusion. I've not been able to isolate the issue in a better way.

pquan avatar May 31 '21 14:05 pquan

Update: 4 threads icp_aes_impl icp_gcm_impl on x86_64 / pclmulqdq crashed after: 5 days, 14:47 Trying again with the above parameters.

I was able to get the zfs related processes:

root     20454  0.0  0.0   2384   700 ?        S    18:25   0:00 sh -c ssh  -o ServerAliveCountMax=3 -o ServerAliveInterval=30   -S /tmp/syncoid-root@source1-1622478259 root@source1 ' zfs send  -I '"'"'rpool/rpool-ssd-backup/subvol-120-disk-2'"'"'@'"'"'2021-05-31-180000'"'"' '"'"'rpool/rpool-ssd-backup/subvol-120-disk-2'"'"'@'"'"'2021-05-31-181500'"'"' | pigz -3 | mbuffer  -q -s 128k -m 16M 2>/dev/null' | mbuffer  -q -s 128k -m 16M 2>/dev/null | pigz -dc | pv -p -t -e -r -b -s 52098832 |  zfs receive  -s -F 'rpool/backups/SOURCE1/rpool/rpool-ssd-backup/subvol-120-disk-2' 2>&1
root     20459  0.0  0.0  15708  4172 ?        S    18:25   0:00 zfs receive -s -F rpool/backups/SOURCE1/rpool/rpool-ssd-backup/subvol-120-disk-2
root     28671  0.0  0.0   2384   764 ?        S    18:25   0:00 sh -c ssh  -o ServerAliveCountMax=3 -o ServerAliveInterval=30   -S /tmp/syncoid-root@SOURCE2-1622478239 root@SOURCE2 ' zfs send  -I '"'"'rpool/data/subvol-117-disk-0'"'"'@'"'"'syncoid_pve_2021-05-31:18:12:10-GMT02:00'"'"' '"'"'rpool/data/subvol-117-disk-0'"'"'@'"'"'syncoid_pve_2021-05-31:18:25:51-GMT02:00'"'"' | pigz -3 | mbuffer  -q -s 128k -m 16M 2>/dev/null' | mbuffer  -q -s 128k -m 16M 2>/dev/null | pigz -dc | pv -p -t -e -r -b -s 495575408 |  zfs receive  -s -F 'rpool/backups/SOURCE2/rpool/data/subvol-117-disk-0' 2>&1
root     28680  0.3  0.0  15708  4424 ?        D    18:25   0:00 zfs receive -s -F rpool/backups/SOURCE2/rpool/data/subvol-117-disk-0
root     31831  0.0  0.0   2384   760 ?        S    18:26   0:00 sh -c zfs rollback -R 'rpool/backups/SOURCE3/rpool-ssd/data/subvol-104-disk-1'@'2021-05-31-180000'
root     31832  0.0  0.0  15704  4144 ?        D    18:26   0:00 zfs rollback -R rpool/backups/SOURCE3/rpool-ssd/data/subvol-104-disk-1@2021-05-31-180000
root     31863  0.0  0.0   2384   764 ?        S    18:26   0:00 sh -c zfs destroy 'rpool/backups/SOURCE4/rpool/data/subvol-106-disk-2'@syncoid_pve_2021-05-31:18:12:03-GMT02:00
root     31865  0.0  0.0  15708  4012 ?        S    18:26   0:00 zfs destroy rpool/backups/SOURCE4/rpool/data/subvol-106-disk-2@syncoid_pve_2021-05-31:18:12:03-GMT02:00

Since the destroys (thoese are some of the receives where I deliberately make a snapshot for syncing, and they are being cleaned up after the transfer) have the highest pid, perhaps the crash is related to that? Then again the rollback is 'D'... I will hopefully have more ps outputs in the future.

cserem avatar May 31 '21 17:05 cserem

4 threads icp_aes_impl icp_gcm_impl on x86_64 / pclmulqdq crashed after: 1 day, 6:44 I am switching back to default (fastest)

Unfortunately i couldn't capture a process list this time.

Here are some events from previous crashes up to the point the system is rebooted and the pool is imported:

zpool history
2021-05-24.03:30:46 zfs destroy rpool/backups/SOURCE4/rpool/data/subvol-112-disk-1@syncoid_pve_2021-05-24:03:15:33-GMT02:00
2021-05-24.03:30:51 zfs rollback -R rpool/backups/SOURCE8/rpool/data/subvol-106-disk-2@syncoid_pve_2021-05-24:03:15:50-GMT02:00
2021-05-24.03:30:52 zfs rollback -R rpool/backups/SOURCE4/rpool/data/subvol-115-disk-1@syncoid_pve_2021-05-24:03:15:46-GMT02:00
2021-05-24.03:30:55 zfs receive -s -F rpool/backups/SOURCE8/rpool/data/subvol-106-disk-2
2021-05-24.03:30:57 zfs destroy rpool/backups/SOURCE8/rpool/data/subvol-106-disk-2@syncoid_pve_2021-05-24:03:15:50-GMT02:00
2021-05-24.03:31:00 zfs receive -s -F rpool/backups/SOURCE4/rpool/data/subvol-115-disk-1
2021-05-24.03:31:02 zfs destroy rpool/backups/SOURCE4/rpool/data/subvol-115-disk-1@syncoid_pve_2021-05-24:03:15:46-GMT02:00
2021-05-24.03:31:03 zfs receive -s -F rpool/backups/SOURCE3/rpool/rpool-ssd-backup/subvol-120-disk-2
2021-05-24.03:31:04 zfs receive -s -F rpool/backups/SOURCE7/rpool-hdd/data/vm-102-disk-5
2021-05-24.03:31:04 zfs rollback -R rpool/backups/SOURCE2/rpool-ssd/ROOT/pve-1@2021-05-24-030000
>>>>May 24 03:31:06 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
2021-05-24.07:11:22 zpool import -N rpool
--


2021-05-24.14:00:33 zfs receive -s -F rpool/backups/SOURCE7/rpool-hdd/data/vm-101-disk-0
2021-05-24.14:00:34 zfs receive -F rpool/backups/SOURCE5/rpool/data/vm-100-disk-2
2021-05-24.14:00:34 zfs receive -s -F rpool/backups/SOURCE4/rpool/ROOT/pve-1
2021-05-24.14:00:35 zfs destroy rpool/backups/SOURCE5/rpool/data/vm-100-disk-2@syncoid_pve_2021-05-24:13:51:04-GMT02:00
2021-05-24.14:00:35 zfs rollback -R rpool/backups/SOURCE7/rpool-hdd/data/vm-102-disk-0@2021-05-24-134500
2021-05-24.14:00:36 zfs receive -s -F rpool/backups/SOURCE6/rpool/data/vm-101-disk-0
2021-05-24.14:00:37 zfs destroy rpool/backups/SOURCE6/rpool/data/vm-101-disk-0@syncoid_pve_2021-05-24:13:51:14-GMT02:00
2021-05-24.14:00:37 zfs destroy rpool/backups/SOURCE4/rpool/ROOT/pve-1@syncoid_pve_2021-05-24:13:50:54-GMT02:00
2021-05-24.14:00:38 zfs rollback -R rpool/backups/SOURCE6/rpool/data/vm-102-disk-0@syncoid_pve_2021-05-24:13:51:24-GMT02:00
2021-05-24.14:00:39 zfs rollback -R rpool/backups/SOURCE3/rpool/ROOT/pve-1@2021-05-24-134500
>>>>May 24 14:00:40 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
2021-05-24.16:03:19 zpool import -N rpool
--


2021-05-26.03:29:07 zfs receive -s -F rpool/backups/SOURCE4/rpool/data/subvol-101-disk-2
2021-05-26.03:29:10 zfs rollback -R rpool/backups/SOURCE3/rpool/rpool-ssd-backup/subvol-120-disk-1@2021-05-26-030000
2021-05-26.03:29:10 zfs destroy rpool/backups/SOURCE4/rpool/data/subvol-101-disk-2@syncoid_pve_2021-05-26:03:14:09-GMT02:00
2021-05-26.03:29:11 zfs receive -s -F rpool/backups/SOURCE3/rpool/rpool-ssd-backup/subvol-120-disk-1
2021-05-26.03:29:13 zfs rollback -R rpool/backups/SOURCE4/rpool/data/subvol-102-disk-1@syncoid_pve_2021-05-26:03:14:20-GMT02:00
2021-05-26.03:29:14 zfs rollback -R rpool/backups/SOURCE8/rpool/ROOT/pve-1@syncoid_pve_2021-05-26:03:14:38-GMT02:00
2021-05-26.03:29:16 zfs rollback -R rpool/backups/SOURCE3/rpool/rpool-ssd-backup/subvol-120-disk-2@2021-05-26-030000
2021-05-26.03:29:20 zfs receive -s -F rpool/backups/SOURCE4/rpool/data/subvol-102-disk-1
2021-05-26.03:29:22 zfs destroy rpool/backups/SOURCE4/rpool/data/subvol-102-disk-1@syncoid_pve_2021-05-26:03:14:20-GMT02:00
>>>>May 26 03:29:24 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
2021-05-26.03:29:26 zfs receive -s -F rpool/backups/SOURCE8/rpool/ROOT/pve-1
2021-05-26.03:40:17 zpool import -N rpool
--


2021-05-31.18:25:37 zfs rollback -R rpool/backups/SOURCE4/rpool/data/subvol-112-disk-1@syncoid_pve_2021-05-31:18:11:48-GMT02:00
2021-05-31.18:25:38 zfs rollback -R rpool/backups/SOURCE2/rpool-ssd/data/subvol-100-disk-1@2021-05-31-180000
2021-05-31.18:25:42 zfs receive -s -F rpool/backups/SOURCE4/rpool/data/subvol-112-disk-1
2021-05-31.18:25:42 zfs receive -s -F rpool/backups/SOURCE8/rpool/data/subvol-106-disk-0
2021-05-31.18:25:43 zfs receive -s -F rpool/backups/SOURCE2/rpool-ssd/data/subvol-100-disk-1
2021-05-31.18:25:43 zfs destroy rpool/backups/SOURCE4/rpool/data/subvol-112-disk-1@syncoid_pve_2021-05-31:18:11:48-GMT02:00
2021-05-31.18:25:44 zfs destroy rpool/backups/SOURCE8/rpool/data/subvol-106-disk-0@syncoid_pve_2021-05-31:18:11:39-GMT02:00
>>>>May 31 18:25:44 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
2021-05-31.18:25:45 zfs rollback -R rpool/backups/SOURCE2/rpool-ssd/data/subvol-100-disk-2@2021-05-31-180000
2021-05-31.18:25:47 zfs rollback -R rpool/backups/SOURCE4/rpool/data/subvol-115-disk-1@syncoid_pve_2021-05-31:18:12:00-GMT02:00
2021-05-31.18:25:47 zfs rollback -R rpool/backups/SOURCE8/rpool/data/subvol-106-disk-1@syncoid_pve_2021-05-31:18:11:51-GMT02:00
2021-05-31.18:25:48 zfs receive -s -F rpool/backups/SOURCE2/rpool-ssd/data/subvol-100-disk-2
2021-05-31.18:25:49 zfs rollback -R rpool/backups/SOURCE2/rpool-ssd/data/subvol-101-disk-0@2021-05-31-180000
2021-05-31.18:25:49 zfs receive -s -F rpool/backups/SOURCE4/rpool/data/subvol-115-disk-1
2021-05-31.18:25:50 zfs receive -s -F rpool/backups/SOURCE2/rpool-ssd/data/subvol-101-disk-0
2021-05-31.18:25:51 zfs destroy rpool/backups/SOURCE4/rpool/data/subvol-115-disk-1@syncoid_pve_2021-05-31:18:12:00-GMT02:00
2021-05-31.18:25:51 zfs rollback -R rpool/backups/SOURCE2/rpool-ssd/data/subvol-101-disk-1@2021-05-31-180000
2021-05-31.18:25:52 zfs receive -s -F rpool/backups/SOURCE2/rpool-ssd/data/subvol-101-disk-1
2021-05-31.18:25:54 zfs rollback -R rpool/backups/SOURCE4/rpool/data/subvol-117-disk-0@syncoid_pve_2021-05-31:18:12:10-GMT02:00
2021-05-31.18:25:54 zfs receive -s -F rpool/backups/SOURCE8/rpool/data/subvol-106-disk-1
2021-05-31.18:25:55 zfs rollback -R rpool/backups/SOURCE2/rpool-ssd/data/subvol-101-disk-2@2021-05-31-180000
2021-05-31.18:26:04 zfs destroy rpool/backups/SOURCE8/rpool/data/subvol-106-disk-1@syncoid_pve_2021-05-31:18:11:51-GMT02:00
2021-05-31.18:26:05 zfs rollback -R rpool/backups/SOURCE8/rpool/data/subvol-106-disk-2@syncoid_pve_2021-05-31:18:12:03-GMT02:00
2021-05-31.18:26:09 zfs receive -s -F rpool/backups/SOURCE2/rpool-ssd/data/subvol-101-disk-2
2021-05-31.18:26:11 zfs rollback -R rpool/backups/SOURCE2/rpool-ssd/data/subvol-104-disk-0@2021-05-31-180000
2021-05-31.18:26:12 zfs receive -s -F rpool/backups/SOURCE2/rpool-ssd/data/subvol-104-disk-0
2021-05-31.18:37:02 zpool import -N rpool

--


2021-06-02.01:07:35 zfs destroy rpool/backups/SOURCE4/rpool/ROOT@syncoid_pve_2021-06-02:00:39:31-GMT02:00
2021-06-02.01:07:37 zfs receive -F rpool/backups/SOURCE5/rpool/data/subvol-101-disk-1
2021-06-02.01:07:37 zfs destroy rpool/backups/SOURCE5/rpool/data/subvol-101-disk-1@syncoid_pve_2021-06-02:00:39:37-GMT02:00
2021-06-02.01:07:39 zfs rollback -R rpool/backups/SOURCE4/rpool/ROOT/pve-1@syncoid_pve_2021-06-02:00:39:44-GMT02:00
2021-06-02.01:07:41 zfs receive -s -F rpool/backups/SOURCE7/rpool-hdd/data/subvol-103-disk-0
2021-06-02.01:07:42 zfs rollback -R rpool/backups/SOURCE7/rpool-hdd/data/vm-101-disk-0@2021-06-02-003000
2021-06-02.01:07:45 zfs rollback -R rpool/backups/SOURCE6/rpool/data/vm-100-disk-0@syncoid_pve_2021-06-02:00:40:18-GMT02:00
2021-06-02.01:07:45 zfs rollback -R rpool/backups/SOURCE5/rpool/data/vm-100-disk-1@syncoid_pve_2021-06-02:00:39:46-GMT02:00
>>>>Jun 02 01:07:46 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
2021-06-02.01:07:48 zfs receive -s -F rpool/backups/SOURCE7/rpool-hdd/data/vm-101-disk-0
2021-06-02.01:07:49 zfs rollback -R rpool/backups/SOURCE7/rpool-hdd/data/vm-102-disk-0@2021-06-02-003000
2021-06-02.01:21:17 zpool import -N rpool

Aren't there some commands that I could run after the exception, to gather more useful data?

cserem avatar Jun 02 '21 07:06 cserem

It seems that "reflowing" some of the problematic datasets on different mount point in the same pool has had some beneficial effects in my case. I mean just moving datasets with syncoid from rpool/mount1/dataset[x] -> rpool/mount2/dataset[x] and deleting some older snapshots and scrubbing the pool. I'm moving said datasets back to their original position now to see if the fix is permanent. This is all empiric work, but it would support the theory of some type of corruption occurring undetected. It does not seem to affect data tough, just metadata (I hope).

BTW: zfs send/receive is so painfully slow. 2x2 mirrors spinning disk often manage only 20-30MB on average, with lots of time spent under the 5-10MB region.

pquan avatar Jun 04 '21 07:06 pquan

You might benefit, if you're not using it, from zfs send -c, since you're using gzip-9, and I would find it difficult to overstate how slow it is.

edit: I'm terribly sorry, I was thinking of someone else in the thread who is using gzip-9. My suggestion remains as stated, but assuming its properties work for you, you might want to explore using -w to avoid paying the de+re-encryption overhead.

Because, for example, I get much faster throughput than that on my single-disk pools sans encryption.

You might also want to use e.g. perf top and see if it is indeed spending a whole bunch of CPU cycles in the encryption code, or somewhere else.

rincebrain avatar Jun 04 '21 08:06 rincebrain

edit: I'm terribly sorry, I was thinking of someone else in the thread who is using gzip-9. My suggestion remains as stated, but assuming its properties work for you, you might want to explore using -w to avoid paying the de+re-encryption overhead.

Because, for example, I get much faster throughput than that on my single-disk pools sans encryption.

You might also want to use e.g. perf top and see if it is indeed spending a whole bunch of CPU cycles in the encryption code, or somewhere else.

I'm spending ~50% of cpu time in encryption/decryption and lz4 compression/decompression. However, running with -w increases the speed only marginally. Guess I'm IO bound.

pquan avatar Jun 06 '21 14:06 pquan

While doing the above test, I managed to lock up my server, transferring the single dataset from one place to another. Below is the stack trace. I was running a zfs send/receive with -w, so no encryption/decryption was taking place. There were some snapshotting, listing, deleting mixed during the transfer, for sure. Other than that, just 90-100% io usage for a few hours. I might want to mention that the dataset is around 800GB of size. It contains tens of millions of small files (maildir storage) and about 100 snapshots.

Am I pushing ZFS too much ?

[822539.204672] Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.3.V1 02                                                         /04/2020
[822539.204756] Call Trace:
[822539.204813]  dump_stack+0x6d/0x8b
[822539.204874]  spl_dumpstack+0x29/0x2b [spl]
[822539.204931]  spl_panic+0xd3/0xfb [spl]
[822539.205027]  ? zfs_btree_insert_leaf_impl.isra.9+0x4c/0x60 [zfs]
[822539.205121]  ? abd_iter_advance+0x2d/0x60 [zfs]
[822539.205200]  ? zfs_btree_find_parent_idx+0x80/0xd0 [zfs]
[822539.205291]  ? zfs_btree_find+0x160/0x2d0 [zfs]
[822539.205381]  range_tree_remove_impl+0x67e/0x1000 [zfs]
[822539.205469]  ? metaslab_df_alloc+0xc6/0x5e0 [zfs]
[822539.205527]  ? ktime_get_raw_ts64+0x38/0xd0
[822539.205614]  ? multilist_sublist_unlock+0x2b/0x40 [zfs]
[822539.205701]  range_tree_remove+0x10/0x20 [zfs]
[822539.205787]  metaslab_alloc_dva+0x264/0x1170 [zfs]
[822539.205873]  metaslab_alloc+0xb4/0x240 [zfs]
[822539.205963]  zio_dva_allocate+0xd5/0x820 [zfs]
[822539.206019]  ? _cond_resched+0x19/0x30
[822539.206074]  ? mutex_lock+0x12/0x30
[822539.206169]  ? metaslab_class_throttle_reserve+0xd0/0xf0 [zfs]
[822539.206230]  ? tsd_hash_search.isra.5+0x47/0xa0 [spl]
[822539.206326]  zio_execute+0x99/0xf0 [zfs]
[822539.206382]  taskq_thread+0x2f7/0x4e0 [spl]
[822539.206438]  ? wake_up_q+0x80/0x80
[822539.206523]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822539.206584]  kthread+0x120/0x140
[822539.206640]  ? task_done+0xb0/0xb0 [spl]
[822539.206694]  ? kthread_park+0x90/0x90
[822539.206747]  ret_from_fork+0x35/0x40
[822765.000993] INFO: task z_wr_iss:761 blocked for more than 120 seconds.
[822765.001055]       Tainted: P           O      5.4.114-1-pve #1
[822765.001111] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.001185] z_wr_iss        D    0   761      2 0x80004000
[822765.001242] Call Trace:
[822765.001299]  __schedule+0x2e6/0x700
[822765.001352]  schedule+0x33/0xa0
[822765.001404]  schedule_preempt_disabled+0xe/0x10
[822765.001459]  __mutex_lock.isra.10+0x2c9/0x4c0
[822765.001513]  __mutex_lock_slowpath+0x13/0x20
[822765.001567]  mutex_lock+0x2c/0x30
[822765.001672]  metaslab_alloc_dva+0x55d/0x1170 [zfs]
[822765.001759]  metaslab_alloc+0xb4/0x240 [zfs]
[822765.001847]  zio_dva_allocate+0xd5/0x820 [zfs]
[822765.001934]  ? zio_crypt_do_indirect_mac_checksum_abd+0x5d/0x70 [zfs]
[822765.001993]  ? _cond_resched+0x19/0x30
[822765.002046]  ? mutex_lock+0x12/0x30
[822765.002129]  ? metaslab_class_throttle_reserve+0xd0/0xf0 [zfs]
[822765.002190]  ? tsd_hash_search.isra.5+0x47/0xa0 [spl]
[822765.002279]  zio_execute+0x99/0xf0 [zfs]
[822765.002335]  taskq_thread+0x2f7/0x4e0 [spl]
[822765.002391]  ? wake_up_q+0x80/0x80
[822765.002477]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.002536]  kthread+0x120/0x140
[822765.002590]  ? task_done+0xb0/0xb0 [spl]
[822765.002644]  ? kthread_park+0x90/0x90
[822765.002697]  ret_from_fork+0x35/0x40
[822765.002751] INFO: task z_wr_iss:762 blocked for more than 120 seconds.
[822765.002809]       Tainted: P           O      5.4.114-1-pve #1
[822765.002865] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.002938] z_wr_iss        D    0   762      2 0x80004000
[822765.002995] Call Trace:
[822765.003046]  __schedule+0x2e6/0x700
[822765.003099]  schedule+0x33/0xa0
[822765.003151]  schedule_preempt_disabled+0xe/0x10
[822765.003205]  __mutex_lock.isra.10+0x2c9/0x4c0
[822765.003262]  __mutex_lock_slowpath+0x13/0x20
[822765.003317]  mutex_lock+0x2c/0x30
[822765.003400]  metaslab_alloc_dva+0x55d/0x1170 [zfs]
[822765.003485]  metaslab_alloc+0xb4/0x240 [zfs]
[822765.003573]  zio_dva_allocate+0xd5/0x820 [zfs]
[822765.003628]  ? _cond_resched+0x19/0x30
[822765.003681]  ? mutex_lock+0x12/0x30
[822765.003763]  ? metaslab_class_throttle_reserve+0xd0/0xf0 [zfs]
[822765.003822]  ? tsd_hash_search.isra.5+0x47/0xa0 [spl]
[822765.003911]  zio_execute+0x99/0xf0 [zfs]
[822765.003966]  taskq_thread+0x2f7/0x4e0 [spl]
[822765.004021]  ? wake_up_q+0x80/0x80
[822765.004107]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.004165]  kthread+0x120/0x140
[822765.004219]  ? task_done+0xb0/0xb0 [spl]
[822765.004273]  ? kthread_park+0x90/0x90
[822765.004326]  ret_from_fork+0x35/0x40
[822765.004380] INFO: task z_wr_iss:763 blocked for more than 120 seconds.
[822765.004437]       Tainted: P           O      5.4.114-1-pve #1
[822765.004493] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.004568] z_wr_iss        D    0   763      2 0x80004000
[822765.004624] Call Trace:
[822765.004676]  __schedule+0x2e6/0x700
[822765.004728]  schedule+0x33/0xa0
[822765.004781]  schedule_preempt_disabled+0xe/0x10
[822765.004836]  __mutex_lock.isra.10+0x2c9/0x4c0
[822765.004890]  __mutex_lock_slowpath+0x13/0x20
[822765.004944]  mutex_lock+0x2c/0x30
[822765.005103]  metaslab_alloc_dva+0x55d/0x1170 [zfs]
[822765.005206]  metaslab_alloc+0xb4/0x240 [zfs]
[822765.005305]  zio_dva_allocate+0xd5/0x820 [zfs]
[822765.005406]  ? zio_crypt_do_indirect_mac_checksum_abd+0x5d/0x70 [zfs]
[822765.005468]  ? _cond_resched+0x19/0x30
[822765.005525]  ? mutex_lock+0x12/0x30
[822765.005623]  ? metaslab_class_throttle_reserve+0xd0/0xf0 [zfs]
[822765.005688]  ? tsd_hash_search.isra.5+0x47/0xa0 [spl]
[822765.005791]  zio_execute+0x99/0xf0 [zfs]
[822765.005852]  taskq_thread+0x2f7/0x4e0 [spl]
[822765.005910]  ? wake_up_q+0x80/0x80
[822765.006010]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.006072]  kthread+0x120/0x140
[822765.006132]  ? task_done+0xb0/0xb0 [spl]
[822765.006188]  ? kthread_park+0x90/0x90
[822765.006245]  ret_from_fork+0x35/0x40
[822765.006302] INFO: task z_wr_iss:764 blocked for more than 120 seconds.
[822765.006363]       Tainted: P           O      5.4.114-1-pve #1
[822765.006422] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.006498] z_wr_iss        D    0   764      2 0x80004000
[822765.006558] Call Trace:
[822765.006613]  __schedule+0x2e6/0x700
[822765.006670]  schedule+0x33/0xa0
[822765.006726]  schedule_preempt_disabled+0xe/0x10
[822765.006784]  __mutex_lock.isra.10+0x2c9/0x4c0
[822765.006843]  __mutex_lock_slowpath+0x13/0x20
[822765.006900]  mutex_lock+0x2c/0x30
[822765.006996]  metaslab_alloc_dva+0x55d/0x1170 [zfs]
[822765.007097]  metaslab_alloc+0xb4/0x240 [zfs]
[822765.007201]  zio_dva_allocate+0xd5/0x820 [zfs]
[822765.007259]  ? _cond_resched+0x19/0x30
[822765.007316]  ? mutex_lock+0x12/0x30
[822765.007412]  ? metaslab_class_throttle_reserve+0xd0/0xf0 [zfs]
[822765.007475]  ? tsd_hash_search.isra.5+0x47/0xa0 [spl]
[822765.007579]  zio_execute+0x99/0xf0 [zfs]
[822765.012866]  taskq_thread+0x2f7/0x4e0 [spl]
[822765.012929]  ? wake_up_q+0x80/0x80
[822765.013056]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.013118]  kthread+0x120/0x140
[822765.013177]  ? task_done+0xb0/0xb0 [spl]
[822765.013234]  ? kthread_park+0x90/0x90
[822765.013292]  ret_from_fork+0x35/0x40
[822765.013346] INFO: task z_wr_iss:765 blocked for more than 120 seconds.
[822765.013404]       Tainted: P           O      5.4.114-1-pve #1
[822765.013460] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.013534] z_wr_iss        D    0   765      2 0x80004000
[822765.013590] Call Trace:
[822765.013642]  __schedule+0x2e6/0x700
[822765.013695]  schedule+0x33/0xa0
[822765.013747]  schedule_preempt_disabled+0xe/0x10
[822765.013802]  __mutex_lock.isra.10+0x2c9/0x4c0
[822765.013856]  __mutex_lock_slowpath+0x13/0x20
[822765.013910]  mutex_lock+0x2c/0x30
[822765.013994]  metaslab_alloc_dva+0x55d/0x1170 [zfs]
[822765.014081]  metaslab_alloc+0xb4/0x240 [zfs]
[822765.014169]  zio_dva_allocate+0xd5/0x820 [zfs]
[822765.014224]  ? _cond_resched+0x19/0x30
[822765.014277]  ? mutex_lock+0x12/0x30
[822765.014360]  ? metaslab_class_throttle_reserve+0xd0/0xf0 [zfs]
[822765.014419]  ? tsd_hash_search.isra.5+0x47/0xa0 [spl]
[822765.014508]  zio_execute+0x99/0xf0 [zfs]
[822765.014563]  taskq_thread+0x2f7/0x4e0 [spl]
[822765.014618]  ? wake_up_q+0x80/0x80
[822765.014704]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.014763]  kthread+0x120/0x140
[822765.014817]  ? task_done+0xb0/0xb0 [spl]
[822765.014871]  ? kthread_park+0x90/0x90
[822765.014924]  ret_from_fork+0x35/0x40
[822765.014978] INFO: task z_wr_iss:766 blocked for more than 120 seconds.
[822765.015036]       Tainted: P           O      5.4.114-1-pve #1
[822765.015092] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.015165] z_wr_iss        D    0   766      2 0x80004000
[822765.015222] Call Trace:
[822765.015273]  __schedule+0x2e6/0x700
[822765.015326]  schedule+0x33/0xa0
[822765.015380]  spl_panic+0xf9/0xfb [spl]
[822765.015467]  ? abd_iter_advance+0x2d/0x60 [zfs]
[822765.015544]  ? zfs_btree_find_parent_idx+0x80/0xd0 [zfs]
[822765.015624]  ? zfs_btree_find+0x160/0x2d0 [zfs]
[822765.015709]  range_tree_remove_impl+0x67e/0x1000 [zfs]
[822765.015794]  ? metaslab_df_alloc+0xc6/0x5e0 [zfs]
[822765.015850]  ? ktime_get_raw_ts64+0x38/0xd0
[822765.015934]  ? multilist_sublist_unlock+0x2b/0x40 [zfs]
[822765.016020]  range_tree_remove+0x10/0x20 [zfs]
[822765.016104]  metaslab_alloc_dva+0x264/0x1170 [zfs]
[822765.016201]  metaslab_alloc+0xb4/0x240 [zfs]
[822765.016289]  zio_dva_allocate+0xd5/0x820 [zfs]
[822765.016345]  ? _cond_resched+0x19/0x30
[822765.016398]  ? mutex_lock+0x12/0x30
[822765.016485]  ? metaslab_class_throttle_reserve+0xd0/0xf0 [zfs]
[822765.016548]  ? tsd_hash_search.isra.5+0x47/0xa0 [spl]
[822765.016638]  zio_execute+0x99/0xf0 [zfs]
[822765.016694]  taskq_thread+0x2f7/0x4e0 [spl]
[822765.016749]  ? wake_up_q+0x80/0x80
[822765.016835]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.016893]  kthread+0x120/0x140
[822765.016948]  ? task_done+0xb0/0xb0 [spl]
[822765.017037]  ? kthread_park+0x90/0x90
[822765.017096]  ret_from_fork+0x35/0x40
[822765.017156] INFO: task z_wr_int:769 blocked for more than 120 seconds.
[822765.017216]       Tainted: P           O      5.4.114-1-pve #1
[822765.017282] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.017359] z_wr_int        D    0   769      2 0x80004000
[822765.017420] Call Trace:
[822765.017476]  __schedule+0x2e6/0x700
[822765.017533]  schedule+0x33/0xa0
[822765.017589]  schedule_preempt_disabled+0xe/0x10
[822765.017647]  __mutex_lock.isra.10+0x2c9/0x4c0
[822765.017706]  __mutex_lock_slowpath+0x13/0x20
[822765.017763]  mutex_lock+0x2c/0x30
[822765.017869]  metaslab_free_concrete+0xb3/0x240 [zfs]
[822765.017970]  metaslab_free_impl+0xad/0xe0 [zfs]
[822765.018070]  metaslab_free_dva+0x5e/0x80 [zfs]
[822765.018170]  metaslab_free+0x196/0x1c0 [zfs]
[822765.018270]  zio_free_sync+0xe4/0x100 [zfs]
[822765.018374]  zio_free+0xb9/0xf0 [zfs]
[822765.018472]  dsl_free+0x11/0x20 [zfs]
[822765.018572]  dsl_dataset_block_kill+0x2a9/0x4f0 [zfs]
[822765.018665]  dbuf_write_done+0x1ff/0x220 [zfs]
[822765.018752]  arc_write_done+0x8f/0x410 [zfs]
[822765.018855]  zio_done+0x43f/0x1020 [zfs]
[822765.018958]  zio_execute+0x99/0xf0 [zfs]
[822765.019019]  taskq_thread+0x2f7/0x4e0 [spl]
[822765.019078]  ? wake_up_q+0x80/0x80
[822765.019178]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.019240]  kthread+0x120/0x140
[822765.019299]  ? task_done+0xb0/0xb0 [spl]
[822765.019356]  ? kthread_park+0x90/0x90
[822765.019412]  ret_from_fork+0x35/0x40
[822765.019470] INFO: task z_wr_int:772 blocked for more than 120 seconds.
[822765.019530]       Tainted: P           O      5.4.114-1-pve #1
[822765.019589] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.019666] z_wr_int        D    0   772      2 0x80004000
[822765.019725] Call Trace:
[822765.019781]  __schedule+0x2e6/0x700
[822765.019838]  schedule+0x33/0xa0
[822765.019893]  schedule_preempt_disabled+0xe/0x10
[822765.019952]  __mutex_lock.isra.10+0x2c9/0x4c0
[822765.020011]  __mutex_lock_slowpath+0x13/0x20
[822765.020068]  mutex_lock+0x2c/0x30
[822765.020163]  metaslab_free_concrete+0xb3/0x240 [zfs]
[822765.020264]  metaslab_free_impl+0xad/0xe0 [zfs]
[822765.020365]  metaslab_free_dva+0x5e/0x80 [zfs]
[822765.020464]  metaslab_free+0x196/0x1c0 [zfs]
[822765.020567]  zio_free_sync+0xe4/0x100 [zfs]
[822765.020671]  zio_free+0xb9/0xf0 [zfs]
[822765.020768]  dsl_free+0x11/0x20 [zfs]
[822765.020863]  dsl_dataset_block_kill+0x2a9/0x4f0 [zfs]
[822765.020956]  dbuf_write_done+0x1ff/0x220 [zfs]
[822765.021076]  arc_write_done+0x8f/0x410 [zfs]
[822765.021180]  zio_done+0x43f/0x1020 [zfs]
[822765.021283]  zio_execute+0x99/0xf0 [zfs]
[822765.021344]  taskq_thread+0x2f7/0x4e0 [spl]
[822765.021403]  ? wake_up_q+0x80/0x80
[822765.021504]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.021565]  kthread+0x120/0x140
[822765.021624]  ? task_done+0xb0/0xb0 [spl]
[822765.021681]  ? kthread_park+0x90/0x90
[822765.021738]  ret_from_fork+0x35/0x40
[822765.021797] INFO: task z_wr_int:775 blocked for more than 120 seconds.
[822765.021858]       Tainted: P           O      5.4.114-1-pve #1
[822765.021917] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.021993] z_wr_int        D    0   775      2 0x80004000
[822765.022053] Call Trace:
[822765.022108]  __schedule+0x2e6/0x700
[822765.022164]  schedule+0x33/0xa0
[822765.022220]  schedule_preempt_disabled+0xe/0x10
[822765.022278]  __mutex_lock.isra.10+0x2c9/0x4c0
[822765.022337]  __mutex_lock_slowpath+0x13/0x20
[822765.022394]  mutex_lock+0x2c/0x30
[822765.022492]  metaslab_free_concrete+0xb3/0x240 [zfs]
[822765.022590]  metaslab_free_impl+0xad/0xe0 [zfs]
[822765.022687]  metaslab_free_dva+0x5e/0x80 [zfs]
[822765.022787]  metaslab_free+0x196/0x1c0 [zfs]
[822765.022890]  zio_free_sync+0xe4/0x100 [zfs]
[822765.022993]  zio_free+0xb9/0xf0 [zfs]
[822765.023090]  dsl_free+0x11/0x20 [zfs]
[822765.023182]  dsl_dataset_block_kill+0x2a9/0x4f0 [zfs]
[822765.023276]  dbuf_write_done+0x1ff/0x220 [zfs]
[822765.023366]  arc_write_done+0x8f/0x410 [zfs]
[822765.023469]  zio_done+0x43f/0x1020 [zfs]
[822765.023572]  zio_execute+0x99/0xf0 [zfs]
[822765.023632]  taskq_thread+0x2f7/0x4e0 [spl]
[822765.023691]  ? wake_up_q+0x80/0x80
[822765.023791]  ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.023853]  kthread+0x120/0x140
[822765.023913]  ? task_done+0xb0/0xb0 [spl]
[822765.023969]  ? kthread_park+0x90/0x90
[822765.024026]  ret_from_fork+0x35/0x40
[822765.024089] INFO: task txg_sync:1350 blocked for more than 120 seconds.
[822765.024150]       Tainted: P           O      5.4.114-1-pve #1
[822765.024209] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.024286] txg_sync        D    0  1350      2 0x80004000
[822765.024346] Call Trace:
[822765.024401]  __schedule+0x2e6/0x700
[822765.024458]  schedule+0x33/0xa0
[822765.024514]  schedule_timeout+0x152/0x330
[822765.024573]  ? __next_timer_interrupt+0xd0/0xd0
[822765.024631]  io_schedule_timeout+0x1e/0x50
[822765.024690]  __cv_timedwait_common+0x138/0x170 [spl]
[822765.024750]  ? wait_woken+0x80/0x80
[822765.024808]  __cv_timedwait_io+0x19/0x20 [spl]
[822765.024912]  zio_wait+0x139/0x280 [zfs]
[822765.025009]  ? _cond_resched+0x19/0x30
[822765.025109]  dsl_pool_sync+0xdc/0x510 [zfs]
[822765.025211]  spa_sync+0x5a4/0xfe0 [zfs]
[822765.025268]  ? mutex_lock+0x12/0x30
[822765.025368]  ? spa_txg_history_init_io+0x104/0x110 [zfs]
[822765.025472]  txg_sync_thread+0x2e1/0x4a0 [zfs]
[822765.030783]  ? txg_thread_exit.isra.13+0x60/0x60 [zfs]
[822765.030846]  thread_generic_wrapper+0x74/0x90 [spl]
[822765.030905]  kthread+0x120/0x140
[822765.030960]  ? __thread_exit+0x20/0x20 [spl]
[822765.031014]  ? kthread_park+0x90/0x90
[822765.031068]  ret_from_fork+0x35/0x40

pquan avatar Jun 06 '21 14:06 pquan

In general: no, you're not pushing it too much. I was passing around a 2 TB snapshot stream while I was trying to reproduce this with a bunch of gzip-9 sends on loop.

It would be useful if you could share the lines immediately preceding what you pasted, as they should include the specific panic message (which I'm guessing is different, since the stack is).

I have been periodically spending time considering what I'd like to know for this and how to get it; I know the former but not the latter. (What I'd like to know, at the moment, is where the NULLs that are being dereferenced in the initial panics are coming from; short of somewhat invasive patching, I don't see a good way to answer that question.)

Of course, this isn't helped by my not being at all familiar with these codepaths. Someone more familiar might be able to pull out an answer much quicker.

Here's another question - did you turn on encryption while your ZFS version was 0.8.X? I just noticed both your sending and receiving pool are on the feature flags from there, and am wondering if the reason I can't reproduce it is that my pool only had encryption turned on recently...

(One more experiment you could play with, if you wish, would be setting zfs_abd_scatter_enabled to 0 - it may help or hurt performance to doso, but it's just a thought; it's possible it won't help at all.)

rincebrain avatar Jun 06 '21 15:06 rincebrain