ZFS on Linux null pointer dereference
System information
Type | Version/Name |
---|---|
Distribution Name | Debian |
Distribution Version | 10 buster |
Linux Kernel | 4.19.0-14 (SMP Debian 4.19.171-2) |
Architecture | amd64 |
ZFS Version | 2.0.3-1~bpo10+1 |
SPL Version | 2.0.3-1~bpo10+1 |
Describe the problem you're observing
When I start sending raw ZFS snapshots to a different system, my Linux system (4.19.0-14-amd64) starts to hang completely. I can ping it and can still run a very few commands (such as dmesg), but most commands hang (including zfs, zpool, htop, ps, ...). The entire system effectively locks up.
Dmesg shows the following entries at the time of the occurrence:
[ 2293.134071] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 2293.149707] PGD 0 P4D 0
[ 2293.154752] Oops: 0000 [#1] SMP PTI
[ 2293.161701] CPU: 1 PID: 12576 Comm: receive_writer Tainted: P OE 4.19.0-14-amd64 #1 Debian 4.19.171-2
[ 2293.182517] Hardware name: Supermicro X10SLL-F/X10SLL-F, BIOS 3.0a 12/21/2015
[ 2293.196819] RIP: 0010:abd_verify+0x5/0x60 [zfs]
[ 2293.205865] Code: 0f 1f 44 00 00 0f 1f 44 00 00 8b 07 c1 e8 05 83 e0 01 c3 66 90 0f 1f 44 00 00 8b 07 c1 e8 06 83 e0 01 c3 66 90 0f 1f 44 00 00 <8b> 07 a8 01 74 01 c3 a8 40 74 43 41 54 4c 8d 67 68 55 53 48 8b 47
[ 2293.243325] RSP: 0018:ffffb12e4b6d7a28 EFLAGS: 00010246
[ 2293.253741] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 2293.267974] RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
[ 2293.282205] RBP: 0000000000004000 R08: ffff935ec10b70b0 R09: 0000000000000000
[ 2293.296434] R10: 0000000000007130 R11: ffff935d75f984e0 R12: 0000000000004000
[ 2293.310664] R13: 0000000000000000 R14: ffffffffc0fea550 R15: 0000000000000020
[ 2293.324900] FS: 0000000000000000(0000) GS:ffff935ecfb00000(0000) knlGS:0000000000000000
[ 2293.341053] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2293.352510] CR2: 0000000000000000 CR3: 000000001340a001 CR4: 00000000000606e0
[ 2293.366743] Call Trace:
[ 2293.371704] abd_borrow_buf+0x12/0x40 [zfs]
[ 2293.380104] abd_borrow_buf_copy+0x28/0x70 [zfs]
[ 2293.389377] zio_crypt_copy_dnode_bonus+0x36/0x130 [zfs]
[ 2293.400041] arc_buf_fill+0x3ff/0xb60 [zfs]
[ 2293.408449] ? zfs_btree_add_idx+0xd0/0x200 [zfs]
[ 2293.417889] arc_untransform+0x1c/0x70 [zfs]
[ 2293.426461] dbuf_read_verify_dnode_crypt+0xec/0x160 [zfs]
[ 2293.437466] dbuf_read_impl.constprop.29+0x4ad/0x6b0 [zfs]
[ 2293.448423] ? kmem_cache_alloc+0x167/0x1d0
[ 2293.456776] ? __cv_init+0x3d/0x60 [spl]
[ 2293.464671] ? dbuf_cons+0xa7/0xc0 [zfs]
[ 2293.472497] ? spl_kmem_cache_alloc+0x108/0x7a0 [spl]
[ 2293.482583] ? _cond_resched+0x15/0x30
[ 2293.490071] ? _cond_resched+0x15/0x30
[ 2293.497542] ? mutex_lock+0xe/0x30
[ 2293.504402] ? aggsum_add+0x17a/0x190 [zfs]
[ 2293.512810] dbuf_read+0x1b2/0x520 [zfs]
[ 2293.520672] ? dnode_hold_impl+0x350/0xc20 [zfs]
[ 2293.529904] dmu_bonus_hold_by_dnode+0x126/0x1a0 [zfs]
[ 2293.540186] receive_object+0x403/0xc70 [zfs]
[ 2293.548906] ? receive_freeobjects.isra.10+0x9d/0x120 [zfs]
[ 2293.560049] receive_writer_thread+0x279/0xa00 [zfs]
[ 2293.569962] ? set_curr_task_fair+0x26/0x50
[ 2293.578319] ? receive_process_write_record+0x190/0x190 [zfs]
[ 2293.589793] ? __thread_exit+0x20/0x20 [spl]
[ 2293.598317] ? thread_generic_wrapper+0x6f/0x80 [spl]
[ 2293.608410] ? receive_process_write_record+0x190/0x190 [zfs]
[ 2293.619882] thread_generic_wrapper+0x6f/0x80 [spl]
[ 2293.629609] kthread+0x112/0x130
[ 2293.636053] ? kthread_bind+0x30/0x30
[ 2293.643351] ret_from_fork+0x35/0x40
[ 2293.650473] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter veth pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) nf_tables nfnetlink vboxdrv(OE) bridge binfmt_misc zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) spl(OE) intel_rapl x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm irqbypass crct10dif_pclmul ib_iser joydev crc32_pclmul rdma_cm ghash_clmulni_intel iw_cm intel_cstate ib_cm intel_uncore ib_core intel_rapl_perf configfs ipmi_si sg ipmi_devintf iTCO_wdt iTCO_vendor_support pcc_cpufreq intel_pch_thermal iscsi_tcp ipmi_msghandler libiscsi_tcp libiscsi evdev scsi_transport_iscsi pcspkr tun nfsd auth_rpcgss nfs_acl lockd grace sunrpc lm85 dme1737 hwmon_vid iptable_nat ipt_MASQUERADE nf_nat_ipv4 nf_nat nf_conntrack
[ 2293.793008] nf_defrag_ipv6 nf_defrag_ipv4 fuse loop 8021q garp stp mrp llc ecryptfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb crypto_simd cryptd glue_helper aes_x86_64 raid10 uas usb_storage raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid0 multipath linear hid_generic raid1 usbhid hid md_mod sd_mod ast ahci ttm libahci libata drm_kms_helper drm crc32c_intel igb i2c_i801 dca i2c_algo_bit scsi_mod lpc_ich mfd_core e1000e xhci_pci ehci_pci xhci_hcd ehci_hcd usbcore usb_common thermal fan video button
[ 2293.895677] CR2: 0000000000000000
[ 2293.902280] ---[ end trace 164c64ca87be80af ]---
[ 2294.020926] RIP: 0010:abd_verify+0x5/0x60 [zfs]
[ 2294.029975] Code: 0f 1f 44 00 00 0f 1f 44 00 00 8b 07 c1 e8 05 83 e0 01 c3 66 90 0f 1f 44 00 00 8b 07 c1 e8 06 83 e0 01 c3 66 90 0f 1f 44 00 00 <8b> 07 a8 01 74 01 c3 a8 40 74 43 41 54 4c 8d 67 68 55 53 48 8b 47
[ 2294.067433] RSP: 0018:ffffb12e4b6d7a28 EFLAGS: 00010246
[ 2294.077850] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 2294.092082] RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
[ 2294.106312] RBP: 0000000000004000 R08: ffff935ec10b70b0 R09: 0000000000000000
[ 2294.120542] R10: 0000000000007130 R11: ffff935d75f984e0 R12: 0000000000004000
[ 2294.134774] R13: 0000000000000000 R14: ffffffffc0fea550 R15: 0000000000000020
[ 2294.149006] FS: 0000000000000000(0000) GS:ffff935ecfb00000(0000) knlGS:0000000000000000
[ 2294.165144] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2294.176600] CR2: 0000000000000000 CR3: 000000001340a001 CR4: 00000000000606e0
Interestingly, the transfer itself continues happily; it is just everything else in the system that hangs. The only way to recover is resetting the machine (since not even reboot works).
Describe how to reproduce the problem
It's a tough one. It seems to me that the issue might be load-related in some sense, since it only occurs if I have two zfs sends (via syncoid) running in parallel that involve encrypted datasets.
Transfer 1
The first one sends datasets from an unencrypted dataset into an encrypted one (I am migrating to encryption).
I use syncoid with the command:
syncoid -r --skip-parent --no-sync-snap zpradix1imain/sys/vz zpradix1imain/sys/vz_enc
This translates into:
zfs send -I 'zpradix1imain/sys/vz/main'@'zfs-auto-snap_hourly-2021-03-02-1917' 'zpradix1imain/sys/vz/main'@'zfs-auto-snap_frequent-2021-03-02-1932' | mbuffer -q -s 128k -m 16M 2>/dev/null | pv -s 16392592 | zfs receive -s -F 'zpradix1imain/sys/vz_enc/main'
Transfer 2
I transfer data raw from an encrypted dataset to a secondary server. The syncoid command is:
syncoid -r --skip-parent --no-sync-snap --sendoptions=w --exclude=zfs-auto-snap_hourly --exclude=zfs-auto-snap_frequent zpradix1imain/data [email protected]:zpzetta/radix/data
This translates into:
zfs send -w 'zpradix1imain/data/home'@'vicari-prev' | pv -s 179222507064 | lzop | mbuffer -q -s 128k -m 16M 2>/dev/null | ssh ...
In summary:
- Both Transfer 1 and Transfer 2 have to be running in parallel
- The issue only appears a few minutes/seconds after I started the transfer
I am experiencing the same or a similar issue: a hang-up during zfs send/receive of an encrypted filesystem. The two filesystems are encrypted with different keys. I'm also using syncoid. My stack trace is slightly different, but the NULL pointer dereference is in the same region, abd_borrow_buf_copy.
My system is a standard Proxmox 6.3 install.
OS: Debian 10 Buster. Kernel: 5.4.78-2-pve. ZFS module version: v0.8.5-pve1. Pool: 4x2, two-drive mirror vdevs of 2TB and 4TB devices. zpool scrub returns no errors.
[54105.343179] BUG: kernel NULL pointer dereference, address: 0000000000000000
[54105.343185] #PF: supervisor read access in kernel mode
[54105.343187] #PF: error_code(0x0000) - not-present page
[54105.343188] PGD 0 P4D 0
[54105.343190] Oops: 0000 [#1] SMP PTI
[54105.343193] CPU: 1 PID: 13831 Comm: receive_writer Tainted: P O 5.4.78-2-pve #1
[54105.343195] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./H81 Pro BTC R2.0, BIOS P1.40 03/09/2018
[54105.343253] RIP: 0010:abd_borrow_buf+0x13/0x70 [zfs]
[54105.343255] Code: 49 8b 44 24 18 41 5c 41 5d 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 49 89 f5 41 54 53 <44> 8b 27 48 89 fb 41 83 e4 01 75 30 44 8b 77 1c 48 8b 47 20 4d 85
[54105.343258] RSP: 0018:ffff96ae063a39d8 EFLAGS: 00010246
[54105.343260] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[54105.343261] RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
[54105.343263] RBP: ffff96ae063a39f8 R08: 00000000000000a5 R09: 0000000000000000
[54105.343264] R10: 0000000000000706 R11: 0000000000000000 R12: 0000000000000000
[54105.343265] R13: 0000000000004000 R14: 0000000000000000 R15: ffffffffc07ee470
[54105.343267] FS: 0000000000000000(0000) GS:ffff88c9af500000(0000) knlGS:0000000000000000
[54105.343269] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[54105.343271] CR2: 0000000000000000 CR3: 000000001980a004 CR4: 00000000000606e0
[54105.343272] Call Trace:
[54105.343292] abd_borrow_buf_copy+0x2c/0x80 [zfs]
[54105.343323] zio_crypt_copy_dnode_bonus+0x31/0x140 [zfs]
[54105.343327] ? preempt_schedule_common+0xe/0x20
[54105.343345] arc_buf_fill+0x402/0xb60 [zfs]
[54105.343364] arc_untransform+0x21/0x90 [zfs]
[54105.343383] dbuf_read_verify_dnode_crypt+0xee/0x160 [zfs]
[54105.343402] dbuf_read+0x19d/0xb80 [zfs]
[54105.343425] dmu_tx_check_ioerr+0x68/0xd0 [zfs]
[54105.343448] dmu_tx_hold_free_impl+0x132/0x250 [zfs]
[54105.343470] dmu_tx_hold_free+0x3c/0x50 [zfs]
[54105.343491] dmu_free_long_range+0x234/0x4b0 [zfs]
[54105.343511] dmu_free_long_object+0x26/0xc0 [zfs]
[54105.343533] receive_freeobjects.isra.10+0x7a/0x100 [zfs]
[54105.343555] receive_writer_thread+0x43c/0xb80 [zfs]
[54105.343558] ? set_next_task_fair+0x73/0xb0
[54105.343565] ? spl_kmem_free+0x33/0x40 [spl]
[54105.343586] ? receive_read_prefetch+0x140/0x140 [zfs]
[54105.343591] thread_generic_wrapper+0x74/0x90 [spl]
[54105.343613] ? receive_read_prefetch+0x140/0x140 [zfs]
[54105.343617] ? thread_generic_wrapper+0x74/0x90 [spl]
[54105.343620] kthread+0x120/0x140
[54105.343624] ? __thread_exit+0x20/0x20 [spl]
[54105.343626] ? kthread_park+0x90/0x90
[54105.343629] ret_from_fork+0x35/0x40
[54105.343631] Modules linked in: ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables sctp sch_ingress iptable_filter bpfilter bonding 8021q garp mrp openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 softdog nfnetlink_log nfnetlink intel_rapl_msr mei_hdcp ppdev intel_rapl_common x86_pkg_temp_thermal intel_powerclamp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cryptd rapl intel_cstate snd_hda_codec_hdmi pcspkr i915 drm_kms_helper snd_hda_codec_realtek mei_me snd_hda_codec_generic ledtrig_audio mei drm snd_hda_intel snd_intel_dspcfg snd_hda_codec i2c_algo_bit snd_hda_core fb_sys_fops syscopyarea sysfillrect snd_hwdep sysimgblt snd_pcm snd_timer snd soundcore parport_pc parport mac_hid vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nct6775 hwmon_vid coretemp sunrpc ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO)
[54105.343657] zcommon(PO) znvpair(PO) spl(O) btrfs xor zstd_compress raid6_pq libcrc32c r8169 realtek e1000e i2c_i801 lpc_ich ahci libahci ehci_pci ehci_hcd xhci_pci xhci_hcd mpt3sas raid_class scsi_transport_sas video
[54105.343675] CR2: 0000000000000000
[54105.343677] ---[ end trace 61f4d52a93ce31c9 ]---
[54105.343695] RIP: 0010:abd_borrow_buf+0x13/0x70 [zfs]
[54105.343696] Code: 49 8b 44 24 18 41 5c 41 5d 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 49 89 f5 41 54 53 <44> 8b 27 48 89 fb 41 83 e4 01 75 30 44 8b 77 1c 48 8b 47 20 4d 85
[54105.343699] RSP: 0018:ffff96ae063a39d8 EFLAGS: 00010246
[54105.343700] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[54105.343702] RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
[54105.343703] RBP: ffff96ae063a39f8 R08: 00000000000000a5 R09: 0000000000000000
[54105.343705] R10: 0000000000000706 R11: 0000000000000000 R12: 0000000000000000
[54105.343706] R13: 0000000000004000 R14: 0000000000000000 R15: ffffffffc07ee470
[54105.343708] FS: 0000000000000000(0000) GS:ffff88c9af500000(0000) knlGS:0000000000000000
[54105.343709] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[54105.343710] CR2: 0000000000000000 CR3: 000000001980a004 CR4: 00000000000606e0
[89614.646434] hrtimer: interrupt took 14015 ns
Similar hung system, on the same distribution. The problem has been around ever since reinstalling the system and enabling encryption with ZFS 0.8.5, and persists with 5.4.114-1-pve / 2.0.4-pve1
dmesg
kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
kernel: #PF: supervisor read access in kernel mode
kernel: #PF: error_code(0x0000) - not-present page
kernel: PGD 0 P4D 0
kernel: Oops: 0000 [#1] SMP PTI
kernel: CPU: 1 PID: 23147 Comm: receive_writer Tainted: P O 5.4.114-1-pve #1
kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Q2900-ITX, BIOS P1.60 03/01/2018
kernel: RIP: 0010:abd_verify+0x5/0x70 [zfs]
kernel: Code: 00 00 00 66 90 0f 1f 44 00 00 55 8b 07 c1 e8 06 48 89 e5 83 e0 01 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 <8b> 07 a8 01 75 4f 55 48 89 e5 41 55 41 54 49 89 fc 53 a8 40 74 40
kernel: RSP: 0018:ffffbd418d5b7930 EFLAGS: 00010246
kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
kernel: RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
kernel: RBP: ffffbd418d5b7948 R08: 0000000000000037 R09: 0000000000000000
kernel: R10: 000000000000b1e5 R11: 0000000000000000 R12: 0000000000004000
kernel: R13: 0000000000004000 R14: 0000000000000000 R15: ffffffffc06b19d0
kernel: FS: 0000000000000000(0000) GS:ffff97e2efe80000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000000 CR3: 00000003150e4000 CR4: 00000000001026e0
kernel: Call Trace:
kernel: ? abd_borrow_buf+0x17/0x40 [zfs]
kernel: abd_borrow_buf_copy+0x2c/0x80 [zfs]
kernel: zio_crypt_copy_dnode_bonus+0x31/0x140 [zfs]
kernel: ? preempt_schedule_common+0xe/0x20
kernel: arc_buf_fill+0x402/0xb40 [zfs]
kernel: arc_untransform+0x21/0x80 [zfs]
kernel: dbuf_read_verify_dnode_crypt+0xee/0x160 [zfs]
kernel: ? __cv_init+0x42/0x60 [spl]
kernel: dbuf_read_impl.constprop.33+0x10c/0x6e0 [zfs]
kernel: ? arc_space_consume+0x4f/0x120 [zfs]
kernel: ? dbuf_create+0x3e8/0x580 [zfs]
kernel: ? mutex_lock+0x12/0x30
kernel: ? dbuf_find+0x111/0x1d0 [zfs]
kernel: ? _cond_resched+0x19/0x30
kernel: ? down_read+0x12/0xa0
kernel: dbuf_read+0x1b2/0x510 [zfs]
kernel: dmu_tx_check_ioerr+0x68/0xd0 [zfs]
kernel: dmu_tx_hold_free_impl+0x132/0x250 [zfs]
kernel: dmu_tx_hold_free+0x3c/0x50 [zfs]
kernel: dmu_free_long_range+0x22a/0x4d0 [zfs]
kernel: dmu_free_long_object+0x26/0xc0 [zfs]
kernel: receive_freeobjects.isra.15+0x8d/0x120 [zfs]
kernel: receive_writer_thread+0x46a/0xa00 [zfs]
kernel: ? spl_kmem_free+0x33/0x40 [spl]
kernel: ? receive_process_write_record+0x180/0x180 [zfs]
kernel: thread_generic_wrapper+0x74/0x90 [spl]
kernel: ? receive_process_write_record+0x180/0x180 [zfs]
kernel: ? thread_generic_wrapper+0x74/0x90 [spl]
kernel: kthread+0x120/0x140
kernel: ? __thread_exit+0x20/0x20 [spl]
kernel: ? kthread_park+0x90/0x90
kernel: ret_from_fork+0x35/0x40
kernel: Modules linked in: xt_multiport binfmt_misc veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter softdog nfnetlink_log nfnetlink snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio intel_rapl_msr intel_rapl_common intel_soc_dts_thermal intel_soc_dts_iosf intel_powerclamp i915 kvm_intel snd_hda_intel drm_kms_helper snd_intel_dspcfg kvm snd_hda_codec hci_uart irqbypass snd_hda_core punit_atom_debug btqca btrtl btbcm drm btintel crct10dif_pclmul i2c_algo_bit bluetooth crc32_pclmul fb_sys_fops syscopyarea sysfillrect ghash_clmulni_intel mei_hdcp snd_hwdep snd_pcm sysimgblt mei_txe ppdev ecdh_generic cryptd snd_timer ecc mei snd pcspkr parport_pc soundcore intel_cstate parport rfkill_gpio mac_hid vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nct6775 hwmon_vid sunrpc coretemp ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O)
kernel: zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs xor zstd_compress hid_generic raid6_pq libcrc32c usbhid r8169 ahci xhci_pci libahci realtek i2c_i801 lpc_ich xhci_hcd video i2c_hid hid
kernel: CR2: 0000000000000000
kernel: ---[ end trace 737a4d6fa99905a7 ]---
kernel: RIP: 0010:abd_verify+0x5/0x70 [zfs]
kernel: Code: 00 00 00 66 90 0f 1f 44 00 00 55 8b 07 c1 e8 06 48 89 e5 83 e0 01 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 0f 1f 44 00 00 <8b> 07 a8 01 75 4f 55 48 89 e5 41 55 41 54 49 89 fc 53 a8 40 74 40
kernel: RSP: 0018:ffffbd418d5b7930 EFLAGS: 00010246
kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
kernel: RDX: 0000000000004000 RSI: 0000000000004000 RDI: 0000000000000000
kernel: RBP: ffffbd418d5b7948 R08: 0000000000000037 R09: 0000000000000000
kernel: R10: 000000000000b1e5 R11: 0000000000000000 R12: 0000000000004000
kernel: R13: 0000000000004000 R14: 0000000000000000 R15: ffffffffc06b19d0
kernel: FS: 0000000000000000(0000) GS:ffff97e2efe80000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000000 CR3: 00000003150e4000 CR4: 00000000001026e0
My problem is triggered by receiving snapshots into an encrypted dataset; I am not sending data anywhere from this node.
This machine is a consumer of snapshots for multiple machines (all different versions of ZFS). The kernel NULL pointer dereference always happens when a zfs receive runs, never when idle (most of the time there is a running receive, so this might not be valuable information). All sources are non-encrypted, the destination dataset is encrypted and opened with a key.
The commands being run are something along:
zfs send -I dataset1@from dataset1@to |compression|buffering|ssh| zfs receive -s -F 'encrypted-dataset/backups/dataset1'
I have tried checking /proc/spl/kstat/zfs/dbgmsg (zfs_flags = 256; turning it higher produces more data than I can manage), but nothing interesting showed up.
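For context, a minimal sketch of how I enable and read the debug log (standard module parameters; the flag value is the one mentioned above):
echo 256 > /sys/module/zfs/parameters/zfs_flags
echo 1 > /sys/module/zfs/parameters/zfs_dbgmsg_enable
tail -n 200 /proc/spl/kstat/zfs/dbgmsg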
Perhaps someone could tell us how to diagnose this further?
On a side note, it seems all three of us are using relatively old architectures, from the same era:
- segdy: Supermicro X10SLL-F/X10SLL-F, BIOS 3.0a 12/21/2015 - C220 released Q2'13
- pquan: H81 Pro BTC R2.0 - H81 released Q3'13
- me: Pentium J2900, Bay Trail-D released Q4'13
Same problem here, on NixOS, Linux 5.4.114, ZFS 2.0.4, also (probably) when receiving into an encrypted dataset.
Kernel log: https://gist.github.com/lheckemann/1c11ab6865c44a192c9d2597d17ba72b
This is the second time these symptoms have shown up, though I didn't think to get logs the first time around.
I'm on a Celeron G3900 (Skylake, Q4'15).
EDIT: I'm receiving non-raw sends.
I'm not sure if this is related to the age of the architecture, but add another of my systems: an Intel(R) Xeon(R) CPU E3-1231 running on an ASUS P9D-M board. This is relatively "new" compared to the others.
Incidentally, I also have an X8SIL running an X3440 CPU and 32GB RAM. That has not experienced the issue (yet). All these systems are, as you define them, consumers of ZFS snapshots (i.e. backup servers receiving the snapshots from production). The lockups seem related to load too: the more you receive, the more likely the issue occurs. Parallel streams seem to trigger the issue more easily.
Today I had corruption on the receiving system. The load average was over 3000 when I had to hard-reset it. On boot, the zfs import took an unusually long time. zpool status is reporting a permanent data error:
errors: Permanent errors have been detected in the following files:
<0x1e39d>:<0x0>
But I'm not sure what that file is. If data corruption is involved, this would be a serious issue.
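If it helps with looking these up: as far as I understand, the first hex value in such an entry is the objset (dataset) ID and the second is the object ID, so something along these lines should map it back to a dataset or snapshot name, assuming the 2.0 objsetid property (it will find nothing if the dataset or snapshot has since been deleted):
zfs list -r -t all -o name,objsetid rpool | awk -v id=$((0x1e39d)) '$2 == id'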
Could you share the output of these two commands?
cat /sys/module/icp/parameters/icp_aes_impl
cat /sys/module/icp/parameters/icp_gcm_impl
I have a vague suspicion that all the CPUs mentioned here don't believe in aesni, and am wondering if there's something awry in the codepath they're taking.
(Note that I am not remotely an expert on this code, just a random person musing about what could be going so awry for you and yet not causing much more screaming by encryption users.)
(If you'd like to more aggressively experiment with this theory, I suspect the speed order for this goes aesni > x86_64 > generic, so if you are indeed using x86_64, you could try using the "generic" implementation and see if the problem still reproduces.)
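(For completeness, a minimal sketch of what I mean by switching implementations, assuming the module parameters are writable at runtime as usual:
echo generic > /sys/module/icp/parameters/icp_aes_impl
echo generic > /sys/module/icp/parameters/icp_gcm_impl
and the same with x86_64/pclmulqdq, or aesni/avx where supported, to pin a specific implementation instead.)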
# cat /sys/module/icp/parameters/icp_aes_impl
cycle [fastest] generic x86_64
# cat /sys/module/icp/parameters/icp_gcm_impl
cycle [fastest] generic pclmulqdq
I have switched both parameters to generic now. Will report back as things advance.
One further thought - I don't see a way to validate which implementation it was currently using on "fastest", so it's also possible you were already using generic for one or both of those. So if it still NULL derefs, you could try explicitly setting x86_64 and pclmulqdq.
(I thought of this because if #12106 is indeed related to this, it definitely can't be using x86_64, so...)
# cat /sys/module/icp/parameters/icp_aes_impl
cycle [fastest] generic x86_64
# cat /sys/module/icp/parameters/icp_gcm_impl
cycle [fastest] generic pclmulqdq
I have switched both parameters to generic now. Will report back as things advance.
Same output here, on all the CPUs I use (all AES-NI aware, with AES-NI in use not only for ZFS but for many other things).
# cat /sys/module/icp/parameters/icp_aes_impl
cycle [fastest] generic x86_64
# cat /sys/module/icp/parameters/icp_gcm_impl
cycle [fastest] generic pclmulqdq
Are we suggesting that the aes-ni implementation in the kernel is "buggy"? That is most certainly not the case.
Not sure if I pointed this out, but, in my case, I'm using different encryption keys on the sending and on the receiving side. Also, some of my senders are not encrypted, but the receiver is (via a parent fs which is encrypted).
No, I was suggesting precisely the opposite - that the reason people here might be seeing an issue and others aren't is the absence of use of aesni.
edit: Just for reference, on my (Broadwell) Xeon:
$ for i in aes gcm;do cat /sys/module/icp/parameters/icp_${i}_impl;echo "";done;
cycle [fastest] generic x86_64 aesni
cycle [fastest] avx generic pclmulqdq
So I was hypothesizing the absence of use of one or both of those could be related to this, and was more suspicious of aesni than avx.
Understood. Not sure how to test whether ZFS is using AES-NI, but a quick check with openssl using the EVP and non-EVP APIs shows a roughly three-fold increase with EVP, suggesting that the AES-NI hardware is being exercised.
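For anyone wanting to repeat that check, a comparison along these lines shows the difference (the cipher choice here is just an example; the -evp path is the one that can take advantage of AES-NI):
openssl speed aes-256-cbc
openssl speed -evp aes-256-cbc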
I'm going to explicitly turn on aesni and avx on one of the servers now, just to test the opposite case of generic.
cycle fastest generic x86_64 [aesni]
cycle fastest [avx] generic pclmulqdq
AFAICS from quickly skimming the code, there's no equivalent of vdev_raidz_bench for the crypto bits. Maybe I'll see how hard implementing one would be after this.
Just to be clear that I'm trying to reproduce the right thing, could somebody share which values of encryption/keyformat they're using on the dataset that things are being received under? (And of course, if they turn out to all be the same/different, that's another data point...)
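Something like this should dump the relevant properties (the dataset name is a placeholder):
zfs get encryption,keyformat,keylocation,encryptionroot,pbkdf2iters tank/receive/target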
My consumer is:
rpool/backups encryption aes-256-gcm -
rpool/backups keylocation file:///rpool/other-encrypted/zfs.KEY local
rpool/backups keyformat raw
(After changing icp_[aes|gcm]_impl to "generic" a couple of hours ago, the next backup iteration caused the same null pointer/stack trace.
After restarting and re-applying the "generic" settings, it has yet to crash again.)
Well, unless generic was mysteriously calculated to be faster before and you were always using that, there goes that theory. Rats.
edit: Well, I guess it's technically not impossible for both the non-aesni/avx implementations to cause similar misbehavior, but I'm going to assume for now that it's not specific to those...somehow.
@rincebrain The fastest implementation is statically selected according to the available hardware. See here and here. To summarize:
GCM: generic < pclmulqdq < avx
AES: generic < x86_64 < aesni
I'm still working on supporting more instruction set extensions for GCM and am thinking of adding a bench as part of that work. Currently the static selection should always select the fastest one.
Further, it's quite unlikely that the implementations differ in behavior, since ztest uses the cycle implementation, which cycles through all available implementations per encryption call/block and should have caught any discrepancies. I think it's more likely that the load encryption puts on the system triggers some race condition. Just a guess though.
Yeah, I'd buy that. The only reason I proposed the implementation selection as an idea before was, as I said, that I noticed all of them happened to be using CPUs that did not believe in aesni, plus the assumption that most people are not running into this regularly, so something about these setups is different.
I also, annoyingly, have not been able to reproduce this in a VM no matter how I've varied the amount of computational power or cores, so far. :/
I must say, this is the first time someone talks about CPU beliefs. This is a serious issue. It happens on multiple CPU generations, mainboards, etc. All my hardware is server grade, using ECC memory and enterprise SATA disks. No errors are logged for the disks, memory, or CPU anywhere. Yet I'm now starting to develop data errors on multiple servers. These seem related to errors in snapshots, even on "source" snapshots, i.e. those that were not transferred to a backup server but are native to a server. It started when I began using encrypted storage and OpenZFS v2.0.4 (specifically the Proxmox kernel pve-kernel-5.4.114-1-pve).
It might be related to Proxmox. I thought I had never seen it with the 0.8.5 zfs module, but it turns out it was there. Now I have the same issue with 2.0.4. The thing that changed was encryption.
For now the data corruption affects only snapshots. I hope it won't start affecting live files.
It also seems there's some silent corruption going on. On one of the servers with data errors, I never had this null pointer dereference, yet there are corrupted snapshots, and many of them:
errors: Permanent errors have been detected in the following files:
<0x16c01>:<0x0>
<0x17506>:<0x1>
<0x16220>:<0x0>
<0x1742a>:<0x0>
<0x1742a>:<0x1>
<0x1742c>:<0x1>
<0x1612f>:<0x0>
<0x17435>:<0x0>
<0x17439>:<0x0>
<0x16151>:<0x0>
<0x17451>:<0x0>
<0x17451>:<0x1>
<0x16d66>:<0x0>
rpool/vm/vm-1234-disk-0@autosnap_2021-05-22_20:00:03_hourly:<0x0>
<0x16280>:<0x0>
<0x1678c>:<0x0>
rpool/vm/vm-1234-disk-0@autosnap_2021-05-22_22:00:01_hourly:<0x0>
<0x165ad>:<0x0>
<0x165ad>:<0x1>
<0x171bd>:<0x0>
<0x15fcb>:<0x0>
<0x165d8>:<0x0>
<0x169f9>:<0x0>
Those hex IDs are (most likely) deleted snapshots. It can happen that a snapshot is created/deleted during the send/receive; they are asynchronous processes.
I'm running a scrub on all servers now and will see whether those errors (deleted snapshots) go away or not. But the "vm-1234" snapshots above cannot be deleted: they're not visible when issuing zfs list -t snapshot, yet they do have a name in the error list.
# zfs destroy rpool/vm/vm-1234-disk-0@autosnap_2021-05-22_22:00:01_hourly
could not find any snapshots to destroy; check snapshot names.
There's also more:
# zfs list -t snapshot rpool/vm/vm-1234-disk-0|grep 05-22
cannot iterate filesystems: I/O error
#
OK, I was woken up this morning by a (perhaps related) issue on one of the "sending" servers. The stack trace does not show the encryption part and seems more related to the ARC. I haven't seen one of those for years, so since it is a very rare thing, I decided to post it here in the hope it would help.
I was doing some things on this server tonight, namely:
- transferring all filesystems from an encrypted parent to an unencrypted parent (i.e. from rpool/encrypted/filesystem{0..n} to rpool/non-encrypted/filesystem{0..n})
- making hourly snapshots, some of which must have happened during the transfer, and deleting older expired snapshots.
- making (very likely) other send transfers from the encrypted filesystems to a remote backup server.
- scrubbing the rpool pool to check for errors and hopefully clear those older snapshots which are now identified via a hex ID in the error table.
- all this while running around 10 lxc containers and a couple of kvm virtual machines.
The server was completely locked up, with a load average around 6800 and the following stack trace:
[1214732.919868] VERIFY3(0 == remove_reference(hdr, NULL, tag)) failed (0 == 1)
[1214732.919930] PANIC at arc.c:3791:arc_buf_destroy()
[1214732.919985] Showing stack for process 1114
[1214732.920040] CPU: 2 PID: 1114 Comm: z_rd_int Tainted: P O 5.4.114-1-pve #1
[1214732.920114] Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.3.V1 02/04/2020
[1214732.920193] Call Trace:
[1214732.920251] dump_stack+0x6d/0x8b
[1214732.920309] spl_dumpstack+0x29/0x2b [spl]
[1214732.920365] spl_panic+0xd3/0xfb [spl]
[1214732.920420] ? __wake_up_common+0x78/0x140
[1214732.920521] ? zei_add_range+0x140/0x140 [zfs]
[1214732.920576] ? __wake_up+0x13/0x20
[1214732.920630] ? __cv_broadcast+0x2a/0x40 [spl]
[1214732.920715] ? zfs_zevent_post+0x223/0x280 [zfs]
[1214732.920792] arc_buf_destroy+0x101/0x110 [zfs]
[1214732.920869] arc_read_done+0x243/0x480 [zfs]
[1214732.920957] zio_done+0x43f/0x1020 [zfs]
[1214732.921045] zio_execute+0x99/0xf0 [zfs]
[1214732.921101] taskq_thread+0x2f7/0x4e0 [spl]
[1214732.921157] ? wake_up_q+0x80/0x80
[1214732.921243] ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[1214732.921302] kthread+0x120/0x140
[1214732.921357] ? task_done+0xb0/0xb0 [spl]
[1214732.921410] ? kthread_park+0x90/0x90
[1214732.921465] ret_from_fork+0x35/0x40
[1214871.316671] INFO: task z_rd_int:1114 blocked for more than 120 seconds.
[1214871.316733] Tainted: P O 5.4.114-1-pve #1
[1214871.316790] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1214871.316864] z_rd_int D 0 1114 2 0x80004000
[1214871.316921] Call Trace:
[1214871.316978] __schedule+0x2e6/0x700
[1214871.317040] schedule+0x33/0xa0
[1214871.317096] spl_panic+0xf9/0xfb [spl]
[1214871.317195] ? zei_add_range+0x140/0x140 [zfs]
[1214871.317252] ? __wake_up+0x13/0x20
[1214871.317307] ? __cv_broadcast+0x2a/0x40 [spl]
[1214871.317389] ? zfs_zevent_post+0x223/0x280 [zfs]
[1214871.317466] arc_buf_destroy+0x101/0x110 [zfs]
[1214871.317542] arc_read_done+0x243/0x480 [zfs]
[1214871.317628] zio_done+0x43f/0x1020 [zfs]
[1214871.317727] zio_execute+0x99/0xf0 [zfs]
[1214871.317783] taskq_thread+0x2f7/0x4e0 [spl]
[1214871.317839] ? wake_up_q+0x80/0x80
[1214871.317925] ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[1214871.317984] kthread+0x120/0x140
[1214871.318050] ? task_done+0xb0/0xb0 [spl]
[1214871.318104] ? kthread_park+0x90/0x90
[1214871.318157] ret_from_fork+0x35/0x40
[1214871.318215] INFO: task dp_sync_taskq:1157 blocked for more than 120 seconds.
[1214871.318287] Tainted: P O 5.4.114-1-pve #1
[1214871.318344] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1214871.318417] dp_sync_taskq D 0 1157 2 0x80004000
[1214871.318473] Call Trace:
[1214871.318525] __schedule+0x2e6/0x700
[1214871.318577] schedule+0x33/0xa0
[1214871.318631] cv_wait_common+0x104/0x130 [spl]
[1214871.318685] ? wait_woken+0x80/0x80
[1214871.318751] __cv_wait+0x15/0x20 [spl]
[1214871.318834] dsl_scan_prefetch_thread+0xad/0x290 [zfs]
[1214871.318892] taskq_thread+0x2f7/0x4e0 [spl]
[1214871.318947] ? wake_up_q+0x80/0x80
[1214871.319001] kthread+0x120/0x140
[1214871.319066] ? task_done+0xb0/0xb0 [spl]
[1214871.319119] ? kthread_park+0x90/0x90
[1214871.319173] ret_from_fork+0x35/0x40
[1214871.319228] INFO: task txg_sync:1325 blocked for more than 120 seconds.
[1214871.319286] Tainted: P O 5.4.114-1-pve #1
[1214871.319342] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1214871.319416] txg_sync D 0 1325 2 0x80004000
[1214871.319472] Call Trace:
[1214871.319523] __schedule+0x2e6/0x700
[1214871.319607] ? vdev_mirror_io_start+0xb0/0x1a0 [zfs]
[1214871.319662] schedule+0x33/0xa0
[1214871.319715] schedule_timeout+0x152/0x330
[1214871.319778] ? __next_timer_interrupt+0xd0/0xd0
[1214871.319833] io_schedule_timeout+0x1e/0x50
[1214871.319888] __cv_timedwait_common+0x138/0x170 [spl]
[1214871.319944] ? wait_woken+0x80/0x80
[1214871.319998] __cv_timedwait_io+0x19/0x20 [spl]
[1214871.320086] zio_wait+0x139/0x280 [zfs]
[1214871.320162] arc_read+0xd68/0x1210 [zfs]
[1214871.320249] ? arc_hdr_set_compress+0x50/0x50 [zfs]
[1214871.320332] dsl_scan_visitbp+0x77d/0xcd0 [zfs]
[1214871.320388] ? __cv_broadcast+0x2a/0x40 [spl]
[1214871.320470] ? dsl_scan_prefetch+0x222/0x260 [zfs]
[1214871.320552] dsl_scan_visit_rootbp+0xe8/0x150 [zfs]
[1214871.320648] dsl_scan_visitds+0x1ae/0x510 [zfs]
[1214871.320790] ? dnode_rele_and_unlock+0xb6/0xe0 [zfs]
[1214871.320874] ? dnode_rele+0x3b/0x40 [zfs]
[1214871.320951] ? dbuf_rele_and_unlock+0x306/0x6a0 [zfs]
[1214871.321039] ? dsl_dataset_hold_obj+0x68c/0x9e0 [zfs]
[1214871.321130] ? dbuf_rele+0x3b/0x40 [zfs]
[1214871.321246] dsl_scan_sync+0x90e/0x1320 [zfs]
[1214871.321337] spa_sync+0x610/0xfe0 [zfs]
[1214871.321392] ? mutex_lock+0x12/0x30
[1214871.321478] ? spa_txg_history_init_io+0x104/0x110 [zfs]
[1214871.321568] txg_sync_thread+0x2e1/0x4a0 [zfs]
[1214871.321656] ? txg_thread_exit.isra.13+0x60/0x60 [zfs]
[1214871.321714] thread_generic_wrapper+0x74/0x90 [spl]
[1214871.321770] kthread+0x120/0x140
[1214871.321825] ? __thread_exit+0x20/0x20 [spl]
[1214871.321879] ? kthread_park+0x90/0x90
[1214871.321933] ret_from_fork+0x35/0x40
[1214992.151145] INFO: task z_rd_int:1114 blocked for more than 241 seconds.
[1214992.151211] Tainted: P O 5.4.114-1-pve #1
[1214992.151268] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1214992.151342] z_rd_int D 0 1114 2 0x80004000
[1214992.151399] Call Trace:
[1214992.151456] __schedule+0x2e6/0x700
[1214992.151518] schedule+0x33/0xa0
[1214992.151584] spl_panic+0xf9/0xfb [spl]
[1214992.151687] ? zei_add_range+0x140/0x140 [zfs]
[1214992.151743] ? __wake_up+0x13/0x20
[1214992.151798] ? __cv_broadcast+0x2a/0x40 [spl]
[1214992.151882] ? zfs_zevent_post+0x223/0x280 [zfs]
[1214992.151966] arc_buf_destroy+0x101/0x110 [zfs]
[1214992.152042] arc_read_done+0x243/0x480 [zfs]
[1214992.152128] zio_done+0x43f/0x1020 [zfs]
[1214992.152213] zio_execute+0x99/0xf0 [zfs]
[1214992.152281] taskq_thread+0x2f7/0x4e0 [spl]
[1214992.152337] ? wake_up_q+0x80/0x80
[1214992.152423] ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[1214992.152482] kthread+0x120/0x140
[1214992.152537] ? task_done+0xb0/0xb0 [spl]
[1214992.152591] ? kthread_park+0x90/0x90
[1214992.152645] ret_from_fork+0x35/0x40
[1214992.152702] INFO: task dp_sync_taskq:1157 blocked for more than 241 seconds.
[1214992.152774] Tainted: P O 5.4.114-1-pve #1
[1214992.152831] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1214992.152905] dp_sync_taskq D 0 1157 2 0x80004000
[1214992.152968] Call Trace:
[1214992.153019] __schedule+0x2e6/0x700
[1214992.153072] schedule+0x33/0xa0
[1214992.153126] cv_wait_common+0x104/0x130 [spl]
[1214992.153180] ? wait_woken+0x80/0x80
[1214992.153235] __cv_wait+0x15/0x20 [spl]
[1214992.153315] dsl_scan_prefetch_thread+0xad/0x290 [zfs]
[1214992.153374] taskq_thread+0x2f7/0x4e0 [spl]
[1214992.153428] ? wake_up_q+0x80/0x80
[1214992.153482] kthread+0x120/0x140
[1214992.153536] ? task_done+0xb0/0xb0 [spl]
[1214992.153589] ? kthread_park+0x90/0x90
[1214992.153654] ret_from_fork+0x35/0x40
[1214992.153709] INFO: task txg_sync:1325 blocked for more than 241 seconds.
[1214992.153767] Tainted: P O 5.4.114-1-pve #1
[1214992.153823] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1214992.153897] txg_sync D 0 1325 2 0x80004000
[1214992.153953] Call Trace:
[1214992.154016] __schedule+0x2e6/0x700
[1214992.154099] ? vdev_mirror_io_start+0xb0/0x1a0 [zfs]
[1214992.159404] schedule+0x33/0xa0
[1214992.159463] schedule_timeout+0x152/0x330
[1214992.159521] ? __next_timer_interrupt+0xd0/0xd0
[1214992.159579] io_schedule_timeout+0x1e/0x50
[1214992.159640] __cv_timedwait_common+0x138/0x170 [spl]
[1214992.159699] ? wait_woken+0x80/0x80
[1214992.159754] __cv_timedwait_io+0x19/0x20 [spl]
[1214992.159842] zio_wait+0x139/0x280 [zfs]
[1214992.159918] arc_read+0xd68/0x1210 [zfs]
[1214992.159994] ? arc_hdr_set_compress+0x50/0x50 [zfs]
[1214992.160079] dsl_scan_visitbp+0x77d/0xcd0 [zfs]
[1214992.160136] ? __cv_broadcast+0x2a/0x40 [spl]
[1214992.160219] ? dsl_scan_prefetch+0x222/0x260 [zfs]
[1214992.160304] dsl_scan_visit_rootbp+0xe8/0x150 [zfs]
[1214992.160388] dsl_scan_visitds+0x1ae/0x510 [zfs]
[1214992.160469] ? dnode_rele_and_unlock+0xb6/0xe0 [zfs]
[1214992.160551] ? dnode_rele+0x3b/0x40 [zfs]
[1214992.160628] ? dbuf_rele_and_unlock+0x306/0x6a0 [zfs]
[1214992.160711] ? dsl_dataset_hold_obj+0x68c/0x9e0 [zfs]
[1214992.160791] ? dbuf_rele+0x3b/0x40 [zfs]
[1214992.160874] dsl_scan_sync+0x90e/0x1320 [zfs]
[1214992.160959] spa_sync+0x610/0xfe0 [zfs]
[1214992.161013] ? mutex_lock+0x12/0x30
[1214992.161098] ? spa_txg_history_init_io+0x104/0x110 [zfs]
[1214992.161186] txg_sync_thread+0x2e1/0x4a0 [zfs]
[1214992.161273] ? txg_thread_exit.isra.13+0x60/0x60 [zfs]
[1214992.161332] thread_generic_wrapper+0x74/0x90 [spl]
[1214992.161388] kthread+0x120/0x140
[1214992.161442] ? __thread_exit+0x20/0x20 [spl]
[1214992.161497] ? kthread_park+0x90/0x90
[1214992.161550] ret_from_fork+0x35/0x40
[1215112.985599] INFO: task z_rd_int:1114 blocked for more than 362 seconds.
[1215112.985661] Tainted: P O 5.4.114-1-pve #1
[1215112.985718] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1215112.985792] z_rd_int D 0 1114 2 0x80004000
[1215112.985849] Call Trace:
[1215112.985905] __schedule+0x2e6/0x700
[1215112.985960] schedule+0x33/0xa0
[1215112.986017] spl_panic+0xf9/0xfb [spl]
[1215112.986119] ? zei_add_range+0x140/0x140 [zfs]
[1215112.986175] ? __wake_up+0x13/0x20
[1215112.986229] ? __cv_broadcast+0x2a/0x40 [spl]
[1215112.986314] ? zfs_zevent_post+0x223/0x280 [zfs]
[1215112.986395] arc_buf_destroy+0x101/0x110 [zfs]
[1215112.986473] arc_read_done+0x243/0x480 [zfs]
[1215112.986561] zio_done+0x43f/0x1020 [zfs]
[1215112.986648] zio_execute+0x99/0xf0 [zfs]
[1215112.986705] taskq_thread+0x2f7/0x4e0 [spl]
[1215112.986761] ? wake_up_q+0x80/0x80
[1215112.986847] ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[1215112.986906] kthread+0x120/0x140
[1215112.986961] ? task_done+0xb0/0xb0 [spl]
[1215112.987015] ? kthread_park+0x90/0x90
[1215112.987069] ret_from_fork+0x35/0x40
[1215112.987126] INFO: task dp_sync_taskq:1157 blocked for more than 362 seconds.
[1215112.987199] Tainted: P O 5.4.114-1-pve #1
[1215112.987255] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1215112.987329] dp_sync_taskq D 0 1157 2 0x80004000
[1215112.987385] Call Trace:
[1215112.987436] __schedule+0x2e6/0x700
[1215112.987490] schedule+0x33/0xa0
[1215112.987543] cv_wait_common+0x104/0x130 [spl]
[1215112.987598] ? wait_woken+0x80/0x80
[1215112.987653] __cv_wait+0x15/0x20 [spl]
[1215112.987735] dsl_scan_prefetch_thread+0xad/0x290 [zfs]
[1215112.987794] taskq_thread+0x2f7/0x4e0 [spl]
[1215112.987849] ? wake_up_q+0x80/0x80
[1215112.987902] kthread+0x120/0x140
[1215112.987957] ? task_done+0xb0/0xb0 [spl]
[1215112.988010] ? kthread_park+0x90/0x90
[1215112.988064] ret_from_fork+0x35/0x40
[1215233.820082] INFO: task z_rd_int:1114 blocked for more than 483 seconds.
[1215233.820144] Tainted: P O 5.4.114-1-pve #1
[1215233.820201] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1215233.820275] z_rd_int D 0 1114 2 0x80004000
[1215233.820332] Call Trace:
[1215233.820389] __schedule+0x2e6/0x700
[1215233.820443] schedule+0x33/0xa0
[1215233.820501] spl_panic+0xf9/0xfb [spl]
[1215233.820603] ? zei_add_range+0x140/0x140 [zfs]
[1215233.820660] ? __wake_up+0x13/0x20
[1215233.820714] ? __cv_broadcast+0x2a/0x40 [spl]
[1215233.820799] ? zfs_zevent_post+0x223/0x280 [zfs]
[1215233.820876] arc_buf_destroy+0x101/0x110 [zfs]
[1215233.820953] arc_read_done+0x243/0x480 [zfs]
[1215233.821041] zio_done+0x43f/0x1020 [zfs]
[1215233.821129] zio_execute+0x99/0xf0 [zfs]
[1215233.821185] taskq_thread+0x2f7/0x4e0 [spl]
[1215233.821241] ? wake_up_q+0x80/0x80
[1215233.821327] ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[1215233.821386] kthread+0x120/0x140
[1215233.821441] ? task_done+0xb0/0xb0 [spl]
[1215233.821494] ? kthread_park+0x90/0x90
[1215233.821549] ret_from_fork+0x35/0x40
[1215233.821606] INFO: task dp_sync_taskq:1157 blocked for more than 483 seconds.
[1215233.821679] Tainted: P O 5.4.114-1-pve #1
[1215233.821735] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1215233.821809] dp_sync_taskq D 0 1157 2 0x80004000
[1215233.821865] Call Trace:
[1215233.821917] __schedule+0x2e6/0x700
[1215233.821970] schedule+0x33/0xa0
[1215233.822024] cv_wait_common+0x104/0x130 [spl]
[1215233.822078] ? wait_woken+0x80/0x80
[1215233.822133] __cv_wait+0x15/0x20 [spl]
[1215233.822216] dsl_scan_prefetch_thread+0xad/0x290 [zfs]
[1215233.822274] taskq_thread+0x2f7/0x4e0 [spl]
[1215233.822329] ? wake_up_q+0x80/0x80
[1215233.822383] kthread+0x120/0x140
[1215233.822437] ? task_done+0xb0/0xb0 [spl]
[1215233.822491] ? kthread_park+0x90/0x90
[1215233.822545] ret_from_fork+0x35/0x40
This is an Intel(R) Xeon(R) CPU E5-1620 v2 (4 core, 8 threads) cpu, 32GB of ECC ram running on a Supermicro X9SRE-3F motherboard.
No other errors were logged. The system stayed responsive thanks to a separate ext4 root unrelated to ZFS. I was unable to unmount, mount, or touch anything ZFS-related. I had to issue an unmount/sync/reboot magic SysRq sequence to reboot it.
Maybe the reason we're able to trigger this issue is that these are server systems doing heavy lifting on real parallel threads, as opposed to "simulated" ones under a VM?
I hope this helps. Let me know if I can do anything to help diagnose further.
Edit: I'm reading the metrics on the server and can give some more details. The issue started around 2 o'clock in the morning; the total lockup of the server arrived around 4 o'clock, 2 hours later. I had the following symptoms:
- iowait climbed from 10% to a steady 100%; then at 4 o'clock iowait went to 50% and system time went from 0-5% to 50%
- Disk I/O requests went from 200/sec to 0 at 2 o'clock and stayed like that until the reset.
- Swap usage was around 4GB the whole time (100%), with used swap climbing from 0 to ~512MB starting at 2 o'clock.
- Swap I/O bytes were high; there was active swap in/out going on. This is not normal: usually this server has 0% swapping going on.
All other parameters are OK. This last lock-up seems unrelated to the crypto path above. It looks more like the RAM-starvation issues we used to have a few years ago.
A data point: I left my Xeon D-1540 (8c16t, Broadwell; Debian buster, 4.19.0-16-amd64, 2.0.3-1~bpo10+1) running a loop for 24 hours of 16 parallel zfs receives into an encrypted parent, from an unencrypted source send (no flags) of a ~39GiB dataset I have, then destroying them, and so on, with no VM involved, just in case it was indeed something that couldn't be triggered for that reason. No excitement ensued.
Can you share the output of zfs get all [one of these source datasets] and zpool get all [source pool], and the same for the destination pool and dataset? I'd like to eliminate as many possible differences as I can.
I have randomly picked two sources with two datasets, and the destination: zfs get all output. It might be worth noting:
- I have yet to encounter data corruption (scrubs are monthly, and the issue has been around for about a year now; see the target pool creation date)
- The sends are incremental
- Example command: /opt/sanoid-2.0.3/syncoid root@source-machine:rpool-ssd/data rpool/backups/source-machine/rpool-ssd/data --skip-parent -r --compress pigz-fast -o ServerAliveCountMax=3 -o ServerAliveInterval=30 --no-sync-snap --debug (there are machines where I make the sync snap, so --no-sync-snap is omitted)
- I have summarized the available crash data:
Single thread, icp_aes_impl/icp_gcm_impl on fastest:
Apr 02 08:16:26 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Apr 14 09:05:41 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Apr 17 04:33:22 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
May 01 16:02:09 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
May 06 11:46:07 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
May 12 05:49:43 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
May 12 21:07:13 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
(Since the last two crashes were so close to each other, I've implemented a watchdog and automatic unlocking, and went full berserk, re-enabling concurrency.)
4 threads, icp_aes_impl/icp_gcm_impl on fastest:
May 21 23:25:21 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 (uptime: 39:46:02)
May 24 03:31:06 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 (uptime: 52:05:45)
4 threads, icp_aes_impl/icp_gcm_impl on generic:
May 24 14:00:40 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 (uptime: 10:29:34)
May 26 03:29:24 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000 (uptime: 37:28:44)
4 threads, icp_aes_impl/icp_gcm_impl on x86_64 / pclmulqdq:
No crash yet; uptime is 3 days, 7:47. (Not even close to the ~2 week best uptime with a single thread, but with 4 threads it is the best so far.)
This time, I left a loop of 4 sends and rollbacks running, after creating an incremental (on the same source I used for my prior send loop) by:
- truncating half the files to 0b and deleting the other half
- setting src and dst to compression=gzip-9 since the examples posted by cserem all had that
- rsync -a --inplace all the data back over
- take a new snapshot
Then zfs send -I snap1 snap2 > incsend
Then a loop of: while true; do for i in $(seq 1 4); do dd if=incsend bs=1M | zfs recv encroot/$i & done; wait; for i in $(seq 1 4); do zfs rollback encroot/$i@snap1; done; done
(boy, my server loved 4 gzip-9 streams at once)
Unfortunately, as you might guess from my not leading with a success statement, it did not reproduce after 24 hours of looping.
I'll probably explore including a zvol next, unless someone else comes up with a brighter idea.
I think I stated my setup before, but I'll try to explain it in a deeper fashion:
- I'm running encrypted to encrypted, recursive, incremental transfer. The keys on the two servers differ.
- I'm using resumable transfers because of the size of each of them (see the resume sketch after this list)
- The size of each dataset is easily 200GB-1TB. Some of them are mail pools with millions of small files (in case this matters).
- While doing the transfers, I'm also snapshotting these same datasets every hour, possibly having several snapshots occur during the transfer.
- And I'm deleting expired snapshots on both the source and destination server. Those expired snapshots are not involved in the transfer.
- There are around 60 to 100 different snapshots in each dataset: 48 hourly, 30 daily, 12 monthly and a yearly snapshot, so some of the data is quite old.
- Both servers use 2x2 mirrors (two stripes of two-way mirrors) of spinning disks (2TB each, and thus awfully slow; 512-byte physical sectors but ashift is 12).
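For the resumable transfers mentioned in the list above, the pattern is roughly the standard token-based one (dataset and host names below are placeholders): read the token from the partially received dataset on the receiving side, then feed it back into zfs send on the sending side:
zfs get -H -o value receive_resume_token backuppool/data/dataset
zfs send -t <token-from-above> | ssh backuphost zfs receive -s backuppool/data/dataset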
I do not hit the lockup easily; once every week to 10 days.
Now, some meat. In the past few days, I detected some corruption on one of the source servers. ZFS could mount the dataset, but the data in it was inconsistent (missing most directories under the root). Also, I could not unmount it once mounted. I had this issue on two related datasets: the same LXC's root and data (1TB of mail server data).
Weirdly enough, sending this corrupted dataset to another dataset on the same machine seemed to fix the issue on one of them. The other (the mail spool) I had to roll back to a known good state, and then I had to resync all missing mail from an off-site server before putting it back in production (this is a multisite, geo-replicated mail spool setup, so I had some live data to sync in addition).
I was able to delete the corrupted datasets. A full scrub fixed the data errors in the already deleted snapshots.
I know all the above does not give many hints, other than anecdotal evidence of a possible problem. I'm only sure there was corruption to start with and that it was silent. It probably ended up breaking some algorithm somewhere. I'd start with a full scrub on all affected pools.
Here are some get all statistics. Please note: despite the name of the pool (rpool, the Proxmox root pool name), this is not an actual root pool containing the boot files. I just use the same name for historic reasons, but the pool is mounted inside an ext4 root.
First: sending server pool:
# zpool get all rpool
NAME PROPERTY VALUE SOURCE
rpool size 3.38T -
rpool capacity 49% -
rpool altroot - default
rpool health ONLINE -
rpool guid 10861474846250979260 -
rpool version - default
rpool bootfs - default
rpool delegation on default
rpool autoreplace off default
rpool cachefile - default
rpool failmode wait default
rpool listsnapshots off default
rpool autoexpand off default
rpool dedupratio 1.00x -
rpool free 1.71T -
rpool allocated 1.67T -
rpool readonly off -
rpool ashift 12 local
rpool comment - default
rpool expandsize - -
rpool freeing 0 -
rpool fragmentation 35% -
rpool leaked 0 -
rpool multihost off default
rpool checkpoint - -
rpool load_guid 8733562770359337761 -
rpool autotrim off default
rpool feature@async_destroy enabled local
rpool feature@empty_bpobj active local
rpool feature@lz4_compress active local
rpool feature@multi_vdev_crash_dump enabled local
rpool feature@spacemap_histogram active local
rpool feature@enabled_txg active local
rpool feature@hole_birth active local
rpool feature@extensible_dataset active local
rpool feature@embedded_data active local
rpool feature@bookmarks enabled local
rpool feature@filesystem_limits enabled local
rpool feature@large_blocks enabled local
rpool feature@large_dnode active local
rpool feature@sha512 enabled local
rpool feature@skein enabled local
rpool feature@edonr enabled local
rpool feature@userobj_accounting active local
rpool feature@encryption active local
rpool feature@project_quota active local
rpool feature@device_removal enabled local
rpool feature@obsolete_counts enabled local
rpool feature@zpool_checkpoint enabled local
rpool feature@spacemap_v2 active local
rpool feature@allocation_classes enabled local
rpool feature@resilver_defer enabled local
rpool feature@bookmark_v2 enabled local
rpool feature@redaction_bookmarks disabled local
rpool feature@redacted_datasets disabled local
rpool feature@bookmark_written disabled local
rpool feature@log_spacemap disabled local
rpool feature@livelist disabled local
rpool feature@device_rebuild disabled local
rpool feature@zstd_compress disabled local
Sending server dataset. This is an example. All my datasets are configured the same way (lz4, xattr=sa etc).
# zfs get all rpool/vm/subvol-4242-disk-0
NAME PROPERTY VALUE SOURCE
rpool/vm/subvol-4242-disk-0 type filesystem -
rpool/vm/subvol-4242-disk-0 creation Wed Mar 10 8:45 2021 -
rpool/vm/subvol-4242-disk-0 used 11.7G -
rpool/vm/subvol-4242-disk-0 available 1.60T -
rpool/vm/subvol-4242-disk-0 referenced 5.54G -
rpool/vm/subvol-4242-disk-0 compressratio 2.30x -
rpool/vm/subvol-4242-disk-0 mounted yes -
rpool/vm/subvol-4242-disk-0 quota none default
rpool/vm/subvol-4242-disk-0 reservation none default
rpool/vm/subvol-4242-disk-0 recordsize 128K default
rpool/vm/subvol-4242-disk-0 mountpoint /rpool/vm/subvol-4242-disk-0 default
rpool/vm/subvol-4242-disk-0 sharenfs off default
rpool/vm/subvol-4242-disk-0 checksum on default
rpool/vm/subvol-4242-disk-0 compression lz4 inherited from rpool
rpool/vm/subvol-4242-disk-0 atime off inherited from rpool
rpool/vm/subvol-4242-disk-0 devices on default
rpool/vm/subvol-4242-disk-0 exec on default
rpool/vm/subvol-4242-disk-0 setuid on default
rpool/vm/subvol-4242-disk-0 readonly off default
rpool/vm/subvol-4242-disk-0 zoned off default
rpool/vm/subvol-4242-disk-0 snapdir hidden default
rpool/vm/subvol-4242-disk-0 aclmode discard default
rpool/vm/subvol-4242-disk-0 aclinherit restricted default
rpool/vm/subvol-4242-disk-0 createtxg 3675833 -
rpool/vm/subvol-4242-disk-0 canmount on default
rpool/vm/subvol-4242-disk-0 xattr sa inherited from rpool
rpool/vm/subvol-4242-disk-0 copies 1 default
rpool/vm/subvol-4242-disk-0 version 5 -
rpool/vm/subvol-4242-disk-0 utf8only off -
rpool/vm/subvol-4242-disk-0 normalization none -
rpool/vm/subvol-4242-disk-0 casesensitivity sensitive -
rpool/vm/subvol-4242-disk-0 vscan off default
rpool/vm/subvol-4242-disk-0 nbmand off default
rpool/vm/subvol-4242-disk-0 sharesmb off default
rpool/vm/subvol-4242-disk-0 refquota none default
rpool/vm/subvol-4242-disk-0 refreservation none default
rpool/vm/subvol-4242-disk-0 guid 13076753325802082524 -
rpool/vm/subvol-4242-disk-0 primarycache all default
rpool/vm/subvol-4242-disk-0 secondarycache all default
rpool/vm/subvol-4242-disk-0 usedbysnapshots 6.13G -
rpool/vm/subvol-4242-disk-0 usedbydataset 5.54G -
rpool/vm/subvol-4242-disk-0 usedbychildren 0B -
rpool/vm/subvol-4242-disk-0 usedbyrefreservation 0B -
rpool/vm/subvol-4242-disk-0 logbias latency default
rpool/vm/subvol-4242-disk-0 objsetid 53904 -
rpool/vm/subvol-4242-disk-0 dedup off default
rpool/vm/subvol-4242-disk-0 mlslabel none default
rpool/vm/subvol-4242-disk-0 sync standard inherited from rpool/vm
rpool/vm/subvol-4242-disk-0 dnodesize auto inherited from rpool
rpool/vm/subvol-4242-disk-0 refcompressratio 2.06x -
rpool/vm/subvol-4242-disk-0 written 8K -
rpool/vm/subvol-4242-disk-0 logicalused 24.6G -
rpool/vm/subvol-4242-disk-0 logicalreferenced 10.8G -
rpool/vm/subvol-4242-disk-0 volmode default default
rpool/vm/subvol-4242-disk-0 filesystem_limit none default
rpool/vm/subvol-4242-disk-0 snapshot_limit none default
rpool/vm/subvol-4242-disk-0 filesystem_count none default
rpool/vm/subvol-4242-disk-0 snapshot_count none default
rpool/vm/subvol-4242-disk-0 snapdev hidden default
rpool/vm/subvol-4242-disk-0 acltype posix inherited from rpool
rpool/vm/subvol-4242-disk-0 context none default
rpool/vm/subvol-4242-disk-0 fscontext none default
rpool/vm/subvol-4242-disk-0 defcontext none default
rpool/vm/subvol-4242-disk-0 rootcontext none default
rpool/vm/subvol-4242-disk-0 relatime off default
rpool/vm/subvol-4242-disk-0 redundant_metadata all default
rpool/vm/subvol-4242-disk-0 overlay on default
rpool/vm/subvol-4242-disk-0 encryption aes-256-gcm -
rpool/vm/subvol-4242-disk-0 keylocation none default
rpool/vm/subvol-4242-disk-0 keyformat passphrase -
rpool/vm/subvol-4242-disk-0 pbkdf2iters 350000 -
rpool/vm/subvol-4242-disk-0 encryptionroot rpool -
rpool/vm/subvol-4242-disk-0 keystatus available -
rpool/vm/subvol-4242-disk-0 special_small_blocks 0 default
Here's the receiver's zpool get all:
# zpool get all rpool
NAME PROPERTY VALUE SOURCE
rpool size 3.62T -
rpool capacity 70% -
rpool altroot - default
rpool health ONLINE -
rpool guid 16014196203982227614 -
rpool version - default
rpool bootfs rpool/ROOT/pve-1 local
rpool delegation on default
rpool autoreplace off default
rpool cachefile - default
rpool failmode wait default
rpool listsnapshots off default
rpool autoexpand off default
rpool dedupratio 1.00x -
rpool free 1.07T -
rpool allocated 2.56T -
rpool readonly off -
rpool ashift 12 local
rpool comment - default
rpool expandsize - -
rpool freeing 0 -
rpool fragmentation 29% -
rpool leaked 0 -
rpool multihost off default
rpool checkpoint - -
rpool load_guid 9824040886009751730 -
rpool autotrim off default
rpool feature@async_destroy enabled local
rpool feature@empty_bpobj active local
rpool feature@lz4_compress active local
rpool feature@multi_vdev_crash_dump enabled local
rpool feature@spacemap_histogram active local
rpool feature@enabled_txg active local
rpool feature@hole_birth active local
rpool feature@extensible_dataset active local
rpool feature@embedded_data active local
rpool feature@bookmarks enabled local
rpool feature@filesystem_limits enabled local
rpool feature@large_blocks enabled local
rpool feature@large_dnode active local
rpool feature@sha512 enabled local
rpool feature@skein enabled local
rpool feature@edonr enabled local
rpool feature@userobj_accounting active local
rpool feature@encryption active local
rpool feature@project_quota active local
rpool feature@device_removal enabled local
rpool feature@obsolete_counts enabled local
rpool feature@zpool_checkpoint enabled local
rpool feature@spacemap_v2 active local
rpool feature@allocation_classes enabled local
rpool feature@resilver_defer enabled local
rpool feature@bookmark_v2 enabled local
rpool feature@redaction_bookmarks disabled local
rpool feature@redacted_datasets disabled local
rpool feature@bookmark_written disabled local
rpool feature@log_spacemap disabled local
rpool feature@livelist disabled local
rpool feature@device_rebuild disabled local
rpool feature@zstd_compress disabled local
And a dataset:
NAME PROPERTY VALUE SOURCE
rpool/backup/vm/subvol-3140-disk-1 type filesystem -
rpool/backup/vm/subvol-3140-disk-1 creation Thu Apr 8 10:28 2021 -
rpool/backup/vm/subvol-3140-disk-1 used 20.6G -
rpool/backup/vm/subvol-3140-disk-1 available 970G -
rpool/backup/vm/subvol-3140-disk-1 referenced 7.66G -
rpool/backup/vm/subvol-3140-disk-1 compressratio 2.13x -
rpool/backup/vm/subvol-3140-disk-1 mounted no -
rpool/backup/vm/subvol-3140-disk-1 quota none default
rpool/backup/vm/subvol-3140-disk-1 reservation none default
rpool/backup/vm/subvol-3140-disk-1 recordsize 128K default
rpool/backup/vm/subvol-3140-disk-1 mountpoint /rpool/backup/vm/subvol-3140-disk-1 default
rpool/backup/vm/subvol-3140-disk-1 sharenfs off default
rpool/backup/vm/subvol-3140-disk-1 checksum on default
rpool/backup/vm/subvol-3140-disk-1 compression lz4 inherited from rpool
rpool/backup/vm/subvol-3140-disk-1 atime off inherited from rpool
rpool/backup/vm/subvol-3140-disk-1 devices on default
rpool/backup/vm/subvol-3140-disk-1 exec on default
rpool/backup/vm/subvol-3140-disk-1 setuid on default
rpool/backup/vm/subvol-3140-disk-1 readonly off default
rpool/backup/vm/subvol-3140-disk-1 zoned off default
rpool/backup/vm/subvol-3140-disk-1 snapdir hidden default
rpool/backup/vm/subvol-3140-disk-1 aclmode discard default
rpool/backup/vm/subvol-3140-disk-1 aclinherit restricted default
rpool/backup/vm/subvol-3140-disk-1 createtxg 10724 -
rpool/backup/vm/subvol-3140-disk-1 canmount on default
rpool/backup/vm/subvol-3140-disk-1 xattr on default
rpool/backup/vm/subvol-3140-disk-1 copies 1 default
rpool/backup/vm/subvol-3140-disk-1 version 5 -
rpool/backup/vm/subvol-3140-disk-1 utf8only off -
rpool/backup/vm/subvol-3140-disk-1 normalization none -
rpool/backup/vm/subvol-3140-disk-1 casesensitivity sensitive -
rpool/backup/vm/subvol-3140-disk-1 vscan off default
rpool/backup/vm/subvol-3140-disk-1 nbmand off default
rpool/backup/vm/subvol-3140-disk-1 sharesmb off default
rpool/backup/vm/subvol-3140-disk-1 refquota none default
rpool/backup/vm/subvol-3140-disk-1 refreservation none default
rpool/backup/vm/subvol-3140-disk-1 guid 5702550096622166170 -
rpool/backup/vm/subvol-3140-disk-1 primarycache all default
rpool/backup/vm/subvol-3140-disk-1 secondarycache all default
rpool/backup/vm/subvol-3140-disk-1 usedbysnapshots 12.9G -
rpool/backup/vm/subvol-3140-disk-1 usedbydataset 7.66G -
rpool/backup/vm/subvol-3140-disk-1 usedbychildren 0B -
rpool/backup/vm/subvol-3140-disk-1 usedbyrefreservation 0B -
rpool/backup/vm/subvol-3140-disk-1 logbias latency default
rpool/backup/vm/subvol-3140-disk-1 objsetid 185 -
rpool/backup/vm/subvol-3140-disk-1 dedup off default
rpool/backup/vm/subvol-3140-disk-1 mlslabel none default
rpool/backup/vm/subvol-3140-disk-1 sync standard inherited from rpool
rpool/backup/vm/subvol-3140-disk-1 dnodesize legacy default
rpool/backup/vm/subvol-3140-disk-1 refcompressratio 1.19x -
rpool/backup/vm/subvol-3140-disk-1 written 0 -
rpool/backup/vm/subvol-3140-disk-1 logicalused 43.2G -
rpool/backup/vm/subvol-3140-disk-1 logicalreferenced 8.96G -
rpool/backup/vm/subvol-3140-disk-1 volmode default default
rpool/backup/vm/subvol-3140-disk-1 filesystem_limit none default
rpool/backup/vm/subvol-3140-disk-1 snapshot_limit none default
rpool/backup/vm/subvol-3140-disk-1 filesystem_count none default
rpool/backup/vm/subvol-3140-disk-1 snapshot_count none default
rpool/backup/vm/subvol-3140-disk-1 snapdev hidden default
rpool/backup/vm/subvol-3140-disk-1 acltype off default
rpool/backup/vm/subvol-3140-disk-1 context none default
rpool/backup/vm/subvol-3140-disk-1 fscontext none default
rpool/backup/vm/subvol-3140-disk-1 defcontext none default
rpool/backup/vm/subvol-3140-disk-1 rootcontext none default
rpool/backup/vm/subvol-3140-disk-1 relatime off default
rpool/backup/vm/subvol-3140-disk-1 redundant_metadata all default
rpool/backup/vm/subvol-3140-disk-1 overlay on default
rpool/backup/vm/subvol-3140-disk-1 encryption aes-256-gcm -
rpool/backup/vm/subvol-3140-disk-1 keylocation none default
rpool/backup/vm/subvol-3140-disk-1 keyformat passphrase -
rpool/backup/vm/subvol-3140-disk-1 pbkdf2iters 350000 -
rpool/backup/vm/subvol-3140-disk-1 encryptionroot rpool/backup -
rpool/backup/vm/subvol-3140-disk-1 keystatus available -
rpool/backup/vm/subvol-3140-disk-1 special_small_blocks 0 default
And one last thing I almost forgot: on some of the systems I'm seeing misaligned write errors. In the past these seemed related to hardware problems such as bad SATA cables and/or controller issues, but I'm now getting them on servers with onboard SATA ports and very short cables. Again, all of this started after the upgrade to 0.8.x, and I see it most prominently on v2.0.4. It happens on both SSDs and spinning SATA disks. I'll reduce the link speed on the spinning disks to 1.5 Gbit/s as a check, but doing so on the SSDs would not be feasible.
Sorry to drop all this meat on the table. I hope I'm not causing confusion. I've not been able to isolate the issue in a better way.
Update: 4 threads, with icp_aes_impl / icp_gcm_impl set to x86_64 / pclmulqdq - crashed after 5 days, 14:47. Trying again with the same parameters.
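For reference, this is roughly how I pin the implementations at runtime (a sketch; it assumes the icp module exposes its parameters under /sys/module/icp, which is where they show up on my systems):
# Pin the AES and GCM implementations used by ZFS native encryption
echo x86_64 > /sys/module/icp/parameters/icp_aes_impl
echo pclmulqdq > /sys/module/icp/parameters/icp_gcm_impl
# Reading the parameters back lists all implementations, with the active one in brackets
cat /sys/module/icp/parameters/icp_aes_impl /sys/module/icp/parameters/icp_gcm_impl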
I was able to get the zfs related processes:
root 20454 0.0 0.0 2384 700 ? S 18:25 0:00 sh -c ssh -o ServerAliveCountMax=3 -o ServerAliveInterval=30 -S /tmp/syncoid-root@source1-1622478259 root@source1 ' zfs send -I '"'"'rpool/rpool-ssd-backup/subvol-120-disk-2'"'"'@'"'"'2021-05-31-180000'"'"' '"'"'rpool/rpool-ssd-backup/subvol-120-disk-2'"'"'@'"'"'2021-05-31-181500'"'"' | pigz -3 | mbuffer -q -s 128k -m 16M 2>/dev/null' | mbuffer -q -s 128k -m 16M 2>/dev/null | pigz -dc | pv -p -t -e -r -b -s 52098832 | zfs receive -s -F 'rpool/backups/SOURCE1/rpool/rpool-ssd-backup/subvol-120-disk-2' 2>&1
root 20459 0.0 0.0 15708 4172 ? S 18:25 0:00 zfs receive -s -F rpool/backups/SOURCE1/rpool/rpool-ssd-backup/subvol-120-disk-2
root 28671 0.0 0.0 2384 764 ? S 18:25 0:00 sh -c ssh -o ServerAliveCountMax=3 -o ServerAliveInterval=30 -S /tmp/syncoid-root@SOURCE2-1622478239 root@SOURCE2 ' zfs send -I '"'"'rpool/data/subvol-117-disk-0'"'"'@'"'"'syncoid_pve_2021-05-31:18:12:10-GMT02:00'"'"' '"'"'rpool/data/subvol-117-disk-0'"'"'@'"'"'syncoid_pve_2021-05-31:18:25:51-GMT02:00'"'"' | pigz -3 | mbuffer -q -s 128k -m 16M 2>/dev/null' | mbuffer -q -s 128k -m 16M 2>/dev/null | pigz -dc | pv -p -t -e -r -b -s 495575408 | zfs receive -s -F 'rpool/backups/SOURCE2/rpool/data/subvol-117-disk-0' 2>&1
root 28680 0.3 0.0 15708 4424 ? D 18:25 0:00 zfs receive -s -F rpool/backups/SOURCE2/rpool/data/subvol-117-disk-0
root 31831 0.0 0.0 2384 760 ? S 18:26 0:00 sh -c zfs rollback -R 'rpool/backups/SOURCE3/rpool-ssd/data/subvol-104-disk-1'@'2021-05-31-180000'
root 31832 0.0 0.0 15704 4144 ? D 18:26 0:00 zfs rollback -R rpool/backups/SOURCE3/rpool-ssd/data/subvol-104-disk-1@2021-05-31-180000
root 31863 0.0 0.0 2384 764 ? S 18:26 0:00 sh -c zfs destroy 'rpool/backups/SOURCE4/rpool/data/subvol-106-disk-2'@syncoid_pve_2021-05-31:18:12:03-GMT02:00
root 31865 0.0 0.0 15708 4012 ? S 18:26 0:00 zfs destroy rpool/backups/SOURCE4/rpool/data/subvol-106-disk-2@syncoid_pve_2021-05-31:18:12:03-GMT02:00
Since the destroys (those are for the receives where I deliberately create a snapshot for syncing, which is cleaned up after the transfer) have the highest PIDs, perhaps the crash is related to that? Then again, the rollback is in 'D' state... I will hopefully have more ps outputs in the future.
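In case it helps, this is what I intend to run right after the next hang to capture a bit more state (a minimal sketch; needs root and only works as long as simple commands still execute):
# Show processes stuck in uninterruptible sleep, with the kernel function they wait in
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'
# Dump the kernel stack of each D-state process
for pid in $(ps -eo pid,stat | awk '$2 ~ /D/ {print $1}'); do echo "== PID $pid =="; cat /proc/$pid/stack; done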
4 threads, with icp_aes_impl / icp_gcm_impl set to x86_64 / pclmulqdq - crashed after 1 day, 6:44. I am switching back to the default (fastest).
Unfortunately I couldn't capture a process list this time.
Here are some events from previous crashes, up to the point where the system was rebooted and the pool imported again:
zpool history
2021-05-24.03:30:46 zfs destroy rpool/backups/SOURCE4/rpool/data/subvol-112-disk-1@syncoid_pve_2021-05-24:03:15:33-GMT02:00
2021-05-24.03:30:51 zfs rollback -R rpool/backups/SOURCE8/rpool/data/subvol-106-disk-2@syncoid_pve_2021-05-24:03:15:50-GMT02:00
2021-05-24.03:30:52 zfs rollback -R rpool/backups/SOURCE4/rpool/data/subvol-115-disk-1@syncoid_pve_2021-05-24:03:15:46-GMT02:00
2021-05-24.03:30:55 zfs receive -s -F rpool/backups/SOURCE8/rpool/data/subvol-106-disk-2
2021-05-24.03:30:57 zfs destroy rpool/backups/SOURCE8/rpool/data/subvol-106-disk-2@syncoid_pve_2021-05-24:03:15:50-GMT02:00
2021-05-24.03:31:00 zfs receive -s -F rpool/backups/SOURCE4/rpool/data/subvol-115-disk-1
2021-05-24.03:31:02 zfs destroy rpool/backups/SOURCE4/rpool/data/subvol-115-disk-1@syncoid_pve_2021-05-24:03:15:46-GMT02:00
2021-05-24.03:31:03 zfs receive -s -F rpool/backups/SOURCE3/rpool/rpool-ssd-backup/subvol-120-disk-2
2021-05-24.03:31:04 zfs receive -s -F rpool/backups/SOURCE7/rpool-hdd/data/vm-102-disk-5
2021-05-24.03:31:04 zfs rollback -R rpool/backups/SOURCE2/rpool-ssd/ROOT/pve-1@2021-05-24-030000
>>>>May 24 03:31:06 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
2021-05-24.07:11:22 zpool import -N rpool
--
2021-05-24.14:00:33 zfs receive -s -F rpool/backups/SOURCE7/rpool-hdd/data/vm-101-disk-0
2021-05-24.14:00:34 zfs receive -F rpool/backups/SOURCE5/rpool/data/vm-100-disk-2
2021-05-24.14:00:34 zfs receive -s -F rpool/backups/SOURCE4/rpool/ROOT/pve-1
2021-05-24.14:00:35 zfs destroy rpool/backups/SOURCE5/rpool/data/vm-100-disk-2@syncoid_pve_2021-05-24:13:51:04-GMT02:00
2021-05-24.14:00:35 zfs rollback -R rpool/backups/SOURCE7/rpool-hdd/data/vm-102-disk-0@2021-05-24-134500
2021-05-24.14:00:36 zfs receive -s -F rpool/backups/SOURCE6/rpool/data/vm-101-disk-0
2021-05-24.14:00:37 zfs destroy rpool/backups/SOURCE6/rpool/data/vm-101-disk-0@syncoid_pve_2021-05-24:13:51:14-GMT02:00
2021-05-24.14:00:37 zfs destroy rpool/backups/SOURCE4/rpool/ROOT/pve-1@syncoid_pve_2021-05-24:13:50:54-GMT02:00
2021-05-24.14:00:38 zfs rollback -R rpool/backups/SOURCE6/rpool/data/vm-102-disk-0@syncoid_pve_2021-05-24:13:51:24-GMT02:00
2021-05-24.14:00:39 zfs rollback -R rpool/backups/SOURCE3/rpool/ROOT/pve-1@2021-05-24-134500
>>>>May 24 14:00:40 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
2021-05-24.16:03:19 zpool import -N rpool
--
2021-05-26.03:29:07 zfs receive -s -F rpool/backups/SOURCE4/rpool/data/subvol-101-disk-2
2021-05-26.03:29:10 zfs rollback -R rpool/backups/SOURCE3/rpool/rpool-ssd-backup/subvol-120-disk-1@2021-05-26-030000
2021-05-26.03:29:10 zfs destroy rpool/backups/SOURCE4/rpool/data/subvol-101-disk-2@syncoid_pve_2021-05-26:03:14:09-GMT02:00
2021-05-26.03:29:11 zfs receive -s -F rpool/backups/SOURCE3/rpool/rpool-ssd-backup/subvol-120-disk-1
2021-05-26.03:29:13 zfs rollback -R rpool/backups/SOURCE4/rpool/data/subvol-102-disk-1@syncoid_pve_2021-05-26:03:14:20-GMT02:00
2021-05-26.03:29:14 zfs rollback -R rpool/backups/SOURCE8/rpool/ROOT/pve-1@syncoid_pve_2021-05-26:03:14:38-GMT02:00
2021-05-26.03:29:16 zfs rollback -R rpool/backups/SOURCE3/rpool/rpool-ssd-backup/subvol-120-disk-2@2021-05-26-030000
2021-05-26.03:29:20 zfs receive -s -F rpool/backups/SOURCE4/rpool/data/subvol-102-disk-1
2021-05-26.03:29:22 zfs destroy rpool/backups/SOURCE4/rpool/data/subvol-102-disk-1@syncoid_pve_2021-05-26:03:14:20-GMT02:00
>>>>May 26 03:29:24 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
2021-05-26.03:29:26 zfs receive -s -F rpool/backups/SOURCE8/rpool/ROOT/pve-1
2021-05-26.03:40:17 zpool import -N rpool
--
2021-05-31.18:25:37 zfs rollback -R rpool/backups/SOURCE4/rpool/data/subvol-112-disk-1@syncoid_pve_2021-05-31:18:11:48-GMT02:00
2021-05-31.18:25:38 zfs rollback -R rpool/backups/SOURCE2/rpool-ssd/data/subvol-100-disk-1@2021-05-31-180000
2021-05-31.18:25:42 zfs receive -s -F rpool/backups/SOURCE4/rpool/data/subvol-112-disk-1
2021-05-31.18:25:42 zfs receive -s -F rpool/backups/SOURCE8/rpool/data/subvol-106-disk-0
2021-05-31.18:25:43 zfs receive -s -F rpool/backups/SOURCE2/rpool-ssd/data/subvol-100-disk-1
2021-05-31.18:25:43 zfs destroy rpool/backups/SOURCE4/rpool/data/subvol-112-disk-1@syncoid_pve_2021-05-31:18:11:48-GMT02:00
2021-05-31.18:25:44 zfs destroy rpool/backups/SOURCE8/rpool/data/subvol-106-disk-0@syncoid_pve_2021-05-31:18:11:39-GMT02:00
>>>>May 31 18:25:44 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
2021-05-31.18:25:45 zfs rollback -R rpool/backups/SOURCE2/rpool-ssd/data/subvol-100-disk-2@2021-05-31-180000
2021-05-31.18:25:47 zfs rollback -R rpool/backups/SOURCE4/rpool/data/subvol-115-disk-1@syncoid_pve_2021-05-31:18:12:00-GMT02:00
2021-05-31.18:25:47 zfs rollback -R rpool/backups/SOURCE8/rpool/data/subvol-106-disk-1@syncoid_pve_2021-05-31:18:11:51-GMT02:00
2021-05-31.18:25:48 zfs receive -s -F rpool/backups/SOURCE2/rpool-ssd/data/subvol-100-disk-2
2021-05-31.18:25:49 zfs rollback -R rpool/backups/SOURCE2/rpool-ssd/data/subvol-101-disk-0@2021-05-31-180000
2021-05-31.18:25:49 zfs receive -s -F rpool/backups/SOURCE4/rpool/data/subvol-115-disk-1
2021-05-31.18:25:50 zfs receive -s -F rpool/backups/SOURCE2/rpool-ssd/data/subvol-101-disk-0
2021-05-31.18:25:51 zfs destroy rpool/backups/SOURCE4/rpool/data/subvol-115-disk-1@syncoid_pve_2021-05-31:18:12:00-GMT02:00
2021-05-31.18:25:51 zfs rollback -R rpool/backups/SOURCE2/rpool-ssd/data/subvol-101-disk-1@2021-05-31-180000
2021-05-31.18:25:52 zfs receive -s -F rpool/backups/SOURCE2/rpool-ssd/data/subvol-101-disk-1
2021-05-31.18:25:54 zfs rollback -R rpool/backups/SOURCE4/rpool/data/subvol-117-disk-0@syncoid_pve_2021-05-31:18:12:10-GMT02:00
2021-05-31.18:25:54 zfs receive -s -F rpool/backups/SOURCE8/rpool/data/subvol-106-disk-1
2021-05-31.18:25:55 zfs rollback -R rpool/backups/SOURCE2/rpool-ssd/data/subvol-101-disk-2@2021-05-31-180000
2021-05-31.18:26:04 zfs destroy rpool/backups/SOURCE8/rpool/data/subvol-106-disk-1@syncoid_pve_2021-05-31:18:11:51-GMT02:00
2021-05-31.18:26:05 zfs rollback -R rpool/backups/SOURCE8/rpool/data/subvol-106-disk-2@syncoid_pve_2021-05-31:18:12:03-GMT02:00
2021-05-31.18:26:09 zfs receive -s -F rpool/backups/SOURCE2/rpool-ssd/data/subvol-101-disk-2
2021-05-31.18:26:11 zfs rollback -R rpool/backups/SOURCE2/rpool-ssd/data/subvol-104-disk-0@2021-05-31-180000
2021-05-31.18:26:12 zfs receive -s -F rpool/backups/SOURCE2/rpool-ssd/data/subvol-104-disk-0
2021-05-31.18:37:02 zpool import -N rpool
--
2021-06-02.01:07:35 zfs destroy rpool/backups/SOURCE4/rpool/ROOT@syncoid_pve_2021-06-02:00:39:31-GMT02:00
2021-06-02.01:07:37 zfs receive -F rpool/backups/SOURCE5/rpool/data/subvol-101-disk-1
2021-06-02.01:07:37 zfs destroy rpool/backups/SOURCE5/rpool/data/subvol-101-disk-1@syncoid_pve_2021-06-02:00:39:37-GMT02:00
2021-06-02.01:07:39 zfs rollback -R rpool/backups/SOURCE4/rpool/ROOT/pve-1@syncoid_pve_2021-06-02:00:39:44-GMT02:00
2021-06-02.01:07:41 zfs receive -s -F rpool/backups/SOURCE7/rpool-hdd/data/subvol-103-disk-0
2021-06-02.01:07:42 zfs rollback -R rpool/backups/SOURCE7/rpool-hdd/data/vm-101-disk-0@2021-06-02-003000
2021-06-02.01:07:45 zfs rollback -R rpool/backups/SOURCE6/rpool/data/vm-100-disk-0@syncoid_pve_2021-06-02:00:40:18-GMT02:00
2021-06-02.01:07:45 zfs rollback -R rpool/backups/SOURCE5/rpool/data/vm-100-disk-1@syncoid_pve_2021-06-02:00:39:46-GMT02:00
>>>>Jun 02 01:07:46 pve kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
2021-06-02.01:07:48 zfs receive -s -F rpool/backups/SOURCE7/rpool-hdd/data/vm-101-disk-0
2021-06-02.01:07:49 zfs rollback -R rpool/backups/SOURCE7/rpool-hdd/data/vm-102-disk-0@2021-06-02-003000
2021-06-02.01:21:17 zpool import -N rpool
Aren't there some commands that I could run after the exception, to gather more useful data?
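A few things I'm planning to try right after the next oops (a sketch, assuming dmesg and basic file I/O still work at that point):
# Ask the kernel to dump the stacks of all blocked (D-state) tasks into the kernel log (needs sysrq enabled)
echo w > /proc/sysrq-trigger
dmesg | tail -n 300
# ZFS internal debug log; it is much more verbose if zfs_dbgmsg_enable=1 was set beforehand
tail -n 300 /proc/spl/kstat/zfs/dbgmsg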
It seems that "reflowing" some of the problematic datasets to a different mount point in the same pool has had some beneficial effects in my case. I mean just moving datasets with syncoid from rpool/mount1/dataset[x] -> rpool/mount2/dataset[x], deleting some older snapshots, and scrubbing the pool. I'm moving said datasets back to their original position now to see if the fix is permanent. This is all empirical work, but it would support the theory of some type of corruption occurring undetected. It does not seem to affect data though, just metadata (I hope).
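Concretely, the reflow was just a local copy plus cleanup along these lines (dataset names are illustrative, not my real ones):
# Copy the dataset to another location within the same pool
syncoid --no-sync-snap rpool/mount1/datasetX rpool/mount2/datasetX
# Review older snapshots on the copy before pruning, then verify the pool
zfs list -t snapshot -o name -s creation rpool/mount2/datasetX
zpool scrub rpool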
BTW: zfs send/receive is painfully slow. 2x2 mirrors of spinning disks often manage only 20-30 MB/s on average, with lots of time spent below 5-10 MB/s.
You might benefit, if you're not using it, from zfs send -c, since you're using gzip-9, and I would find it difficult to overstate how slow it is.
edit: I'm terribly sorry, I was thinking of someone else in the thread who is using gzip-9. My suggestion remains as stated, but assuming its properties work for you, you might want to explore using -w to avoid paying the de+re-encryption overhead.
Because, for example, I get much faster throughput than that on my single-disk pools sans encryption.
You might also want to use e.g. perf top and see if it is indeed spending a whole bunch of CPU cycles in the encryption code, or somewhere else.
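To illustrate the -w suggestion, something along these lines (dataset and host names are placeholders):
# Raw incremental send: blocks go out as stored on disk (already compressed/encrypted),
# so the sender skips decryption and the receiver skips re-encryption
zfs send -w -I pool/dataset@snap1 pool/dataset@snap2 | ssh backuphost zfs receive -s -F backuppool/dataset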
I'm spending ~50% of CPU time in encryption/decryption and lz4 compression/decompression. However, running with -w increases the speed only marginally. I guess I'm I/O bound.
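To sanity-check the I/O-bound theory I'm watching per-vdev latency while a transfer runs, roughly like this:
# Per-vdev throughput and latency statistics, 5-second intervals, skipping the initial summary
zpool iostat -vly 5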
While doing the above test, I managed to lock up my server while transferring a single dataset from one place to another. Below is the stack trace. I was running zfs send/receive with -w, so no encryption/decryption was taking place. There was some snapshotting, listing, and deleting mixed in during the transfer, for sure. Other than that, just 90-100% I/O usage for a few hours. I should mention that the dataset is around 800GB in size; it contains tens of millions of small files (maildir storage) and about 100 snapshots.
Am I pushing ZFS too much?
[822539.204672] Hardware name: Supermicro X9SRE/X9SRE-3F/X9SRi/X9SRi-3F/X9SRE/X9SRE-3F/X9SRi/X9SRi-3F, BIOS 3.3.V1 02/04/2020
[822539.204756] Call Trace:
[822539.204813] dump_stack+0x6d/0x8b
[822539.204874] spl_dumpstack+0x29/0x2b [spl]
[822539.204931] spl_panic+0xd3/0xfb [spl]
[822539.205027] ? zfs_btree_insert_leaf_impl.isra.9+0x4c/0x60 [zfs]
[822539.205121] ? abd_iter_advance+0x2d/0x60 [zfs]
[822539.205200] ? zfs_btree_find_parent_idx+0x80/0xd0 [zfs]
[822539.205291] ? zfs_btree_find+0x160/0x2d0 [zfs]
[822539.205381] range_tree_remove_impl+0x67e/0x1000 [zfs]
[822539.205469] ? metaslab_df_alloc+0xc6/0x5e0 [zfs]
[822539.205527] ? ktime_get_raw_ts64+0x38/0xd0
[822539.205614] ? multilist_sublist_unlock+0x2b/0x40 [zfs]
[822539.205701] range_tree_remove+0x10/0x20 [zfs]
[822539.205787] metaslab_alloc_dva+0x264/0x1170 [zfs]
[822539.205873] metaslab_alloc+0xb4/0x240 [zfs]
[822539.205963] zio_dva_allocate+0xd5/0x820 [zfs]
[822539.206019] ? _cond_resched+0x19/0x30
[822539.206074] ? mutex_lock+0x12/0x30
[822539.206169] ? metaslab_class_throttle_reserve+0xd0/0xf0 [zfs]
[822539.206230] ? tsd_hash_search.isra.5+0x47/0xa0 [spl]
[822539.206326] zio_execute+0x99/0xf0 [zfs]
[822539.206382] taskq_thread+0x2f7/0x4e0 [spl]
[822539.206438] ? wake_up_q+0x80/0x80
[822539.206523] ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822539.206584] kthread+0x120/0x140
[822539.206640] ? task_done+0xb0/0xb0 [spl]
[822539.206694] ? kthread_park+0x90/0x90
[822539.206747] ret_from_fork+0x35/0x40
[822765.000993] INFO: task z_wr_iss:761 blocked for more than 120 seconds.
[822765.001055] Tainted: P O 5.4.114-1-pve #1
[822765.001111] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.001185] z_wr_iss D 0 761 2 0x80004000
[822765.001242] Call Trace:
[822765.001299] __schedule+0x2e6/0x700
[822765.001352] schedule+0x33/0xa0
[822765.001404] schedule_preempt_disabled+0xe/0x10
[822765.001459] __mutex_lock.isra.10+0x2c9/0x4c0
[822765.001513] __mutex_lock_slowpath+0x13/0x20
[822765.001567] mutex_lock+0x2c/0x30
[822765.001672] metaslab_alloc_dva+0x55d/0x1170 [zfs]
[822765.001759] metaslab_alloc+0xb4/0x240 [zfs]
[822765.001847] zio_dva_allocate+0xd5/0x820 [zfs]
[822765.001934] ? zio_crypt_do_indirect_mac_checksum_abd+0x5d/0x70 [zfs]
[822765.001993] ? _cond_resched+0x19/0x30
[822765.002046] ? mutex_lock+0x12/0x30
[822765.002129] ? metaslab_class_throttle_reserve+0xd0/0xf0 [zfs]
[822765.002190] ? tsd_hash_search.isra.5+0x47/0xa0 [spl]
[822765.002279] zio_execute+0x99/0xf0 [zfs]
[822765.002335] taskq_thread+0x2f7/0x4e0 [spl]
[822765.002391] ? wake_up_q+0x80/0x80
[822765.002477] ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.002536] kthread+0x120/0x140
[822765.002590] ? task_done+0xb0/0xb0 [spl]
[822765.002644] ? kthread_park+0x90/0x90
[822765.002697] ret_from_fork+0x35/0x40
[822765.002751] INFO: task z_wr_iss:762 blocked for more than 120 seconds.
[822765.002809] Tainted: P O 5.4.114-1-pve #1
[822765.002865] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.002938] z_wr_iss D 0 762 2 0x80004000
[822765.002995] Call Trace:
[822765.003046] __schedule+0x2e6/0x700
[822765.003099] schedule+0x33/0xa0
[822765.003151] schedule_preempt_disabled+0xe/0x10
[822765.003205] __mutex_lock.isra.10+0x2c9/0x4c0
[822765.003262] __mutex_lock_slowpath+0x13/0x20
[822765.003317] mutex_lock+0x2c/0x30
[822765.003400] metaslab_alloc_dva+0x55d/0x1170 [zfs]
[822765.003485] metaslab_alloc+0xb4/0x240 [zfs]
[822765.003573] zio_dva_allocate+0xd5/0x820 [zfs]
[822765.003628] ? _cond_resched+0x19/0x30
[822765.003681] ? mutex_lock+0x12/0x30
[822765.003763] ? metaslab_class_throttle_reserve+0xd0/0xf0 [zfs]
[822765.003822] ? tsd_hash_search.isra.5+0x47/0xa0 [spl]
[822765.003911] zio_execute+0x99/0xf0 [zfs]
[822765.003966] taskq_thread+0x2f7/0x4e0 [spl]
[822765.004021] ? wake_up_q+0x80/0x80
[822765.004107] ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.004165] kthread+0x120/0x140
[822765.004219] ? task_done+0xb0/0xb0 [spl]
[822765.004273] ? kthread_park+0x90/0x90
[822765.004326] ret_from_fork+0x35/0x40
[822765.004380] INFO: task z_wr_iss:763 blocked for more than 120 seconds.
[822765.004437] Tainted: P O 5.4.114-1-pve #1
[822765.004493] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.004568] z_wr_iss D 0 763 2 0x80004000
[822765.004624] Call Trace:
[822765.004676] __schedule+0x2e6/0x700
[822765.004728] schedule+0x33/0xa0
[822765.004781] schedule_preempt_disabled+0xe/0x10
[822765.004836] __mutex_lock.isra.10+0x2c9/0x4c0
[822765.004890] __mutex_lock_slowpath+0x13/0x20
[822765.004944] mutex_lock+0x2c/0x30
[822765.005103] metaslab_alloc_dva+0x55d/0x1170 [zfs]
[822765.005206] metaslab_alloc+0xb4/0x240 [zfs]
[822765.005305] zio_dva_allocate+0xd5/0x820 [zfs]
[822765.005406] ? zio_crypt_do_indirect_mac_checksum_abd+0x5d/0x70 [zfs]
[822765.005468] ? _cond_resched+0x19/0x30
[822765.005525] ? mutex_lock+0x12/0x30
[822765.005623] ? metaslab_class_throttle_reserve+0xd0/0xf0 [zfs]
[822765.005688] ? tsd_hash_search.isra.5+0x47/0xa0 [spl]
[822765.005791] zio_execute+0x99/0xf0 [zfs]
[822765.005852] taskq_thread+0x2f7/0x4e0 [spl]
[822765.005910] ? wake_up_q+0x80/0x80
[822765.006010] ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.006072] kthread+0x120/0x140
[822765.006132] ? task_done+0xb0/0xb0 [spl]
[822765.006188] ? kthread_park+0x90/0x90
[822765.006245] ret_from_fork+0x35/0x40
[822765.006302] INFO: task z_wr_iss:764 blocked for more than 120 seconds.
[822765.006363] Tainted: P O 5.4.114-1-pve #1
[822765.006422] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.006498] z_wr_iss D 0 764 2 0x80004000
[822765.006558] Call Trace:
[822765.006613] __schedule+0x2e6/0x700
[822765.006670] schedule+0x33/0xa0
[822765.006726] schedule_preempt_disabled+0xe/0x10
[822765.006784] __mutex_lock.isra.10+0x2c9/0x4c0
[822765.006843] __mutex_lock_slowpath+0x13/0x20
[822765.006900] mutex_lock+0x2c/0x30
[822765.006996] metaslab_alloc_dva+0x55d/0x1170 [zfs]
[822765.007097] metaslab_alloc+0xb4/0x240 [zfs]
[822765.007201] zio_dva_allocate+0xd5/0x820 [zfs]
[822765.007259] ? _cond_resched+0x19/0x30
[822765.007316] ? mutex_lock+0x12/0x30
[822765.007412] ? metaslab_class_throttle_reserve+0xd0/0xf0 [zfs]
[822765.007475] ? tsd_hash_search.isra.5+0x47/0xa0 [spl]
[822765.007579] zio_execute+0x99/0xf0 [zfs]
[822765.012866] taskq_thread+0x2f7/0x4e0 [spl]
[822765.012929] ? wake_up_q+0x80/0x80
[822765.013056] ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.013118] kthread+0x120/0x140
[822765.013177] ? task_done+0xb0/0xb0 [spl]
[822765.013234] ? kthread_park+0x90/0x90
[822765.013292] ret_from_fork+0x35/0x40
[822765.013346] INFO: task z_wr_iss:765 blocked for more than 120 seconds.
[822765.013404] Tainted: P O 5.4.114-1-pve #1
[822765.013460] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.013534] z_wr_iss D 0 765 2 0x80004000
[822765.013590] Call Trace:
[822765.013642] __schedule+0x2e6/0x700
[822765.013695] schedule+0x33/0xa0
[822765.013747] schedule_preempt_disabled+0xe/0x10
[822765.013802] __mutex_lock.isra.10+0x2c9/0x4c0
[822765.013856] __mutex_lock_slowpath+0x13/0x20
[822765.013910] mutex_lock+0x2c/0x30
[822765.013994] metaslab_alloc_dva+0x55d/0x1170 [zfs]
[822765.014081] metaslab_alloc+0xb4/0x240 [zfs]
[822765.014169] zio_dva_allocate+0xd5/0x820 [zfs]
[822765.014224] ? _cond_resched+0x19/0x30
[822765.014277] ? mutex_lock+0x12/0x30
[822765.014360] ? metaslab_class_throttle_reserve+0xd0/0xf0 [zfs]
[822765.014419] ? tsd_hash_search.isra.5+0x47/0xa0 [spl]
[822765.014508] zio_execute+0x99/0xf0 [zfs]
[822765.014563] taskq_thread+0x2f7/0x4e0 [spl]
[822765.014618] ? wake_up_q+0x80/0x80
[822765.014704] ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.014763] kthread+0x120/0x140
[822765.014817] ? task_done+0xb0/0xb0 [spl]
[822765.014871] ? kthread_park+0x90/0x90
[822765.014924] ret_from_fork+0x35/0x40
[822765.014978] INFO: task z_wr_iss:766 blocked for more than 120 seconds.
[822765.015036] Tainted: P O 5.4.114-1-pve #1
[822765.015092] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.015165] z_wr_iss D 0 766 2 0x80004000
[822765.015222] Call Trace:
[822765.015273] __schedule+0x2e6/0x700
[822765.015326] schedule+0x33/0xa0
[822765.015380] spl_panic+0xf9/0xfb [spl]
[822765.015467] ? abd_iter_advance+0x2d/0x60 [zfs]
[822765.015544] ? zfs_btree_find_parent_idx+0x80/0xd0 [zfs]
[822765.015624] ? zfs_btree_find+0x160/0x2d0 [zfs]
[822765.015709] range_tree_remove_impl+0x67e/0x1000 [zfs]
[822765.015794] ? metaslab_df_alloc+0xc6/0x5e0 [zfs]
[822765.015850] ? ktime_get_raw_ts64+0x38/0xd0
[822765.015934] ? multilist_sublist_unlock+0x2b/0x40 [zfs]
[822765.016020] range_tree_remove+0x10/0x20 [zfs]
[822765.016104] metaslab_alloc_dva+0x264/0x1170 [zfs]
[822765.016201] metaslab_alloc+0xb4/0x240 [zfs]
[822765.016289] zio_dva_allocate+0xd5/0x820 [zfs]
[822765.016345] ? _cond_resched+0x19/0x30
[822765.016398] ? mutex_lock+0x12/0x30
[822765.016485] ? metaslab_class_throttle_reserve+0xd0/0xf0 [zfs]
[822765.016548] ? tsd_hash_search.isra.5+0x47/0xa0 [spl]
[822765.016638] zio_execute+0x99/0xf0 [zfs]
[822765.016694] taskq_thread+0x2f7/0x4e0 [spl]
[822765.016749] ? wake_up_q+0x80/0x80
[822765.016835] ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.016893] kthread+0x120/0x140
[822765.016948] ? task_done+0xb0/0xb0 [spl]
[822765.017037] ? kthread_park+0x90/0x90
[822765.017096] ret_from_fork+0x35/0x40
[822765.017156] INFO: task z_wr_int:769 blocked for more than 120 seconds.
[822765.017216] Tainted: P O 5.4.114-1-pve #1
[822765.017282] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.017359] z_wr_int D 0 769 2 0x80004000
[822765.017420] Call Trace:
[822765.017476] __schedule+0x2e6/0x700
[822765.017533] schedule+0x33/0xa0
[822765.017589] schedule_preempt_disabled+0xe/0x10
[822765.017647] __mutex_lock.isra.10+0x2c9/0x4c0
[822765.017706] __mutex_lock_slowpath+0x13/0x20
[822765.017763] mutex_lock+0x2c/0x30
[822765.017869] metaslab_free_concrete+0xb3/0x240 [zfs]
[822765.017970] metaslab_free_impl+0xad/0xe0 [zfs]
[822765.018070] metaslab_free_dva+0x5e/0x80 [zfs]
[822765.018170] metaslab_free+0x196/0x1c0 [zfs]
[822765.018270] zio_free_sync+0xe4/0x100 [zfs]
[822765.018374] zio_free+0xb9/0xf0 [zfs]
[822765.018472] dsl_free+0x11/0x20 [zfs]
[822765.018572] dsl_dataset_block_kill+0x2a9/0x4f0 [zfs]
[822765.018665] dbuf_write_done+0x1ff/0x220 [zfs]
[822765.018752] arc_write_done+0x8f/0x410 [zfs]
[822765.018855] zio_done+0x43f/0x1020 [zfs]
[822765.018958] zio_execute+0x99/0xf0 [zfs]
[822765.019019] taskq_thread+0x2f7/0x4e0 [spl]
[822765.019078] ? wake_up_q+0x80/0x80
[822765.019178] ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.019240] kthread+0x120/0x140
[822765.019299] ? task_done+0xb0/0xb0 [spl]
[822765.019356] ? kthread_park+0x90/0x90
[822765.019412] ret_from_fork+0x35/0x40
[822765.019470] INFO: task z_wr_int:772 blocked for more than 120 seconds.
[822765.019530] Tainted: P O 5.4.114-1-pve #1
[822765.019589] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.019666] z_wr_int D 0 772 2 0x80004000
[822765.019725] Call Trace:
[822765.019781] __schedule+0x2e6/0x700
[822765.019838] schedule+0x33/0xa0
[822765.019893] schedule_preempt_disabled+0xe/0x10
[822765.019952] __mutex_lock.isra.10+0x2c9/0x4c0
[822765.020011] __mutex_lock_slowpath+0x13/0x20
[822765.020068] mutex_lock+0x2c/0x30
[822765.020163] metaslab_free_concrete+0xb3/0x240 [zfs]
[822765.020264] metaslab_free_impl+0xad/0xe0 [zfs]
[822765.020365] metaslab_free_dva+0x5e/0x80 [zfs]
[822765.020464] metaslab_free+0x196/0x1c0 [zfs]
[822765.020567] zio_free_sync+0xe4/0x100 [zfs]
[822765.020671] zio_free+0xb9/0xf0 [zfs]
[822765.020768] dsl_free+0x11/0x20 [zfs]
[822765.020863] dsl_dataset_block_kill+0x2a9/0x4f0 [zfs]
[822765.020956] dbuf_write_done+0x1ff/0x220 [zfs]
[822765.021076] arc_write_done+0x8f/0x410 [zfs]
[822765.021180] zio_done+0x43f/0x1020 [zfs]
[822765.021283] zio_execute+0x99/0xf0 [zfs]
[822765.021344] taskq_thread+0x2f7/0x4e0 [spl]
[822765.021403] ? wake_up_q+0x80/0x80
[822765.021504] ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.021565] kthread+0x120/0x140
[822765.021624] ? task_done+0xb0/0xb0 [spl]
[822765.021681] ? kthread_park+0x90/0x90
[822765.021738] ret_from_fork+0x35/0x40
[822765.021797] INFO: task z_wr_int:775 blocked for more than 120 seconds.
[822765.021858] Tainted: P O 5.4.114-1-pve #1
[822765.021917] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.021993] z_wr_int D 0 775 2 0x80004000
[822765.022053] Call Trace:
[822765.022108] __schedule+0x2e6/0x700
[822765.022164] schedule+0x33/0xa0
[822765.022220] schedule_preempt_disabled+0xe/0x10
[822765.022278] __mutex_lock.isra.10+0x2c9/0x4c0
[822765.022337] __mutex_lock_slowpath+0x13/0x20
[822765.022394] mutex_lock+0x2c/0x30
[822765.022492] metaslab_free_concrete+0xb3/0x240 [zfs]
[822765.022590] metaslab_free_impl+0xad/0xe0 [zfs]
[822765.022687] metaslab_free_dva+0x5e/0x80 [zfs]
[822765.022787] metaslab_free+0x196/0x1c0 [zfs]
[822765.022890] zio_free_sync+0xe4/0x100 [zfs]
[822765.022993] zio_free+0xb9/0xf0 [zfs]
[822765.023090] dsl_free+0x11/0x20 [zfs]
[822765.023182] dsl_dataset_block_kill+0x2a9/0x4f0 [zfs]
[822765.023276] dbuf_write_done+0x1ff/0x220 [zfs]
[822765.023366] arc_write_done+0x8f/0x410 [zfs]
[822765.023469] zio_done+0x43f/0x1020 [zfs]
[822765.023572] zio_execute+0x99/0xf0 [zfs]
[822765.023632] taskq_thread+0x2f7/0x4e0 [spl]
[822765.023691] ? wake_up_q+0x80/0x80
[822765.023791] ? zio_taskq_member.isra.14.constprop.20+0x70/0x70 [zfs]
[822765.023853] kthread+0x120/0x140
[822765.023913] ? task_done+0xb0/0xb0 [spl]
[822765.023969] ? kthread_park+0x90/0x90
[822765.024026] ret_from_fork+0x35/0x40
[822765.024089] INFO: task txg_sync:1350 blocked for more than 120 seconds.
[822765.024150] Tainted: P O 5.4.114-1-pve #1
[822765.024209] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[822765.024286] txg_sync D 0 1350 2 0x80004000
[822765.024346] Call Trace:
[822765.024401] __schedule+0x2e6/0x700
[822765.024458] schedule+0x33/0xa0
[822765.024514] schedule_timeout+0x152/0x330
[822765.024573] ? __next_timer_interrupt+0xd0/0xd0
[822765.024631] io_schedule_timeout+0x1e/0x50
[822765.024690] __cv_timedwait_common+0x138/0x170 [spl]
[822765.024750] ? wait_woken+0x80/0x80
[822765.024808] __cv_timedwait_io+0x19/0x20 [spl]
[822765.024912] zio_wait+0x139/0x280 [zfs]
[822765.025009] ? _cond_resched+0x19/0x30
[822765.025109] dsl_pool_sync+0xdc/0x510 [zfs]
[822765.025211] spa_sync+0x5a4/0xfe0 [zfs]
[822765.025268] ? mutex_lock+0x12/0x30
[822765.025368] ? spa_txg_history_init_io+0x104/0x110 [zfs]
[822765.025472] txg_sync_thread+0x2e1/0x4a0 [zfs]
[822765.030783] ? txg_thread_exit.isra.13+0x60/0x60 [zfs]
[822765.030846] thread_generic_wrapper+0x74/0x90 [spl]
[822765.030905] kthread+0x120/0x140
[822765.030960] ? __thread_exit+0x20/0x20 [spl]
[822765.031014] ? kthread_park+0x90/0x90
[822765.031068] ret_from_fork+0x35/0x40
In general: no, you're not pushing it too much. I was passing around a 2 TB snapshot stream while I was trying to reproduce this with a bunch of gzip-9 sends in a loop.
It would be useful if you could share the lines immediately preceding what you pasted, as they should include the specific panic message (which I'm guessing is different, since the stack is).
I have been periodically spending time considering what I'd like to know for this and how to get it; I know the former but not the latter. (What I'd like to know, at the moment, is where the NULLs that are being dereferenced in the initial panics are coming from; short of somewhat invasive patching, I don't see a good way to answer that question.)
Of course, this isn't helped by my not being at all familiar with these codepaths. Someone more familiar might be able to pull out an answer much quicker.
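Perhaps the least invasive option would be a tracing probe that records the kernel stack whenever abd_verify is handed a NULL abd. A sketch only, assuming bpftrace is available and kprobes resolve the zfs module symbols on the affected kernel:
# Aggregate kernel stacks that reach abd_verify() with a NULL first argument
bpftrace -e 'kprobe:abd_verify /arg0 == 0/ { @null_abd_stacks[kstack] = count(); }'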
Here's another question - did you turn on encryption while your ZFS version was 0.8.X? I just noticed both your sending and receiving pools are on the feature flags from that release, and am wondering if the reason I can't reproduce it is that my pool only had encryption turned on recently...
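If it's not obvious anymore, zpool history should still show when the encrypted datasets were created, for example:
# Look for dataset creation commands with encryption properties in the pool history
zpool history -il rpool | grep -Ei 'create|encryption|keyformat'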
(One more experiment you could play with, if you wish, would be setting zfs_abd_scatter_enabled to 0 - it may help or hurt performance to do so, but it's just a thought; it's possible it won't help at all.)
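Something like this, assuming the parameter is writable at runtime on your build; otherwise it can be set via modprobe options:
# Disable scatter ABDs for newly allocated buffers (runtime change)
echo 0 > /sys/module/zfs/parameters/zfs_abd_scatter_enabled
# Or make it persistent across reboots
echo 'options zfs zfs_abd_scatter_enabled=0' >> /etc/modprobe.d/zfs.conf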