task txg_sync blocked for more than 120 seconds.
System information
Type | Version/Name |
---|---|
Distribution Name | Proxmox |
Distribution Version | 6.0.1-2 |
Linux Kernel | 5.0.18-1-pve |
Architecture | 64-bit Intel |
ZFS Version | 0.8.1-pve1 |
SPL Version | 0.8.1-pve1 |
The pool is 40TB raw (8 x 5TB) in RAIDZ2. De-duplication is off and compression is set to lz4. The drives are new and verified working with badblocks (4 passes) and SMART tests. The issue has been reproduced with native encryption both on and off on the same hardware. The drives are in 2 x external Thunderbolt enclosures, which would be a red flag had they not run for over a year on another system with different (same model) drives without fault.
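For reference, the relevant pool and dataset settings can be confirmed with the standard tools; a minimal sketch (the pool name "tank" is a placeholder for the actual pool):
# pool layout, health and capacity
zpool status -v tank
zpool list tank
# compression / encryption / dedup as set on the datasets
zfs get -r -t filesystem compression,encryption,dedup tank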
Describe the problem you're observing
The IO hangs during long rsync jobs over the network (e.g. 1TB+), or during shorter local jobs from one pool to another (1GB+).
This causes rsync to hang, and any other program that uses the pool also goes unresponsive if started (as would be expected if IO is hung). E.g. the console is responsive over SSH, but ls on the root of the pool hangs.
When this happened on ZFS 0.7.x it caused a kernel panic and crashed the computer; since 0.8.1 it only hangs pool-IO-related tasks, and the server otherwise remains responsive.
This seems related to #2934, #7946, #4361, #2611 and #4011.
The system only has 8GB RAM, which may or may not be an issue, but I wouldn't expect a kernel panic regardless. Other users on the related issues saw the same error and either increased their RAM afterwards or already had more RAM to begin with.
CPU usage is generally high (>80%) and write speeds are low throughout the transfer when the pool is encrypted, but this is expected as I believe the missing kernel symbols patch is not in this release (scheduled for 0.8.2?).
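Since RAM may be a factor, a hedged sketch of what can be captured around the time of the hang to rule memory pressure in or out (paths are the default ZFS-on-Linux locations):
# overall memory and ARC state
free -m
awk '$1 == "size" || $1 == "c" || $1 == "c_max"' /proc/spl/kstat/zfs/arcstats
cat /sys/module/zfs/parameters/zfs_arc_max
# optionally cap the ARC to leave headroom on an 8GB box, e.g. 2 GiB (value in bytes, runtime only)
echo 2147483648 > /sys/module/zfs/parameters/zfs_arc_max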
Describe how to reproduce the problem
Any long rsync job seems to hang it after a few hours; a fast, large local rsync job (1GB+, pool to pool) hangs it almost immediately.
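For clarity, a rough sketch of the kind of jobs that trigger it (hosts and paths are placeholders, not the actual ones used):
# network case: hangs after a few hours
rsync -aHAX --progress user@source:/data/ /tank/backup/
# local pool-to-pool case: hangs almost immediately
rsync -aHAX --progress /tank/dataset/ /tank2/dataset/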
Include any warning/errors/backtraces from the system logs
Initially:
Aug 6 03:44:37 local kernel: [114789.544998] INFO: task txg_sync:8515 blocked for more than 120 seconds.
Aug 6 03:44:37 local kernel: [114789.545003] Tainted: P O 5.0.18-1-pve #1
Aug 6 03:44:37 local kernel: [114789.545004] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 6 03:44:37 local kernel: [114789.545006] txg_sync D 0 8515 2 0x80000000
Aug 6 03:44:37 local kernel: [114789.545007] Call Trace:
Aug 6 03:44:37 local kernel: [114789.545013] __schedule+0x2d4/0x870
Aug 6 03:44:37 local kernel: [114789.545061] ? vdev_disk_io_start+0x981/0xa30 [zfs]
Aug 6 03:44:37 local kernel: [114789.545063] schedule+0x2c/0x70
Aug 6 03:44:37 local kernel: [114789.545064] schedule_timeout+0x157/0x360
Aug 6 03:44:37 local kernel: [114789.545067] ? __next_timer_interrupt+0xd0/0xd0
Aug 6 03:44:37 local kernel: [114789.545068] io_schedule_timeout+0x1e/0x50
Aug 6 03:44:37 local kernel: [114789.545073] __cv_timedwait_common+0x12f/0x170 [spl]
Aug 6 03:44:37 local kernel: [114789.545075] ? wait_woken+0x80/0x80
Aug 6 03:44:37 local kernel: [114789.545078] __cv_timedwait_io+0x19/0x20 [spl]
Aug 6 03:44:37 local kernel: [114789.545109] zio_wait+0x13a/0x280 [zfs]
Aug 6 03:44:37 local kernel: [114789.545111] ? _cond_resched+0x19/0x30
Aug 6 03:44:37 local kernel: [114789.545142] vdev_config_sync+0xe8/0x240 [zfs]
Aug 6 03:44:37 local kernel: [114789.545170] spa_sync+0xbef/0xfc0 [zfs]
Aug 6 03:44:37 local kernel: [114789.545200] txg_sync_thread+0x2d9/0x4c0 [zfs]
Aug 6 03:44:37 local kernel: [114789.545258] ? txg_thread_exit.isra.11+0x60/0x60 [zfs]
Aug 6 03:44:37 local kernel: [114789.545262] thread_generic_wrapper+0x74/0x90 [spl]
Aug 6 03:44:37 local kernel: [114789.545264] kthread+0x120/0x140
Aug 6 03:44:37 local kernel: [114789.545267] ? __thread_exit+0x20/0x20 [spl]
Aug 6 03:44:37 local kernel: [114789.545269] ? __kthread_parkme+0x70/0x70
Aug 6 03:44:37 local kernel: [114789.545270] ret_from_fork+0x35/0x40
Aug 6 03:44:37 local kernel: [114789.545272] INFO: task rsync:4565 blocked for more than 120 seconds.
Aug 6 03:44:37 local kernel: [114789.545274] Tainted: P O 5.0.18-1-pve #1
Aug 6 03:44:37 local kernel: [114789.545275] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 6 03:44:37 local kernel: [114789.545276] rsync D 0 4565 9789 0x00000000
Aug 6 03:44:37 local kernel: [114789.545277] Call Trace:
Aug 6 03:44:37 local kernel: [114789.545279] __schedule+0x2d4/0x870
Aug 6 03:44:37 local kernel: [114789.545281] schedule+0x2c/0x70
Aug 6 03:44:37 local kernel: [114789.545281] schedule_timeout+0x157/0x360
Aug 6 03:44:37 local kernel: [114789.545283] ? __next_timer_interrupt+0xd0/0xd0
Aug 6 03:44:37 local kernel: [114789.545285] io_schedule_timeout+0x1e/0x50
Aug 6 03:44:37 local kernel: [114789.545287] __cv_timedwait_common+0x12f/0x170 [spl]
Aug 6 03:44:37 local kernel: [114789.545289] ? wait_woken+0x80/0x80
Aug 6 03:44:37 local kernel: [114789.545291] __cv_timedwait_io+0x19/0x20 [spl]
Aug 6 03:44:37 local kernel: [114789.545321] zio_wait+0x13a/0x280 [zfs]
Aug 6 03:44:37 local kernel: [114789.545338] dbuf_read+0x4e1/0xb60 [zfs]
Aug 6 03:44:37 local kernel: [114789.545357] dmu_buf_hold+0x5d/0x90 [zfs]
Aug 6 03:44:37 local kernel: [114789.545387] zap_lockdir+0x4e/0xb0 [zfs]
Aug 6 03:44:37 local kernel: [114789.545417] zap_lookup_norm+0x5d/0xc0 [zfs]
Aug 6 03:44:37 local kernel: [114789.545447] zap_lookup+0x16/0x20 [zfs]
Aug 6 03:44:37 local kernel: [114789.545477] zfs_dirent_lock+0x524/0x670 [zfs]
Aug 6 03:44:37 local kernel: [114789.545506] zfs_dirlook+0x92/0x290 [zfs]
Aug 6 03:44:37 local kernel: [114789.545536] zfs_lookup+0x1ea/0x3a0 [zfs]
Aug 6 03:44:37 local kernel: [114789.545564] zpl_lookup+0xc4/0x1d0 [zfs]
Aug 6 03:44:37 local kernel: [114789.545567] ? __d_lookup+0x5a/0x140
Aug 6 03:44:37 local kernel: [114789.545568] __lookup_slow+0x9b/0x150
Aug 6 03:44:37 local kernel: [114789.545570] lookup_slow+0x3a/0x60
Aug 6 03:44:37 local kernel: [114789.545571] walk_component+0x1bf/0x330
Aug 6 03:44:37 local kernel: [114789.545573] ? link_path_walk.part.42+0x29e/0x540
Aug 6 03:44:37 local kernel: [114789.545574] path_lookupat.isra.46+0x6d/0x220
Aug 6 03:44:37 local kernel: [114789.545576] ? _cond_resched+0x19/0x30
Aug 6 03:44:37 local kernel: [114789.545576] filename_lookup.part.60+0xa0/0x170
Aug 6 03:44:37 local kernel: [114789.545579] ? __check_object_size+0x166/0x192
Aug 6 03:44:37 local kernel: [114789.545581] ? strncpy_from_user+0x56/0x1b0
Aug 6 03:44:37 local kernel: [114789.545582] user_path_at_empty+0x3e/0x50
Aug 6 03:44:37 local kernel: [114789.545583] vfs_statx+0x76/0xe0
Aug 6 03:44:37 local kernel: [114789.545584] ? chmod_common+0x12f/0x180
Aug 6 03:44:37 local kernel: [114789.545585] __do_sys_newlstat+0x3d/0x70
Aug 6 03:44:37 local kernel: [114789.545586] ? mntput+0x24/0x40
Aug 6 03:44:37 local kernel: [114789.545587] ? path_put+0x1e/0x30
Aug 6 03:44:37 local kernel: [114789.545588] ? do_fchmodat+0x6c/0xb0
Aug 6 03:44:37 local kernel: [114789.545589] __x64_sys_newlstat+0x16/0x20
Aug 6 03:44:37 local kernel: [114789.545591] do_syscall_64+0x5a/0x110
Aug 6 03:44:37 local kernel: [114789.545592] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Aug 6 03:44:37 local kernel: [114789.545593] RIP: 0033:0x7ffb93a7ab45
Aug 6 03:44:37 local kernel: [114789.545597] Code: Bad RIP value.
Aug 6 03:44:37 local kernel: [114789.545598] RSP: 002b:00007ffd99de4c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000006
Aug 6 03:44:37 local kernel: [114789.545599] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffb93a7ab45
Aug 6 03:44:37 local kernel: [114789.545599] RDX: 00007ffd99de4ee0 RSI: 00007ffd99de4ee0 RDI: 00007ffd99de70c0
Aug 6 03:44:37 local kernel: [114789.545600] RBP: 0000000000000000 R08: 0000564394b42750 R09: 0000000000000004
Aug 6 03:44:37 local kernel: [114789.545600] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd99de70c0
Aug 6 03:44:37 local kernel: [114789.545600] R13: 00007ffd99de4ee0 R14: 000000000015fdb0 R15: 0000000000000000
Aug 6 03:44:37 local kernel: [114789.545602] INFO: task rsync:4580 blocked for more than 120 seconds.
Aug 6 03:44:37 local kernel: [114789.545604] Tainted: P O 5.0.18-1-pve #1
Aug 6 03:44:37 local kernel: [114789.545605] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 6 03:44:37 local kernel: [114789.545607] rsync D 0 4580 4565 0x00000000
Aug 6 03:44:37 local kernel: [114789.545607] Call Trace:
Aug 6 03:44:37 local kernel: [114789.545609] __schedule+0x2d4/0x870
Aug 6 03:44:37 local kernel: [114789.545627] ? dbuf_read+0x7d4/0xb60 [zfs]
Aug 6 03:44:37 local kernel: [114789.545629] schedule+0x2c/0x70
Aug 6 03:44:37 local kernel: [114789.545631] cv_wait_common+0x104/0x130 [spl]
Aug 6 03:44:37 local kernel: [114789.545633] ? wait_woken+0x80/0x80
Aug 6 03:44:37 local kernel: [114789.545635] __cv_wait+0x15/0x20 [spl]
Aug 6 03:44:37 local kernel: [114789.545656] dmu_tx_wait+0xbf/0x3b0 [zfs]
Aug 6 03:44:37 local kernel: [114789.545676] dmu_tx_assign+0x161/0x480 [zfs]
Aug 6 03:44:37 local kernel: [114789.545706] zfs_write+0x420/0xe80 [zfs]
Aug 6 03:44:37 local kernel: [114789.545708] ? __skb_datagram_iter+0x1a9/0x2c0
Aug 6 03:44:37 local kernel: [114789.545711] ? unix_stream_read_generic+0x703/0x900
Aug 6 03:44:37 local kernel: [114789.545740] zpl_write_common_iovec+0xa3/0xf0 [zfs]
Aug 6 03:44:37 local kernel: [114789.545769] zpl_iter_write+0xed/0x120 [zfs]
Aug 6 03:44:37 local kernel: [114789.545771] new_sync_write+0x10c/0x170
Aug 6 03:44:37 local kernel: [114789.545773] __vfs_write+0x29/0x40
Aug 6 03:44:37 local kernel: [114789.545774] vfs_write+0xab/0x1b0
Aug 6 03:44:37 local kernel: [114789.545775] ksys_write+0x5c/0xd0
Aug 6 03:44:37 local kernel: [114789.545776] __x64_sys_write+0x1a/0x20
Aug 6 03:44:37 local kernel: [114789.545778] do_syscall_64+0x5a/0x110
Aug 6 03:44:37 local kernel: [114789.545779] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Aug 6 03:44:37 local kernel: [114789.545779] RIP: 0033:0x7ffb93a7b504
Aug 6 03:44:37 local kernel: [114789.545781] Code: Bad RIP value.
Aug 6 03:44:37 local kernel: [114789.545781] RSP: 002b:00007ffd99de3e68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Aug 6 03:44:37 local kernel: [114789.545782] RAX: ffffffffffffffda RBX: 00007ffb8efd5010 RCX: 00007ffb93a7b504
Aug 6 03:44:37 local kernel: [114789.545783] RDX: 0000000000040000 RSI: 00007ffb8efd5010 RDI: 0000000000000003
Aug 6 03:44:37 local kernel: [114789.545783] RBP: 0000000000000000 R08: 0000000000040000 R09: 00007ffb8f014ae0
Aug 6 03:44:37 local kernel: [114789.545784] R10: 000000006b388e1f R11: 0000000000000246 R12: 0000000000000003
Aug 6 03:44:37 local kernel: [114789.545784] R13: 000056438ae7edc0 R14: 0000000000003ffa R15: 0000000000003ffa
Then periodically:
Aug 6 03:50:40 local kernel: [115152.034700] INFO: task rsync:4580 blocked for more than 120 seconds.
Aug 6 03:50:40 local kernel: [115152.034705] Tainted: P O 5.0.18-1-pve #1
Aug 6 03:50:40 local kernel: [115152.034706] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 6 03:50:40 local kernel: [115152.034708] rsync D 0 4580 4565 0x00000000
Aug 6 03:50:40 local kernel: [115152.034709] Call Trace:
Aug 6 03:50:40 local kernel: [115152.034715] __schedule+0x2d4/0x870
Aug 6 03:50:40 local kernel: [115152.034754] ? dbuf_read+0x7d4/0xb60 [zfs]
Aug 6 03:50:40 local kernel: [115152.034756] schedule+0x2c/0x70
Aug 6 03:50:40 local kernel: [115152.034761] cv_wait_common+0x104/0x130 [spl]
Aug 6 03:50:40 local kernel: [115152.034764] ? wait_woken+0x80/0x80
Aug 6 03:50:40 local kernel: [115152.034766] __cv_wait+0x15/0x20 [spl]
Aug 6 03:50:40 local kernel: [115152.034787] dmu_tx_wait+0xbf/0x3b0 [zfs]
Aug 6 03:50:40 local kernel: [115152.034808] dmu_tx_assign+0x161/0x480 [zfs]
Aug 6 03:50:40 local kernel: [115152.034866] zfs_write+0x420/0xe80 [zfs]
Aug 6 03:50:40 local kernel: [115152.034868] ? __skb_datagram_iter+0x1a9/0x2c0
Aug 6 03:50:40 local kernel: [115152.034870] ? unix_stream_read_generic+0x703/0x900
Aug 6 03:50:40 local kernel: [115152.034900] zpl_write_common_iovec+0xa3/0xf0 [zfs]
Aug 6 03:50:40 local kernel: [115152.034928] zpl_iter_write+0xed/0x120 [zfs]
Aug 6 03:50:40 local kernel: [115152.034930] new_sync_write+0x10c/0x170
Aug 6 03:50:40 local kernel: [115152.034932] __vfs_write+0x29/0x40
Aug 6 03:50:40 local kernel: [115152.034933] vfs_write+0xab/0x1b0
Aug 6 03:50:40 local kernel: [115152.034934] ksys_write+0x5c/0xd0
Aug 6 03:50:40 local kernel: [115152.034936] __x64_sys_write+0x1a/0x20
Aug 6 03:50:40 local kernel: [115152.034937] do_syscall_64+0x5a/0x110
Aug 6 03:50:40 local kernel: [115152.034939] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Aug 6 03:50:40 local kernel: [115152.034940] RIP: 0033:0x7ffb93a7b504
Aug 6 03:50:40 local kernel: [115152.034943] Code: Bad RIP value.
Aug 6 03:50:40 local kernel: [115152.034944] RSP: 002b:00007ffd99de3e68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Aug 6 03:50:40 local kernel: [115152.034945] RAX: ffffffffffffffda RBX: 00007ffb8efd5010 RCX: 00007ffb93a7b504
Aug 6 03:50:40 local kernel: [115152.034946] RDX: 0000000000040000 RSI: 00007ffb8efd5010 RDI: 0000000000000003
Aug 6 03:50:40 local kernel: [115152.034946] RBP: 0000000000000000 R08: 0000000000040000 R09: 00007ffb8f014ae0
Aug 6 03:50:40 local kernel: [115152.034947] R10: 000000006b388e1f R11: 0000000000000246 R12: 0000000000000003
Aug 6 03:50:40 local kernel: [115152.034947] R13: 000056438ae7edc0 R14: 0000000000003ffa R15: 0000000000003ffa
We had these issues with Ubuntu 18.04. Have you tried zfs_prefetch_disable=1 and maybe zfs_nocacheflush=1?
I am not sure but running the rsync with "nice -n 19" may have helped.
We have not seen many issues with latest patches for Ubuntu Eoan.
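For anyone who wants to try the tunables mentioned above, a minimal sketch, assuming the usual ZFS-on-Linux module parameter paths (runtime changes do not survive a reboot; source and destination paths are placeholders):
# runtime
echo 1 > /sys/module/zfs/parameters/zfs_prefetch_disable
echo 1 > /sys/module/zfs/parameters/zfs_nocacheflush
# persistent across reboots
echo "options zfs zfs_prefetch_disable=1 zfs_nocacheflush=1" >> /etc/modprobe.d/zfs.conf
# and the lower-priority rsync suggestion
nice -n 19 rsync -a /source/ /destination/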
@BrainSlayer isn't that the same problem we had with #9034? o.O
Looks identical, yes, but we triggered the memory allocation problems by wrong flags. Here it might be a real out-of-memory problem, which again triggers the endless loop in the memory allocation procedure I love so much.
which again triggers the endless loop in the memory allocation procedure I love so much
which should be fixed by our patch; so maybe we even fix a very common, recurring problem - great!
I may have the same problem, although my hang appears when I delete many snapshots (>10000). Is this the same problem? RAIDZ1 on Ubuntu 18.04 LTS with HWE kernel (5.0.0-31-generic). If this is indeed the same problem, I can send you more information, because I still have 30000 snapshots I need to delete, and every time I try deleting large amounts, ZFS hangs. dmesg_zfs_hung_tasks_after_deleting_many_snapshots.log
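In case it helps as a workaround while this is open, a hedged sketch that destroys snapshots in small batches rather than all at once, so txg_sync gets a chance to catch up between batches (the dataset name is a placeholder; review the list before destroying anything):
# oldest 50 snapshots of one dataset, destroyed one by one; re-run (with a pause) until done
zfs list -H -t snapshot -o name -s creation -r tank/data | head -n 50 | xargs -n 1 -r zfs destroy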
Having the same problem while the pools are under heavy IO.
System information
Type | Version/Name |
---|---|
Distribution Name | Proxmox |
Distribution Version | 6.0-11 |
Linux Kernel | 5.0.21-4-pve |
Architecture | amd64 |
ZFS Version | 0.8.2-pve2 |
SPL Version | 0.8.2-pve2 |
CPU: Xeon E3-1285L V3
RAM: 32GB DDR3-1600 ECC
HBA: LSI9305-24i
root@pve:~# zpool status
pool: NAS
state: ONLINE
scan: scrub repaired 0B in 0 days 07:58:47 with 0 errors on Sun Nov 10 08:22:49 2019
config:
  NAME                                   STATE     READ WRITE CKSUM
  NAS                                    ONLINE       0     0     0
    raidz1-0                             ONLINE       0     0     0
      wwn-0x5000cca26bd9cc69             ONLINE       0     0     0
      ata-HGST_HUH721010ALE600_7JJ3GG4C  ONLINE       0     0     0
      ata-HGST_HUH721010ALE600_7JJ535MC  ONLINE       0     0     0
errors: No known data errors
pool: S860
state: ONLINE
scan: scrub repaired 0B in 0 days 00:06:37 with 0 errors on Sun Nov 10 00:30:44 2019
config:
  NAME                                           STATE     READ WRITE CKSUM
  S860                                           ONLINE       0     0     0
    ata-Samsung_SSD_860_EVO_4TB_S3YPNB0KA00732Y  ONLINE       0     0     0
    ata-Samsung_SSD_860_EVO_4TB_S3YPNB0KA00919A  ONLINE       0     0     0
errors: No known data errors
Describe the problem you're observing
txg_sync hangs during heavy IO, in my case during an fio benchmark of the NAS and S860 pools at the same time.
Describe how to reproduce the problem
Run fio benchmark of both pools at the same time:
[global]
name=fio-seq-write
filename=fio-seq-write
rw=write
bs=256K
direct=0
numjobs=1
time_based=1
runtime=300
[file1]
size=50G
ioengine=libaio
iodepth=16
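Presumably that means starting the job file once per pool concurrently, something along these lines (mountpoints and job file name are assumptions):
( cd /NAS  && fio /root/fio-seq-write.fio ) &
( cd /S860 && fio /root/fio-seq-write.fio ) &
wait
Because filename=fio-seq-write is relative, each instance writes into the pool it is started from.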
Include any warning/errors/backtraces from the system logs
[ 968.292727] INFO: task txg_sync:2130 blocked for more than 120 seconds.
[ 968.292762] Tainted: P O 5.0.21-4-pve #1
[ 968.292784] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 968.292815] txg_sync D 0 2130 2 0x80000000
[ 968.292837] Call Trace:
[ 968.292855] __schedule+0x2d4/0x870
[ 968.292872] schedule+0x2c/0x70
[ 968.292887] schedule_timeout+0x157/0x360
[ 968.292906] ? __next_timer_interrupt+0xd0/0xd0
[ 968.292925] io_schedule_timeout+0x1e/0x50
[ 968.292946] __cv_timedwait_common+0x12f/0x170 [spl]
[ 968.292968] ? wait_woken+0x80/0x80
[ 968.292985] __cv_timedwait_io+0x19/0x20 [spl]
[ 968.293049] zio_wait+0x13a/0x280 [zfs]
[ 968.293067] ? _cond_resched+0x19/0x30
[ 968.293115] dsl_pool_sync+0xdc/0x4f0 [zfs]
[ 968.293168] spa_sync+0x5b2/0xfc0 [zfs]
[ 968.293221] ? spa_txg_history_init_io+0x106/0x110 [zfs]
[ 968.293278] txg_sync_thread+0x2d9/0x4c0 [zfs]
[ 968.293331] ? txg_thread_exit.isra.11+0x60/0x60 [zfs]
[ 968.293355] thread_generic_wrapper+0x74/0x90 [spl]
[ 968.293376] kthread+0x120/0x140
[ 968.293393] ? __thread_exit+0x20/0x20 [spl]
[ 968.293410] __kthread_parkme+0x70/0x70
[ 968.293428] ret_from_fork+0x35/0x40
I've seen the same issue, and just like @BrainSlayer and @c0d3z3r0 it was due to low memory. In my case it was Proxmox VMs with not enough memory (for ZFS's taste).
I have the same issue. A bandwidth-limited rsync job at 12MB/s, or a VM starting in VirtualBox, triggers this and the entire file system freezes. It doesn't seem to be memory or CPU related, as both stay low during transfers and file access. Disk IO also stays low.
OS - Debian Stretch
Kernel - 4.19.0-0.bpo.6-amd64
ZFS Version - 0.8.2-1
Pool - 3-drive RAIDZ1
L2ARC - 80GB SSD drive
CPU - AMD Athlon 240G
System Memory - 32GB RAM
Nov 13 11:23:32 nas kernel: [59329.141166] INFO: task txg_sync:1228 blocked for more than 120 seconds.
Nov 13 11:23:32 nas kernel: [59329.141368] Tainted: P OE 4.19.0-0.bpo.6-amd64 #1 Debian 4.19.67-2+deb10u1~bpo9+1
Nov 13 11:23:32 nas kernel: [59329.141598] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 13 11:23:32 nas kernel: [59329.141827] txg_sync D 0 1228 2 0x80000000
Nov 13 11:23:32 nas kernel: [59329.141833] Call Trace:
Nov 13 11:23:32 nas kernel: [59329.141846] ? __schedule+0x3f5/0x880
Nov 13 11:23:32 nas kernel: [59329.141920] ? spa_taskq_dispatch_ent+0x85/0xc0 [zfs]
Nov 13 11:23:32 nas kernel: [59329.141924] schedule+0x32/0x80
Nov 13 11:23:32 nas kernel: [59329.141927] schedule_timeout+0x16f/0x350
Nov 13 11:23:32 nas kernel: [59329.141931] ? __next_timer_interrupt+0xc0/0xc0
Nov 13 11:23:32 nas kernel: [59329.141934] io_schedule_timeout+0x19/0x40
Nov 13 11:23:32 nas kernel: [59329.141942] __cv_timedwait_common+0x122/0x160 [spl]
Nov 13 11:23:32 nas kernel: [59329.141947] ? remove_wait_queue+0x60/0x60
Nov 13 11:23:32 nas kernel: [59329.142006] zio_wait+0x124/0x260 [zfs]
Nov 13 11:23:32 nas kernel: [59329.142060] dsl_pool_sync+0x106/0x520 [zfs]
Nov 13 11:23:32 nas kernel: [59329.142118] spa_sync+0x550/0xf90 [zfs]
Nov 13 11:23:32 nas kernel: [59329.142177] ? spa_txg_history_init_io+0xfe/0x110 [zfs]
Nov 13 11:23:32 nas kernel: [59329.142235] txg_sync_thread+0x291/0x450 [zfs]
Nov 13 11:23:32 nas kernel: [59329.142331] ? txg_wait_open+0xd0/0xd0 [zfs]
Nov 13 11:23:32 nas kernel: [59329.142350] thread_generic_wrapper+0x6f/0x80 [spl]
Nov 13 11:23:32 nas kernel: [59329.142353] kthread+0xf8/0x130
Nov 13 11:23:32 nas kernel: [59329.142360] ? __thread_exit+0x20/0x20 [spl]
Nov 13 11:23:32 nas kernel: [59329.142361] ? kthread_create_worker_on_cpu+0x70/0x70
Nov 13 11:23:32 nas kernel: [59329.142364] ret_from_fork+0x22/0x40
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           9.39    0.00   22.83    9.81    0.00   57.97
Device:   rrqm/s   wrqm/s     r/s     w/s     rkB/s     wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdc         0.08     0.02  223.76   42.34  45460.22    650.13   346.57     0.68    2.71    3.15    0.42   1.47  39.24
sdb         0.05     0.02  146.82   42.74  45513.16    650.17   487.06     0.85    4.69    5.93    0.43   2.47  46.81
sda         0.00     0.45    0.41  488.19      0.85  46865.59   191.84     0.96    2.10  147.95    1.98   1.06  51.77
sdd         0.00     0.00    0.03    0.00      0.95      0.00    61.24     0.00    2.57    2.57    0.00   1.74   0.01
sde         0.09     0.02  230.35   42.18  45461.70    650.16   338.40     0.67    2.63    3.03    0.44   1.45  39.47
sdf         0.01     1.15    3.02    3.70     30.37    100.15    38.86     0.00    0.60    0.70    0.52   0.11   0.07
  NAME                                 STATE     READ WRITE CKSUM
  zfspool                              ONLINE       0     0     0
    raidz1-0                           ONLINE       0     0     0
      sde                              ONLINE       0     0     0
      ata-ST4000VN000-1H4168_Z301425Y  ONLINE       0     0     0
      ata-ST4000DM005-2DP166_ZDH164PY  ONLINE       0     0     0
  logs
    sdf3                               ONLINE       0     0     0
cat /proc/spl/kstat/zfs/dmu_tx
11 1 0x01 12 3264 36614795947 60113570602791
name                            type data
dmu_tx_assigned                 4    4137795
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    0
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    132825
dmu_tx_dirty_over_max           4    28
dmu_tx_dirty_frees_delay        4    1
dmu_tx_quota                    4    0
ZFS Subsystem Report                            Wed Nov 13 11:30:46 2019
ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0
ARC Misc:
        Deleted:                                649.80k
        Mutex Misses:                           21
        Evict Skips:                            12
ARC Size:                               98.85%  15.43 GiB
        Target Size: (Adaptive)         100.00% 15.61 GiB
        Min Size (Hard Limit):          6.25%   998.92 MiB
        Max Size (High Water):          16:1    15.61 GiB
ARC Size Breakdown:
        Recently Used Cache Size:       43.61%  4.38 GiB
        Frequently Used Cache Size:     56.39%  5.66 GiB
ARC Hash Breakdown:
        Elements Max:                           1.20M
        Elements Current:               89.82%  1.08M
        Collisions:                             1.07M
        Chain Max:                              5
        Chains:                                 117.18k
ARC Total accesses:                             497.41M
        Cache Hit Ratio:                99.73%  496.07M
        Cache Miss Ratio:               0.27%   1.34M
        Actual Hit Ratio:               99.73%  496.06M
        Data Demand Efficiency:         99.17%  433.01k
        Data Prefetch Efficiency:       8.47%   2.87k
        CACHE HITS BY CACHE LIST:
          Most Recently Used:           2.84%   14.08M
          Most Frequently Used:         97.16%  481.98M
          Most Recently Used Ghost:     0.05%   262.03k
          Most Frequently Used Ghost:   0.00%   11.54k
        CACHE HITS BY DATA TYPE:
          Demand Data:                  0.09%   429.42k
          Prefetch Data:                0.00%   243
          Demand Metadata:              99.91%  495.63M
          Prefetch Metadata:            0.00%   15.26k
        CACHE MISSES BY DATA TYPE:
          Demand Data:                  0.27%   3.59k
          Prefetch Data:                0.20%   2.63k
          Demand Metadata:              82.28%  1.10M
          Prefetch Metadata:            17.26%  231.71k
This might all be related to: #9583
I'm experiencing a similar issue, seen with ZFS packages from Ubuntu Eoan and packages built from ba434b18ec6bcedb9d9eb39fa234fbba55032381, containing the fix from #9583. The issue is seen on a mirrored pool with encrypted datasets, running on a Raspberry Pi 4B, 4 GB RAM, zfs_arc_max=536870912. The Pi is running various Docker workloads backed by the dockerd ZFS storage driver. The average free memory is around 1.5 GB.
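For context, the zfs_arc_max=536870912 mentioned above would normally be applied through the standard module option, roughly like this (the modprobe.d file name is an assumption):
# /etc/modprobe.d/zfs.conf -- 512 MiB ARC cap, value in bytes
options zfs zfs_arc_max=536870912
# or at runtime, until the next reboot
echo 536870912 > /sys/module/zfs/parameters/zfs_arc_max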
[238407.538074] INFO: task z_wr_iss:1683 blocked for more than 120 seconds.
[238407.545147] Tainted: P C OE 5.3.0-1014-raspi2 #16-Ubuntu
[238407.552411] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[238407.560642] z_wr_iss D 0 1683 2 0x00000028
[238407.560651] Call trace:
[238407.560665] __switch_to+0xf0/0x150
[238407.560671] __schedule+0x2cc/0x718
[238407.560675] schedule+0x3c/0xb8
[238407.560679] io_schedule+0x20/0x40
[238407.560685] rq_qos_wait+0x100/0x178
[238407.560690] wbt_wait+0xb4/0xf0
[238407.560694] __rq_qos_throttle+0x38/0x50
[238407.560699] blk_mq_make_request+0x120/0x5f8
[238407.560704] generic_make_request+0xb4/0x2d8
[238407.560708] submit_bio+0x44/0x1b0
[238407.560885] vdev_disk_io_start+0x668/0x9e8 [zfs]
[238407.560992] zio_vdev_io_start+0xdc/0x2b8 [zfs]
[238407.561093] zio_nowait+0xd4/0x170 [zfs]
[238407.561193] vdev_queue_io+0x24c/0x268 [zfs]
[238407.561297] zio_vdev_io_start+0x94/0x2b8 [zfs]
[238407.561401] zio_nowait+0xd4/0x170 [zfs]
[238407.561502] vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[238407.561605] zio_vdev_io_start+0xdc/0x2b8 [zfs]
[238407.561707] zio_nowait+0xd4/0x170 [zfs]
[238407.561808] vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[238407.561907] zio_vdev_io_start+0x248/0x2b8 [zfs]
[238407.562012] zio_execute+0xac/0x110 [zfs]
[238407.562084] taskq_thread+0x304/0x580 [spl]
[238407.562097] kthread+0x104/0x130
[238407.562104] ret_from_fork+0x10/0x1c
[238407.562117] INFO: task z_wr_iss_h:1685 blocked for more than 120 seconds.
[238407.569281] Tainted: P C OE 5.3.0-1014-raspi2 #16-Ubuntu
[238407.576523] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[238407.584745] z_wr_iss_h D 0 1685 2 0x00000028
[238407.584755] Call trace:
[238407.584768] __switch_to+0xf0/0x150
[238407.584775] __schedule+0x2cc/0x718
[238407.584779] schedule+0x3c/0xb8
[238407.584783] io_schedule+0x20/0x40
[238407.584789] rq_qos_wait+0x100/0x178
[238407.584793] wbt_wait+0xb4/0xf0
[238407.584797] __rq_qos_throttle+0x38/0x50
[238407.584803] blk_mq_make_request+0x120/0x5f8
[238407.584807] generic_make_request+0xb4/0x2d8
[238407.584812] submit_bio+0x44/0x1b0
[238407.584970] vdev_disk_io_start+0x668/0x9e8 [zfs]
[238407.585065] zio_vdev_io_start+0xdc/0x2b8 [zfs]
[238407.585160] zio_nowait+0xd4/0x170 [zfs]
[238407.585254] vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[238407.585348] zio_vdev_io_start+0xdc/0x2b8 [zfs]
[238407.585442] zio_nowait+0xd4/0x170 [zfs]
[238407.585536] vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[238407.585630] zio_vdev_io_start+0x248/0x2b8 [zfs]
[238407.585724] zio_execute+0xac/0x110 [zfs]
[238407.585742] taskq_thread+0x304/0x580 [spl]
[238407.585748] kthread+0x104/0x130
[238407.585753] ret_from_fork+0x10/0x1c
[238407.585770] INFO: task txg_sync:1802 blocked for more than 120 seconds.
[238407.592634] Tainted: P C OE 5.3.0-1014-raspi2 #16-Ubuntu
[238407.599741] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[238407.607819] txg_sync D 0 1802 2 0x00000028
[238407.607826] Call trace:
[238407.607835] __switch_to+0xf0/0x150
[238407.607843] __schedule+0x2cc/0x718
[238407.607847] schedule+0x3c/0xb8
[238407.607852] schedule_timeout+0x9c/0x190
[238407.607856] io_schedule_timeout+0x28/0x48
[238407.607875] __cv_timedwait_common+0x1a8/0x1f8 [spl]
[238407.607889] __cv_timedwait_io+0x3c/0x50 [spl]
[238407.608041] zio_wait+0x130/0x2a0 [zfs]
[238407.608140] dsl_pool_sync+0x3fc/0x498 [zfs]
[238407.608244] spa_sync+0x530/0xe98 [zfs]
[238407.608341] txg_sync_thread+0x2d0/0x488 [zfs]
[238407.608357] thread_generic_wrapper+0x74/0xa0 [spl]
[238407.608366] kthread+0x104/0x130
[238407.608371] ret_from_fork+0x10/0x1c
[238407.608430] INFO: task dockerd:32620 blocked for more than 120 seconds.
[238407.615300] Tainted: P C OE 5.3.0-1014-raspi2 #16-Ubuntu
[238407.622415] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[238407.630500] dockerd D 0 32620 1 0x00000008
[238407.630508] Call trace:
[238407.630519] __switch_to+0xf0/0x150
[238407.630528] __schedule+0x2cc/0x718
[238407.630532] schedule+0x3c/0xb8
[238407.630551] cv_wait_common+0x188/0x1b0 [spl]
[238407.630565] __cv_wait+0x30/0x40 [spl]
[238407.630687] zil_commit_impl+0x234/0xd68 [zfs]
[238407.630784] zil_commit+0x48/0x70 [zfs]
[238407.630882] zfs_fsync+0x84/0x108 [zfs]
[238407.630977] zpl_fsync+0x7c/0xb0 [zfs]
[238407.630984] vfs_fsync_range+0x4c/0x88
[238407.630988] do_fsync+0x48/0x88
[238407.630993] __arm64_sys_fsync+0x24/0x38
[238407.630998] el0_svc_common.constprop.0+0xe0/0x1e8
[238407.631001] el0_svc_handler+0x34/0xa0
[238407.631006] el0_svc+0x10/0x14
[238407.631047] INFO: task mongod:27428 blocked for more than 120 seconds.
[238407.637819] Tainted: P C OE 5.3.0-1014-raspi2 #16-Ubuntu
[238407.644927] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[238407.653010] mongod D 0 27428 25240 0x00000800
[238407.653016] Call trace:
[238407.653028] __switch_to+0xf0/0x150
[238407.653036] __schedule+0x2cc/0x718
[238407.653041] schedule+0x3c/0xb8
[238407.653063] cv_wait_common+0x188/0x1b0 [spl]
[238407.653077] __cv_wait+0x30/0x40 [spl]
[238407.653205] zil_commit_impl+0x234/0xd68 [zfs]
[238407.653310] zil_commit+0x48/0x70 [zfs]
[238407.653407] zfs_fsync+0x84/0x108 [zfs]
[238407.653502] zpl_fsync+0x7c/0xb0 [zfs]
[238407.653508] vfs_fsync_range+0x4c/0x88
[238407.653513] __arm64_sys_msync+0x190/0x1f8
[238407.653518] el0_svc_common.constprop.0+0xe0/0x1e8
[238407.653522] el0_svc_handler+0x34/0xa0
[238407.653526] el0_svc+0x10/0x14
[238407.653572] INFO: task mongod:32146 blocked for more than 120 seconds.
[238407.660351] Tainted: P C OE 5.3.0-1014-raspi2 #16-Ubuntu
[238407.667463] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[238407.675546] mongod D 0 32146 25240 0x00000800
[238407.675557] Call trace:
[238407.675565] __switch_to+0xf0/0x150
[238407.675570] __schedule+0x2cc/0x718
[238407.675574] schedule+0x3c/0xb8
[238407.675592] cv_wait_common+0x188/0x1b0 [spl]
[238407.675605] __cv_wait+0x30/0x40 [spl]
[238407.675710] zil_commit_impl+0x234/0xd68 [zfs]
[238407.675806] zil_commit+0x48/0x70 [zfs]
[238407.675900] zfs_fsync+0x84/0x108 [zfs]
[238407.675995] zpl_fsync+0x7c/0xb0 [zfs]
[238407.676001] vfs_fsync_range+0x4c/0x88
[238407.676005] do_fsync+0x48/0x88
[238407.676010] __arm64_sys_fdatasync+0x24/0x38
[238407.676015] el0_svc_common.constprop.0+0xe0/0x1e8
[238407.676018] el0_svc_handler+0x34/0xa0
[238407.676022] el0_svc+0x10/0x14
[238407.676091] INFO: task z_wr_int:15893 blocked for more than 120 seconds.
[238407.683033] Tainted: P C OE 5.3.0-1014-raspi2 #16-Ubuntu
[238407.690134] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[238407.698212] z_wr_int D 0 15893 2 0x00000028
[238407.698219] Call trace:
[238407.698231] __switch_to+0xf0/0x150
[238407.698239] __schedule+0x2cc/0x718
[238407.698243] schedule+0x3c/0xb8
[238407.698247] io_schedule+0x20/0x40
[238407.698253] rq_qos_wait+0x100/0x178
[238407.698258] wbt_wait+0xb4/0xf0
[238407.698263] __rq_qos_throttle+0x38/0x50
[238407.698268] blk_mq_make_request+0x120/0x5f8
[238407.698274] generic_make_request+0xb4/0x2d8
[238407.698279] submit_bio+0x44/0x1b0
[238407.698402] vdev_disk_io_start+0x668/0x9e8 [zfs]
[238407.698500] zio_vdev_io_start+0xdc/0x2b8 [zfs]
[238407.698601] zio_nowait+0xd4/0x170 [zfs]
[238407.698700] vdev_queue_io_done+0x1f0/0x2a0 [zfs]
[238407.698795] zio_vdev_io_done+0xec/0x220 [zfs]
[238407.698889] zio_execute+0xac/0x110 [zfs]
[238407.698906] taskq_thread+0x304/0x580 [spl]
[238407.698912] kthread+0x104/0x130
[238407.698916] ret_from_fork+0x10/0x1c
[238407.698925] INFO: task z_wr_iss_h:15990 blocked for more than 120 seconds.
[238407.706051] Tainted: P C OE 5.3.0-1014-raspi2 #16-Ubuntu
[238407.713152] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[238407.721228] z_wr_iss_h D 0 15990 2 0x00000028
[238407.721234] Call trace:
[238407.721243] __switch_to+0xf0/0x150
[238407.721254] __schedule+0x2cc/0x718
[238407.721259] schedule+0x3c/0xb8
[238407.721263] io_schedule+0x20/0x40
[238407.721269] rq_qos_wait+0x100/0x178
[238407.721274] wbt_wait+0xb4/0xf0
[238407.721278] __rq_qos_throttle+0x38/0x50
[238407.721283] blk_mq_make_request+0x120/0x5f8
[238407.721289] generic_make_request+0xb4/0x2d8
[238407.721294] submit_bio+0x44/0x1b0
[238407.721401] vdev_disk_io_start+0x668/0x9e8 [zfs]
[238407.721505] zio_vdev_io_start+0xdc/0x2b8 [zfs]
[238407.721604] zio_nowait+0xd4/0x170 [zfs]
[238407.721698] vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[238407.721793] zio_vdev_io_start+0xdc/0x2b8 [zfs]
[238407.721887] zio_nowait+0xd4/0x170 [zfs]
[238407.721984] vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[238407.722112] zio_vdev_io_start+0x248/0x2b8 [zfs]
[238407.722212] zio_execute+0xac/0x110 [zfs]
[238407.722229] taskq_thread+0x304/0x580 [spl]
[238407.722237] kthread+0x104/0x130
[238407.722242] ret_from_fork+0x10/0x1c
[238407.722248] INFO: task z_wr_iss_h:15991 blocked for more than 121 seconds.
[238407.729367] Tainted: P C OE 5.3.0-1014-raspi2 #16-Ubuntu
[238407.736471] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[238407.744552] z_wr_iss_h D 0 15991 2 0x00000028
[238407.744559] Call trace:
[238407.744572] __switch_to+0xf0/0x150
[238407.744582] __schedule+0x2cc/0x718
[238407.744586] schedule+0x3c/0xb8
[238407.744589] io_schedule+0x20/0x40
[238407.744595] rq_qos_wait+0x100/0x178
[238407.744600] wbt_wait+0xb4/0xf0
[238407.744604] __rq_qos_throttle+0x38/0x50
[238407.744608] blk_mq_make_request+0x120/0x5f8
[238407.744613] generic_make_request+0xb4/0x2d8
[238407.744617] submit_bio+0x44/0x1b0
[238407.744748] vdev_disk_io_start+0x668/0x9e8 [zfs]
[238407.744844] zio_vdev_io_start+0xdc/0x2b8 [zfs]
[238407.744938] zio_nowait+0xd4/0x170 [zfs]
[238407.745032] vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[238407.745126] zio_vdev_io_start+0xdc/0x2b8 [zfs]
[238407.745221] zio_nowait+0xd4/0x170 [zfs]
[238407.745315] vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[238407.745409] zio_vdev_io_start+0x248/0x2b8 [zfs]
[238407.745503] zio_execute+0xac/0x110 [zfs]
[238407.745520] taskq_thread+0x304/0x580 [spl]
[238407.745526] kthread+0x104/0x130
[238407.745530] ret_from_fork+0x10/0x1c
[238528.372629] INFO: task z_wr_iss:1683 blocked for more than 241 seconds.
[238528.379559] Tainted: P C OE 5.3.0-1014-raspi2 #16-Ubuntu
[238528.386729] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[238528.394820] z_wr_iss D 0 1683 2 0x00000028
[238528.394828] Call trace:
[238528.394841] __switch_to+0xf0/0x150
[238528.394848] __schedule+0x2cc/0x718
[238528.394852] schedule+0x3c/0xb8
[238528.394855] io_schedule+0x20/0x40
[238528.394861] rq_qos_wait+0x100/0x178
[238528.394866] wbt_wait+0xb4/0xf0
[238528.394871] __rq_qos_throttle+0x38/0x50
[238528.394876] blk_mq_make_request+0x120/0x5f8
[238528.394881] generic_make_request+0xb4/0x2d8
[238528.394885] submit_bio+0x44/0x1b0
[238528.395046] vdev_disk_io_start+0x668/0x9e8 [zfs]
[238528.395142] zio_vdev_io_start+0xdc/0x2b8 [zfs]
[238528.395238] zio_nowait+0xd4/0x170 [zfs]
[238528.395332] vdev_queue_io+0x24c/0x268 [zfs]
[238528.395427] zio_vdev_io_start+0x94/0x2b8 [zfs]
[238528.395521] zio_nowait+0xd4/0x170 [zfs]
[238528.395615] vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[238528.395709] zio_vdev_io_start+0xdc/0x2b8 [zfs]
[238528.395803] zio_nowait+0xd4/0x170 [zfs]
[238528.395897] vdev_mirror_io_start+0xa8/0x1b0 [zfs]
[238528.395991] zio_vdev_io_start+0x248/0x2b8 [zfs]
[238528.396084] zio_execute+0xac/0x110 [zfs]
[238528.396103] taskq_thread+0x304/0x580 [spl]
[238528.396109] kthread+0x104/0x130
[238528.396113] ret_from_fork+0x10/0x1c
@lassebm Besides it being blocked IO, your stacktrace looks notably different.
@Ornias1993 I wasn't quite sure what to look for. Would you think it's worth a new issue?
@lassebm Actually, now that I take a second look at it... the lower parts of the trace are quite similar...
System information
Type | Version/Name |
---|---|
Distribution Name | Ubuntu |
Distribution Version | 19.10 |
Linux Kernel | 5.3.0-24-generic |
Architecture | amd64 (NUC7i3DNHE, Intel i3, 8GB RAM) |
ZFS Version | 0.8.1-1ubuntu14 |
SPL Version | 0.8.1-1ubuntu14 |
I don't know if it's related but I have a similar issue on Ubuntu Eoan when attempting to rsync large amounts of NAS data from one machine to another. Transfer speeds are around 30MB/s and then suddenly they'll hit 5MB/s when these errors occur and will stay at that speed. Cancelling the rsync and waiting for the io to catch up temporarily fixes it.
[Tue Jan 7 06:06:14 2020] INFO: task txg_sync:1076 blocked for more than 120 seconds.
[Tue Jan 7 06:06:14 2020] Tainted: P O 5.3.0-24-generic #26-Ubuntu
[Tue Jan 7 06:06:14 2020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jan 7 06:06:14 2020] txg_sync D 0 1076 2 0x80004000
[Tue Jan 7 06:06:14 2020] Call Trace:
[Tue Jan 7 06:06:14 2020] __schedule+0x2b9/0x6c0
[Tue Jan 7 06:06:14 2020] schedule+0x42/0xb0
[Tue Jan 7 06:06:14 2020] schedule_timeout+0x152/0x2f0
[Tue Jan 7 06:06:14 2020] ? spa_taskq_dispatch_ent+0x4f/0x70 [zfs]
[Tue Jan 7 06:06:14 2020] ? __next_timer_interrupt+0xe0/0xe0
[Tue Jan 7 06:06:14 2020] io_schedule_timeout+0x1e/0x50
[Tue Jan 7 06:06:14 2020] __cv_timedwait_common+0x15e/0x1c0 [spl]
[Tue Jan 7 06:06:14 2020] ? wait_woken+0x80/0x80
[Tue Jan 7 06:06:14 2020] __cv_timedwait_io+0x19/0x20 [spl]
[Tue Jan 7 06:06:14 2020] zio_wait+0x11b/0x230 [zfs]
[Tue Jan 7 06:06:14 2020] ? __raw_spin_unlock+0x9/0x10 [zfs]
[Tue Jan 7 06:06:14 2020] dsl_pool_sync+0xbc/0x410 [zfs]
[Tue Jan 7 06:06:14 2020] spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs]
[Tue Jan 7 06:06:14 2020] spa_sync+0x312/0x5b0 [zfs]
[Tue Jan 7 06:06:14 2020] txg_sync_thread+0x279/0x310 [zfs]
[Tue Jan 7 06:06:14 2020] ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[Tue Jan 7 06:06:14 2020] thread_generic_wrapper+0x83/0xa0 [spl]
[Tue Jan 7 06:06:14 2020] kthread+0x104/0x140
[Tue Jan 7 06:06:14 2020] ? clear_bit+0x20/0x20 [spl]
[Tue Jan 7 06:06:14 2020] ? kthread_park+0x80/0x80
[Tue Jan 7 06:06:14 2020] ret_from_fork+0x35/0x40
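Regarding the rsync slowdown described above, a hedged sketch of what could be watched while the transfer degrades, to see whether the vdevs or the txg sync are the bottleneck (the pool name is a placeholder; the txgs path is the usual ZFS-on-Linux kstat location):
# per-vdev bandwidth and operations, refreshed every 5 seconds
zpool iostat -v tank 5
# timing of recent transaction groups
tail -n 20 /proc/spl/kstat/zfs/tank/txgs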
@kebian If you fill out the same little form that's used in the issue template (OS, distro, ZFS version etc.), it would be useful for confirmation.
Ok, I edited my above comment and added those details. Thanks.
@kebian Awesome...
8GB of RAM, are you sure you aren't out of memory?
can you share the disk scheduler that is in use by the underlying storage devices?
Welcome to GitHub, mister bug stuff! :) Maybe it would be relevant to explain to reporters how they can find that out (not everyone is as well versed in Linux) and why it might be relevant to the issue?
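For reporters who want to check: the scheduler in use for each backing disk can usually be read straight from sysfs (device names here are examples; substitute the pool's members):
cat /sys/block/sda/queue/scheduler
# the active scheduler is shown in brackets, e.g. [mq-deadline] none
# or for all disks at once
grep . /sys/block/sd*/queue/scheduler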
I'm having the same issue on OpenMediaVault 5.3.10-1 with kernel 5.4.0-0-bpo.4-amd64 and ZFS v0.8.3-1~bpo10+1, although I remember having this issue for a long time. It is usually triggered when I'm trying to delete or move some files in the console and at the same time I'm accessing the storage over the network.
[52320.685399] INFO: task txg_sync:856 blocked for more than 120 seconds.
[52320.685435] Tainted: P OE 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[52320.685465] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[52320.685502] txg_sync D 0 856 2 0x80004000
[52320.685505] Call Trace:
[52320.685513] ? __schedule+0x2e6/0x6f0
[52320.685515] schedule+0x2f/0xa0
[52320.685524] taskq_wait+0x7a/0xc0 [spl]
[52320.685528] ? finish_wait+0x80/0x80
[52320.685580] dmu_objset_sync+0x357/0x560 [zfs]
[52320.685621] dsl_dataset_sync+0x6d/0x260 [zfs]
[52320.685663] dsl_pool_sync+0xe8/0x510 [zfs]
[52320.685713] spa_sync+0x5af/0xfb0 [zfs]
[52320.685764] ? spa_txg_history_init_io+0x106/0x110 [zfs]
[52320.685813] txg_sync_thread+0x294/0x460 [zfs]
[52320.685817] ? __switch_to_asm+0x34/0x70
[52320.685866] ? txg_thread_exit.isra.10+0x60/0x60 [zfs]
[52320.685871] ? __thread_exit+0x20/0x20 [spl]
[52320.685876] thread_generic_wrapper+0x6f/0x80 [spl]
[52320.685879] kthread+0x112/0x130
[52320.685881] ? kthread_park+0x80/0x80
[52320.685883] ret_from_fork+0x35/0x40
[52320.685896] INFO: task z_rd_int:17925 blocked for more than 120 seconds.
[52320.685921] Tainted: P OE 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[52320.685950] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[52320.685977] z_rd_int D 0 17925 2 0x80004000
[52320.685978] Call Trace:
[52320.685980] ? __schedule+0x2e6/0x6f0
[52320.685981] schedule+0x2f/0xa0
[52320.685982] schedule_preempt_disabled+0xa/0x10
[52320.685984] __mutex_lock.isra.8+0x2b5/0x4a0
[52320.686034] vdev_queue_io_done+0x46/0x250 [zfs]
[52320.686036] ? _cond_resched+0x15/0x30
[52320.686084] zio_vdev_io_done+0xd3/0x1f0 [zfs]
[52320.686133] zio_execute+0x90/0xf0 [zfs]
[52320.686139] taskq_thread+0x2df/0x520 [spl]
[52320.686141] ? __switch_to_asm+0x34/0x70
[52320.686143] ? wake_up_q+0x80/0x80
[52320.686202] ? zio_taskq_member.isra.9.constprop.14+0x70/0x70 [zfs]
[52320.686207] ? taskq_thread_spawn+0x50/0x50 [spl]
[52320.686209] kthread+0x112/0x130
[52320.686210] ? kthread_park+0x80/0x80
[52320.686212] ret_from_fork+0x35/0x40
[52441.515498] INFO: task txg_sync:856 blocked for more than 241 seconds.
[52441.515549] Tainted: P OE 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[52441.515578] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[52441.515606] txg_sync D 0 856 2 0x80004000
[52441.515609] Call Trace:
[52441.515617] ? __schedule+0x2e6/0x6f0
[52441.515618] schedule+0x2f/0xa0
[52441.515627] taskq_wait+0x7a/0xc0 [spl]
[52441.515631] ? finish_wait+0x80/0x80
[52441.515693] dmu_objset_sync+0x357/0x560 [zfs]
[52441.515734] dsl_dataset_sync+0x6d/0x260 [zfs]
[52441.515776] dsl_pool_sync+0xe8/0x510 [zfs]
[52441.515825] spa_sync+0x5af/0xfb0 [zfs]
[52441.515876] ? spa_txg_history_init_io+0x106/0x110 [zfs]
[52441.515926] txg_sync_thread+0x294/0x460 [zfs]
[52441.515929] ? __switch_to_asm+0x34/0x70
[52441.515978] ? txg_thread_exit.isra.10+0x60/0x60 [zfs]
[52441.515984] ? __thread_exit+0x20/0x20 [spl]
[52441.515988] thread_generic_wrapper+0x6f/0x80 [spl]
[52441.515992] kthread+0x112/0x130
[52441.515993] ? kthread_park+0x80/0x80
[52441.515995] ret_from_fork+0x35/0x40
[52441.516012] INFO: task z_rd_int:17925 blocked for more than 241 seconds.
[52441.516037] Tainted: P OE 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[52441.516081] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[52441.516115] z_rd_int D 0 17925 2 0x80004000
[52441.516116] Call Trace:
[52441.516119] ? __schedule+0x2e6/0x6f0
[52441.516120] schedule+0x2f/0xa0
[52441.516121] schedule_preempt_disabled+0xa/0x10
[52441.516123] __mutex_lock.isra.8+0x2b5/0x4a0
[52441.516173] vdev_queue_io_done+0x46/0x250 [zfs]
[52441.516176] ? _cond_resched+0x15/0x30
[52441.516227] zio_vdev_io_done+0xd3/0x1f0 [zfs]
[52441.516288] zio_execute+0x90/0xf0 [zfs]
[52441.516299] taskq_thread+0x2df/0x520 [spl]
[52441.516301] ? __switch_to_asm+0x34/0x70
[52441.516302] ? wake_up_q+0x80/0x80
[52441.516350] ? zio_taskq_member.isra.9.constprop.14+0x70/0x70 [zfs]
[52441.516356] ? taskq_thread_spawn+0x50/0x50 [spl]
[52441.516357] kthread+0x112/0x130
[52441.516359] ? kthread_park+0x80/0x80
[52441.516360] ret_from_fork+0x35/0x40
[52562.345526] INFO: task txg_sync:856 blocked for more than 362 seconds.
[52562.345561] Tainted: P OE 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[52562.345590] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[52562.345619] txg_sync D 0 856 2 0x80004000
[52562.345621] Call Trace:
[52562.345630] ? __schedule+0x2e6/0x6f0
[52562.345631] schedule+0x2f/0xa0
[52562.345640] taskq_wait+0x7a/0xc0 [spl]
[52562.345644] ? finish_wait+0x80/0x80
[52562.345698] dmu_objset_sync+0x357/0x560 [zfs]
[52562.345752] dsl_dataset_sync+0x6d/0x260 [zfs]
[52562.345792] dsl_pool_sync+0xe8/0x510 [zfs]
[52562.345840] spa_sync+0x5af/0xfb0 [zfs]
[52562.345889] ? spa_txg_history_init_io+0x106/0x110 [zfs]
[52562.345941] txg_sync_thread+0x294/0x460 [zfs]
[52562.345944] ? __switch_to_asm+0x34/0x70
[52562.346007] ? txg_thread_exit.isra.10+0x60/0x60 [zfs]
[52562.346012] ? __thread_exit+0x20/0x20 [spl]
[52562.346017] thread_generic_wrapper+0x6f/0x80 [spl]
[52562.346020] kthread+0x112/0x130
[52562.346022] ? kthread_park+0x80/0x80
[52562.346024] ret_from_fork+0x35/0x40
[52562.346042] INFO: task z_rd_int:17925 blocked for more than 362 seconds.
[52562.346067] Tainted: P OE 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[52562.346096] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[52562.346123] z_rd_int D 0 17925 2 0x80004000
[52562.346124] Call Trace:
[52562.346126] ? __schedule+0x2e6/0x6f0
[52562.346128] schedule+0x2f/0xa0
[52562.346129] schedule_preempt_disabled+0xa/0x10
[52562.346130] __mutex_lock.isra.8+0x2b5/0x4a0
[52562.346181] vdev_queue_io_done+0x46/0x250 [zfs]
[52562.346183] ? _cond_resched+0x15/0x30
[52562.346231] zio_vdev_io_done+0xd3/0x1f0 [zfs]
[52562.346280] zio_execute+0x90/0xf0 [zfs]
[52562.346286] taskq_thread+0x2df/0x520 [spl]
[52562.346288] ? __switch_to_asm+0x34/0x70
[52562.346290] ? wake_up_q+0x80/0x80
[52562.346337] ? zio_taskq_member.isra.9.constprop.14+0x70/0x70 [zfs]
[52562.346343] ? taskq_thread_spawn+0x50/0x50 [spl]
[52562.346345] kthread+0x112/0x130
[52562.346346] ? kthread_park+0x80/0x80
[52562.346348] ret_from_fork+0x35/0x40
[52594.864825] systemd[1]: systemd-logind.service: Watchdog timeout (limit 3min)!
[52594.864886] systemd[1]: systemd-logind.service: Killing process 1379 (systemd-logind) with signal SIGABRT.
[52598.762708] systemd[1]: systemd-journald.service: Main process exited, code=killed, status=6/ABRT
[52598.762923] systemd[1]: systemd-journald.service: Failed with result 'watchdog'.
[52598.763820] systemd[1]: systemd-journald.service: Service has no hold-off time (RestartSec=0), scheduling restart.
[52598.764533] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 1.
[52598.765565] systemd[1]: systemd-journal-flush.service: Succeeded.
[52598.765650] systemd[1]: Stopped Flush Journal to Persistent Storage.
[52598.765680] systemd[1]: Stopping Flush Journal to Persistent Storage...
[52598.765752] systemd[1]: Stopped Journal Service.
[52598.982722] systemd-journald[17953]: File /run/log/journal/e4adb6ec49c44c1db846cadde006c4e4/system.journal corrupted or uncleanly shut down, renaming and replacing.
[383032.378146] INFO: task z_iput:752 blocked for more than 120 seconds.
[383032.378185] Tainted: P OE 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[383032.379018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[383032.379833] z_iput D 0 752 2 0x80004000
[383032.379837] Call Trace:
[383032.379847] ? __schedule+0x2e6/0x6f0
[383032.379849] schedule+0x2f/0xa0
[383032.379850] io_schedule+0x12/0x40
[383032.379860] cv_wait_common+0xaf/0x130 [spl]
[383032.379864] ? finish_wait+0x80/0x80
[383032.379941] txg_wait_open+0x8a/0xd0 [zfs]
[383032.379980] dmu_free_long_range+0x2fe/0x4b0 [zfs]
[383032.380035] zfs_rmnode+0x262/0x330 [zfs]
[383032.380089] ? zfs_zinactive+0xdc/0xf0 [zfs]
[383032.380141] zfs_inactive+0x84/0x210 [zfs]
[383032.380146] ? unmap_mapping_pages+0x5e/0x130
[383032.380196] zpl_evict_inode+0x3c/0x50 [zfs]
[383032.380201] evict+0xd2/0x1a0
[383032.380207] taskq_thread+0x2df/0x520 [spl]
[383032.380210] ? wake_up_q+0x80/0x80
[383032.380216] ? taskq_thread_spawn+0x50/0x50 [spl]
[383032.380219] kthread+0x112/0x130
[383032.380221] ? kthread_park+0x80/0x80
[383032.380224] ret_from_fork+0x35/0x40
[383032.380228] INFO: task txg_sync:856 blocked for more than 120 seconds.
[383032.380998] Tainted: P OE 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[383032.381784] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[383032.382595] txg_sync D 0 856 2 0x80004000
[383032.382597] Call Trace:
[383032.382600] ? __schedule+0x2e6/0x6f0
[383032.382601] schedule+0x2f/0xa0
[383032.382603] schedule_timeout+0x159/0x310
[383032.382608] ? __next_timer_interrupt+0xc0/0xc0
[383032.382609] io_schedule_timeout+0x19/0x40
[383032.382617] __cv_timedwait_common+0x125/0x160 [spl]
[383032.382633] ? finish_wait+0x80/0x80
[383032.382695] zio_wait+0x13a/0x280 [zfs]
[383032.382730] ? bplist_iterate+0x126/0x140 [zfs]
[383032.382782] spa_sync+0x5f0/0xfb0 [zfs]
[383032.382837] ? spa_txg_history_init_io+0x106/0x110 [zfs]
[383032.382891] txg_sync_thread+0x294/0x460 [zfs]
[383032.382893] ? __switch_to_asm+0x34/0x70
[383032.382946] ? txg_thread_exit.isra.10+0x60/0x60 [zfs]
[383032.382952] ? __thread_exit+0x20/0x20 [spl]
[383032.382957] thread_generic_wrapper+0x6f/0x80 [spl]
[383032.382960] kthread+0x112/0x130
[383032.382961] ? kthread_park+0x80/0x80
[383032.382963] ret_from_fork+0x35/0x40
[383032.382972] INFO: task nfsd:1585 blocked for more than 120 seconds.
[383032.383754] Tainted: P OE 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[383032.384579] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[383032.385414] nfsd D 0 1585 2 0x80004000
[383032.385415] Call Trace:
[383032.385417] ? __schedule+0x2e6/0x6f0
[383032.385419] ? __wake_up_common+0x7a/0x180
[383032.385420] schedule+0x2f/0xa0
[383032.385421] io_schedule+0x12/0x40
[383032.385426] cv_wait_common+0xaf/0x130 [spl]
[383032.385428] ? finish_wait+0x80/0x80
[383032.385483] txg_wait_open+0x8a/0xd0 [zfs]
[383032.385521] dmu_free_long_range+0x2fe/0x4b0 [zfs]
[383032.385573] zfs_trunc+0x77/0x1f0 [zfs]
[383032.385625] ? sa_lookup_locked+0x64/0xb0 [zfs]
[383032.385677] zfs_freesp+0x9b/0x430 [zfs]
[383032.385729] ? zfs_dirent_unlock+0xc0/0x140 [zfs]
[383032.385766] ? dmu_object_size_from_db+0x22/0x80 [zfs]
[383032.385767] ? _cond_resched+0x15/0x30
[383032.385768] ? mutex_lock+0xe/0x30
[383032.385817] ? rrw_exit+0x59/0x150 [zfs]
[383032.385869] ? zfs_lookup+0x143/0x3a0 [zfs]
[383032.385919] zpl_xattr_set_dir.isra.15+0x114/0x2a0 [zfs]
[383032.385970] zpl_xattr_set+0x144/0x360 [zfs]
[383032.386021] zpl_set_acl+0x14e/0x210 [zfs]
[383032.386071] zpl_chmod_acl+0xc9/0xe0 [zfs]
[383032.386123] zpl_setattr+0x150/0x180 [zfs]
[383032.386138] notify_change+0x339/0x4a0
[383032.386155] nfsd_setattr+0x3ff/0x470 [nfsd]
[383032.386176] ? write_bytes_to_xdr_buf+0xbc/0xe0 [sunrpc]
[383032.386187] nfsd4_setattr+0x7b/0x140 [nfsd]
[383032.386198] nfsd4_proc_compound+0x381/0x6f0 [nfsd]
[383032.386206] nfsd_dispatch+0x9e/0x210 [nfsd]
[383032.386221] svc_process_common+0x342/0x6b0 [sunrpc]
[383032.386236] ? svc_recv+0x2ff/0x980 [sunrpc]
[383032.386243] ? nfsd_svc+0x2c0/0x2c0 [nfsd]
[383032.386251] ? nfsd_destroy+0x50/0x50 [nfsd]
[383032.386264] svc_process+0xd1/0x110 [sunrpc]
[383032.386272] nfsd+0xe3/0x140 [nfsd]
[383032.386274] kthread+0x112/0x130
[383032.386275] ? kthread_park+0x80/0x80
[383032.386277] ret_from_fork+0x35/0x40
[383032.386285] INFO: task mc:7614 blocked for more than 120 seconds.
[383032.387103] Tainted: P OE 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[383032.387947] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[383032.388793] mc D 0 7614 30679 0x00000000
[383032.388794] Call Trace:
[383032.388797] ? __schedule+0x2e6/0x6f0
[383032.388799] ? __wake_up_common+0x7a/0x180
[383032.388800] schedule+0x2f/0xa0
[383032.388801] io_schedule+0x12/0x40
[383032.388805] cv_wait_common+0xaf/0x130 [spl]
[383032.388807] ? finish_wait+0x80/0x80
[383032.388863] txg_wait_open+0x8a/0xd0 [zfs]
[383032.388901] dmu_free_long_range+0x2fe/0x4b0 [zfs]
[383032.388953] zfs_rmnode+0x262/0x330 [zfs]
[383032.389006] ? zfs_zinactive+0xdc/0xf0 [zfs]
[383032.389058] zfs_inactive+0x84/0x210 [zfs]
[383032.389061] ? unmap_mapping_pages+0x5e/0x130
[383032.389111] zpl_evict_inode+0x3c/0x50 [zfs]
[383032.389113] evict+0xd2/0x1a0
[383032.389116] do_unlinkat+0x263/0x300
[383032.389120] do_syscall_64+0x52/0x160
[383032.389122] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[383032.389124] RIP: 0033:0x7fe113fadfc7
[383032.389131] Code: Bad RIP value.
[383032.389131] RSP: 002b:00007ffce7809e98 EFLAGS: 00000202 ORIG_RAX: 0000000000000057
[383032.389133] RAX: ffffffffffffffda RBX: 000055cdbc571140 RCX: 00007fe113fadfc7
[383032.389134] RDX: 000055cdbc56e330 RSI: 00000000ffffffff RDI: 000055cdbc543500
[383032.389134] RBP: 000055cdbc5573a0 R08: 0000000000000000 R09: 0000000000000000
[383032.389135] R10: 0000000000000008 R11: 0000000000000202 R12: 000055cdbc52fd24
[383032.389136] R13: 00007ffce7809efc R14: 000055cdbc56bd10 R15: 000000ffff000000
@Dinth, how much RAM do you have, and how big are the files you're moving (total and individual) when you see the error?
My errors went away when I increased the RAM (from 8GB to 32GB), but obviously that was only a workaround and not a solution. It also only happened when moving thousands of files adding up to over a terabyte across a file system boundary.
@Dinth, how much RAM do you have, and how big are the files you're moving (total and individual) when you see the error?
16GB of RAM. The files I've been deleting and moving were not big. Today when this happened I had been moving a 1GB file and deleting circa 200GB of files.
Actually, since the last time, ZFS has started doing this constantly, even after a restart, on an idling system in single-user mode with the pool unmounted. iostat -mx doesn't show anything abnormal and short SMART tests of the drives were all successful. I have tried increasing the RAM to 24GB but that has not fixed the issue.
[ 1572.097836] mount.zfs D 0 3513 1 0x00004004
[ 1572.098251] Call Trace:
[ 1572.098690] ? __schedule+0x2e6/0x6f0
[ 1572.099100] schedule+0x2f/0xa0
[ 1572.099510] io_schedule+0x12/0x40
[ 1572.099925] cv_wait_common+0xaf/0x130 [spl]
[ 1572.100322] ? finish_wait+0x80/0x80
[ 1572.100810] txg_wait_synced_impl+0xc2/0x110 [zfs]
[ 1572.101264] txg_wait_synced+0xc/0x40 [zfs]
[ 1572.101737] zil_replay+0xdd/0x120 [zfs]
[ 1572.102191] zfsvfs_setup+0x164/0x1f0 [zfs]
[ 1572.102647] zfs_domount+0x38b/0x3f0 [zfs]
[ 1572.103107] zpl_mount+0x13d/0x180 [zfs]
[ 1572.103539] legacy_get_tree+0x27/0x40
[ 1572.103941] vfs_get_tree+0x25/0xb0
[ 1572.104350] do_mount+0x7ab/0x970
[ 1572.104825] ksys_mount+0xb6/0xd0
[ 1572.105248] __x64_sys_mount+0x21/0x30
[ 1572.105681] do_syscall_64+0x52/0x160
[ 1572.106122] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1572.106580] RIP: 0033:0x7f7ba673afea
[ 1572.107049] Code: Bad RIP value.
[ 1572.107522] RSP: 002b:00007ffe1e4674a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[ 1572.108064] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7ba673afea
[ 1572.108613] RDX: 000055bd31af811f RSI: 00007ffe1e46b5f0 RDI: 00007ffe1e46df2b
[ 1572.109151] RBP: 00007ffe1e46c630 R08: 00007ffe1e4685f0 R09: 000055bd32666bf0
[ 1572.109699] R10: 0000000001000000 R11: 0000000000000246 R12: 000055bd326630e0
[ 1572.110258] R13: 00007ffe1e4675f0 R14: 00007ffe1e4685f0 R15: 0000000000000002
[ 1581.262529] systemd-system-update-generator[1823]: Offline system update overridden by runlevel "single" on the kernel command line
[ 1581.618931] systemd-system-update-generator[1847]: Offline system update overridden by runlevel "single" on the kernel command line
[ 1581.916296] systemd-system-update-generator[1870]: Offline system update overridden by runlevel "single" on the kernel command line
[ 1692.928764] INFO: task mount.zfs:3513 blocked for more than 483 seconds.
[ 1692.929912] Tainted: P OE 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[ 1692.930983] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1692.932006] mount.zfs D 0 3513 1 0x00004004
[ 1692.933126] Call Trace:
[ 1692.934263] ? __schedule+0x2e6/0x6f0
[ 1692.935349] schedule+0x2f/0xa0
[ 1692.936457] io_schedule+0x12/0x40
[ 1692.937566] cv_wait_common+0xaf/0x130 [spl]
[ 1692.938587] ? finish_wait+0x80/0x80
[ 1692.939756] txg_wait_synced_impl+0xc2/0x110 [zfs]
[ 1692.941040] txg_wait_synced+0xc/0x40 [zfs]
[ 1692.942257] zil_replay+0xdd/0x120 [zfs]
[ 1692.943486] zfsvfs_setup+0x164/0x1f0 [zfs]
[ 1692.944633] zfs_domount+0x38b/0x3f0 [zfs]
[ 1692.945829] zpl_mount+0x13d/0x180 [zfs]
[ 1692.946849] legacy_get_tree+0x27/0x40
[ 1692.947852] vfs_get_tree+0x25/0xb0
[ 1692.948917] do_mount+0x7ab/0x970
[ 1692.949909] ksys_mount+0xb6/0xd0
[ 1692.950917] __x64_sys_mount+0x21/0x30
[ 1692.951919] do_syscall_64+0x52/0x160
[ 1692.952932] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1692.953822] RIP: 0033:0x7f7ba673afea
[ 1692.954743] Code: Bad RIP value.
[ 1692.955579] RSP: 002b:00007ffe1e4674a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[ 1692.956480] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7ba673afea
[ 1692.957369] RDX: 000055bd31af811f RSI: 00007ffe1e46b5f0 RDI: 00007ffe1e46df2b
[ 1692.958246] RBP: 00007ffe1e46c630 R08: 00007ffe1e4685f0 R09: 000055bd32666bf0
[ 1692.959106] R10: 0000000001000000 R11: 0000000000000246 R12: 000055bd326630e0
[ 1692.959966] R13: 00007ffe1e4675f0 R14: 00007ffe1e4685f0 R15: 0000000000000002
[ 1813.760838] INFO: task mount.zfs:3513 blocked for more than 604 seconds.
[ 1813.760939] Tainted: P OE 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[ 1813.761065] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1813.761196] mount.zfs D 0 3513 1 0x00004004
[ 1813.761347] Call Trace:
[ 1813.761491] ? __schedule+0x2e6/0x6f0
[ 1813.761641] schedule+0x2f/0xa0
[ 1813.761794] io_schedule+0x12/0x40
[ 1813.761978] cv_wait_common+0xaf/0x130 [spl]
[ 1813.762168] ? finish_wait+0x80/0x80
[ 1813.762419] txg_wait_synced_impl+0xc2/0x110 [zfs]
[ 1813.762697] txg_wait_synced+0xc/0x40 [zfs]
[ 1813.762964] zil_replay+0xdd/0x120 [zfs]
[ 1813.763228] zfsvfs_setup+0x164/0x1f0 [zfs]
[ 1813.763508] zfs_domount+0x38b/0x3f0 [zfs]
[ 1813.763780] zpl_mount+0x13d/0x180 [zfs]
[ 1813.764016] legacy_get_tree+0x27/0x40
[ 1813.764265] vfs_get_tree+0x25/0xb0
[ 1813.764537] do_mount+0x7ab/0x970
[ 1813.764828] ksys_mount+0xb6/0xd0
[ 1813.765101] __x64_sys_mount+0x21/0x30
[ 1813.765383] do_syscall_64+0x52/0x160
[ 1813.765671] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1813.765975] RIP: 0033:0x7f7ba673afea
[ 1813.766310] Code: Bad RIP value.
[ 1813.766627] RSP: 002b:00007ffe1e4674a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[ 1813.766983] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7ba673afea
[ 1813.767346] RDX: 000055bd31af811f RSI: 00007ffe1e46b5f0 RDI: 00007ffe1e46df2b
[ 1813.767723] RBP: 00007ffe1e46c630 R08: 00007ffe1e4685f0 R09: 000055bd32666bf0
[ 1813.768109] R10: 0000000001000000 R11: 0000000000000246 R12: 000055bd326630e0
[ 1813.768506] R13: 00007ffe1e4675f0 R14: 00007ffe1e4685f0 R15: 0000000000000002
[ 1934.593065] INFO: task mount.zfs:3513 blocked for more than 724 seconds.
[ 1934.593432] Tainted: P OE 5.4.0-0.bpo.4-amd64 #1 Debian 5.4.19-1~bpo10+1
[ 1934.593791] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1934.594165] mount.zfs D 0 3513 1 0x00004004
[ 1934.594534] Call Trace:
[ 1934.594901] ? __schedule+0x2e6/0x6f0
[ 1934.595261] schedule+0x2f/0xa0
[ 1934.595624] io_schedule+0x12/0x40
[ 1934.596018] cv_wait_common+0xaf/0x130 [spl]
[ 1934.596392] ? finish_wait+0x80/0x80
[ 1934.596834] txg_wait_synced_impl+0xc2/0x110 [zfs]
[ 1934.597302] txg_wait_synced+0xc/0x40 [zfs]
[ 1934.597723] zil_replay+0xdd/0x120 [zfs]
[ 1934.598134] zfsvfs_setup+0x164/0x1f0 [zfs]
[ 1934.598554] zfs_domount+0x38b/0x3f0 [zfs]
[ 1934.598972] zpl_mount+0x13d/0x180 [zfs]
[ 1934.599349] legacy_get_tree+0x27/0x40
[ 1934.599737] vfs_get_tree+0x25/0xb0
[ 1934.600119] do_mount+0x7ab/0x970
[ 1934.600500] ksys_mount+0xb6/0xd0
[ 1934.600870] __x64_sys_mount+0x21/0x30
[ 1934.601267] do_syscall_64+0x52/0x160
[ 1934.601635] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1934.602012] RIP: 0033:0x7f7ba673afea
[ 1934.602395] Code: Bad RIP value.
[ 1934.602788] RSP: 002b:00007ffe1e4674a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
[ 1934.603231] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7ba673afea
[ 1934.603650] RDX: 000055bd31af811f RSI: 00007ffe1e46b5f0 RDI: 00007ffe1e46df2b
[ 1934.604065] RBP: 00007ffe1e46c630 R08: 00007ffe1e4685f0 R09: 000055bd32666bf0
[ 1934.604494] R10: 0000000001000000 R11: 0000000000000246 R12: 000055bd326630e0
[ 1934.604945] R13: 00007ffe1e4675f0 R14: 00007ffe1e4685f0 R15: 0000000000000002
Linux 5.4.0-0.bpo.4-amd64 (r720-omv.wickhay) 07/05/20 _x86_64_ (4 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
5.13 0.00 2.33 64.86 0.00 27.68
Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 5.55 12.68 0.26 0.57 2.92 18.76 34.47 59.67 0.60 4.05 0.05 48.31 45.72 0.59 1.08
sdd 35.00 0.14 0.36 0.00 0.00 0.00 0.00 0.00 26.50 0.17 0.86 10.52 10.09 6.16 21.63
sdb 81.80 0.15 0.60 0.00 0.00 0.00 0.00 0.00 64.55 0.62 5.12 7.50 10.02 3.89 31.90
sdg 58.57 0.15 0.45 0.00 0.00 0.00 0.00 0.00 37.84 0.17 2.10 7.92 10.01 5.43 31.87
sde 81.44 0.12 0.60 0.00 0.00 0.00 0.00 0.00 64.16 0.62 5.07 7.50 11.63 3.95 32.21
sdc 35.10 0.15 0.36 0.00 0.00 0.00 0.00 0.34 26.67 0.15 0.87 10.52 9.44 6.16 21.71
sdf 58.76 0.15 0.45 0.00 0.00 0.00 0.00 0.00 37.55 0.15 2.09 7.92 9.64 5.45 32.10
(sda is not in a pool and on a different controller)
Does it import read only? Can you scrub then? What kind of drives?
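(For anyone else hitting this: a read-only import can be attempted with something like the commands below, using Data as the pool name reported later in this thread. A read-only import should skip the ZIL replay / txg sync writes that are hanging in the traces above, but a scrub needs a writable pool, so this is mainly useful for confirming the data is readable and copying it off.)
zpool export Data                    # only if a previous import attempt left the pool attached
zpool import -o readonly=on Data     # read-only import; should avoid the write paths that hang
zpool status -v Data                 # check pool health without triggering writes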
It actually freezes the computer when it tries to mount ZFS during a normal boot. I have now managed to import the pool in single-user mode:
root@r720-omv:~# zpool status -v Data
pool: Data
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(5) for details.
scan: scrub repaired 0B in 1 days 20:48:00 with 0 errors on Mon Apr 13 21:12:01 2020
config:
NAME STATE READ WRITE CKSUM
Data ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
ata-HGST_HDS724040ALE640_PK2338P4HPDXJC ONLINE 0 0 0
ata-HGST_HDS724040ALE640_PKA334PCG9L92N ONLINE 0 0 0
ata-HGST_HDS724040ALE640_PK2338P4HPSVVC ONLINE 0 0 0
ata-HGST_HDS724040ALE640_PKA334PCG9HBLN ONLINE 0 0 0
ata-HGST_HDS724040ALE640_PK2338P4HM5K8C ONLINE 0 0 0
ata-HGST_HDS724040ALE640_PK1334PEJDE30S ONLINE 0 0 0
I have run zpool upgrade, as that's what zpool is suggesting, and I also found some posts online asking users to upgrade the pool when it freezes a kernel. So far the upgrade has been running for 10+ minutes without any feedback, and I'm not sure if that's normal. Sorry if this is going slightly off-topic, but I have had this issue with the host freezing during IO for a long time, on two different hosts, and it's the first time the freeze has resulted in an unmountable pool.
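(Side note: the feature state that zpool upgrade complains about can be inspected with read-only commands, which at least shows what the upgrade is going to touch; Data is the pool name from the status output above.)
zpool upgrade                       # with no arguments, only lists pools that have disabled features
zpool get all Data | grep feature@  # shows each feature flag as disabled / enabled / active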
Also, iotop shows that the txg_sync process is reading 250-290 K/s, taking 99.9% of available IO. I'm still waiting for zpool upgrade to finish.
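(To tell whether the upgrade is actually making progress or the pool is wedged, the per-pool txg kstat and zpool iostat are useful; this assumes the pool is named Data and that the kstat path matches your SPL/ZFS build.)
zpool iostat -v Data 5               # per-vdev read/write throughput every 5 seconds
cat /proc/spl/kstat/zfs/Data/txgs    # recent txgs and their states; an open txg that never reaches synced means the sync thread is stuck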
You may also see a difference by playing with the 'sync=' zfs property; in particular, setting 'sync=always' helped me a little, although it may have been a coincidence. You should also read up on its effects, because it may not be suitable, especially on a production server.
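For reference, checking and changing it looks roughly like this (Data used as a placeholder for the pool/dataset; the property is inherited by child datasets):
zfs get sync Data          # current value: standard, always or disabled
zfs set sync=always Data   # force every write to be committed synchronously
zfs inherit sync Data      # undo the override and fall back to the inherited default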
Type | Version/Name |
---|---|
Distribution Name | Ubuntu |
Distribution Version | 20.04 |
Linux Kernel | 5.4.0-29-generic |
Architecture | i5-3570k / 24GB DDR3 |
zfs --version | zfs-0.8.3-1ubuntu12 / zfs-kmod-0.8.3-1ubuntu12 |
I seem to have hit this last night. I ran zpool remove dpool sdb; sdb was the read cache disk, a 120GB SSD. I had a little shy of 8GB of available RAM out of 24GB total.
pool: dpool
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
dpool ONLINE 0 0 0
sdc ONLINE 0 0 0
sdd ONLINE 0 0 0
cache
sdb ONLINE 0 0 0
errors: No known data errors
qkj@holodeck:/m/media (=^・^=)ノ
➤➤➤➤➤ sudo zpool remove dpool sdb
After that command, pretty much everything stopped responding: apps that were already open kept working and I could ssh in, but most programs hung when newly run.
The system hung instantly; I saw <5% CPU utilization, RAM utilization did not change, and I was not able to see any disk IO.
dmesg:
[283203.538506] Mem-Info:
[283203.538509] active_anon:921936 inactive_anon:379250 isolated_anon:0
active_file:506522 inactive_file:537943 isolated_file:0
unevictable:23542 dirty:243 writeback:0 unstable:0
slab_reclaimable:75422 slab_unreclaimable:516165
mapped:250435 shmem:140701 pagetables:12987 bounce:0
free:286341 free_pcp:71 free_cma:0
[283203.538512] Node 0 active_anon:3687744kB inactive_anon:1517000kB active_file:2026088kB inactive_file:2151772kB unevictable:94168kB isolated(anon):0kB isolated(file):0kB mapped:1001740kB dirty:972kB writeback:0kB shmem:562804kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[283203.538513] Node 0 DMA free:15892kB min:44kB low:56kB high:68kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15900kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[283203.538516] lowmem_reserve[]: 0 3317 23834 23834 23834
[283203.538517] Node 0 DMA32 free:369576kB min:9400kB low:12796kB high:16192kB active_anon:78872kB inactive_anon:59576kB active_file:37008kB inactive_file:17960kB unevictable:368kB writepending:0kB present:3551800kB managed:3486056kB mlocked:0kB kernel_stack:1048kB pagetables:3300kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[283203.538520] lowmem_reserve[]: 0 0 20516 20516 20516
[283203.538521] Node 0 Normal free:759896kB min:58136kB low:79144kB high:100152kB active_anon:3608348kB inactive_anon:1457424kB active_file:1989308kB inactive_file:2133752kB unevictable:93956kB writepending:972kB present:21485568kB managed:21016812kB mlocked:19044kB kernel_stack:21960kB pagetables:48648kB bounce:0kB free_pcp:268kB local_pcp:68kB free_cma:0kB
[283203.538523] lowmem_reserve[]: 0 0 0 0 0
[283203.538524] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
[283203.538529] Node 0 DMA32: 13708*4kB (UE) 13186*8kB (UE) 12062*16kB (UE) 516*32kB (UE) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 369824kB
[283203.538533] Node 0 Normal: 569*4kB (UME) 1506*8kB (UME) 43245*16kB (UME) 1617*32kB (UE) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB (H) 0*4096kB = 760036kB
[283203.538538] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[283203.538538] 1188300 total pagecache pages
[283203.538540] 4030 pages in swap cache
[283203.538541] Swap cache stats: add 1236478, delete 1232441, find 912242/1105682
[283203.538542] Free swap = 1745584kB
[283203.538542] Total swap = 2097148kB
[283203.538543] 6263340 pages RAM
[283203.538543] 0 pages HighMem/MovableOnly
[283203.538543] 133648 pages reserved
[283203.538543] 0 pages cma reserved
[283203.538544] 0 pages hwpoisoned
[287100.556647] INFO: task z_wr_iss:570 blocked for more than 120 seconds.
[287100.556650] Tainted: P O 5.4.0-29-generic #33-Ubuntu
[287100.556650] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[287100.556660] z_wr_iss D 0 570 2 0x80004000
[287100.556662] Call Trace:
[287100.556667] __schedule+0x2e3/0x740
[287100.556669] schedule+0x42/0xb0
[287100.556671] rwsem_down_write_slowpath+0x244/0x4d0
[287100.556731] ? __raw_spin_unlock+0x9/0x10 [zfs]
[287100.556734] down_write+0x41/0x50
[287100.556766] dbuf_write_ready+0x1d4/0x3d0 [zfs]
[287100.556795] arc_write_ready+0xa5/0x4b0 [zfs]
[287100.556837] zio_ready+0x63/0x3f0 [zfs]
[287100.556878] zio_execute+0x91/0xe0 [zfs]
[287100.556884] taskq_thread+0x245/0x430 [spl]
[287100.556886] ? wake_up_q+0x70/0x70
[287100.556936] ? zio_execute_stack_check.constprop.0+0x10/0x10 [zfs]
[287100.556937] kthread+0x104/0x140
[287100.556942] ? task_done+0x90/0x90 [spl]
[287100.556943] ? kthread_park+0x90/0x90
[287100.556945] ret_from_fork+0x35/0x40
[287100.556947] INFO: task z_wr_iss:571 blocked for more than 120 seconds.
[287100.556948] Tainted: P O 5.4.0-29-generic #33-Ubuntu
[287100.556949] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[287100.556950] z_wr_iss D 0 571 2 0x80004000
[287100.556951] Call Trace:
[287100.556952] __schedule+0x2e3/0x740
[287100.556953] schedule+0x42/0xb0
[287100.556954] rwsem_down_write_slowpath+0x244/0x4d0
[287100.556990] ? __raw_spin_unlock+0x9/0x10 [zfs]
[287100.556991] down_write+0x41/0x50
[287100.557018] dbuf_write_ready+0x1d4/0x3d0 [zfs]
[287100.557042] arc_write_ready+0xa5/0x4b0 [zfs]
[287100.557081] zio_ready+0x63/0x3f0 [zfs]
[287100.557120] zio_execute+0x91/0xe0 [zfs]
[287100.557124] taskq_thread+0x245/0x430 [spl]
[287100.557126] ? wake_up_q+0x70/0x70
[287100.557164] ? zio_execute_stack_check.constprop.0+0x10/0x10 [zfs]
[287100.557165] kthread+0x104/0x140
[287100.557169] ? task_done+0x90/0x90 [spl]
[287100.557170] ? kthread_park+0x90/0x90
[287100.557172] ret_from_fork+0x35/0x40
[287100.557173] INFO: task z_wr_iss:572 blocked for more than 120 seconds.
[287100.557174] Tainted: P O 5.4.0-29-generic #33-Ubuntu
[287100.557174] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[287100.557175] z_wr_iss D 0 572 2 0x80004000
[287100.557176] Call Trace:
[287100.557177] __schedule+0x2e3/0x740
[287100.557179] ? sched_clock+0x9/0x10
[287100.557179] schedule+0x42/0xb0
[287100.557181] rwsem_down_write_slowpath+0x244/0x4d0
[287100.557213] ? __raw_spin_unlock+0x9/0x10 [zfs]
[287100.557215] down_write+0x41/0x50
[287100.557239] dbuf_write_ready+0x1d4/0x3d0 [zfs]
[287100.557262] arc_write_ready+0xa5/0x4b0 [zfs]
[287100.557300] zio_ready+0x63/0x3f0 [zfs]
[287100.557337] zio_execute+0x91/0xe0 [zfs]
[287100.557341] taskq_thread+0x245/0x430 [spl]
[287100.557343] ? wake_up_q+0x70/0x70
[287100.557380] ? zio_execute_stack_check.constprop.0+0x10/0x10 [zfs]
[287100.557381] kthread+0x104/0x140
[287100.557385] ? task_done+0x90/0x90 [spl]
[287100.557386] ? kthread_park+0x90/0x90
[287100.557387] ret_from_fork+0x35/0x40
[287100.557392] INFO: task txg_sync:647 blocked for more than 120 seconds.
[287100.557393] Tainted: P O 5.4.0-29-generic #33-Ubuntu
[287100.557394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[287100.557394] txg_sync D 0 647 2 0x80004000
[287100.557395] Call Trace:
[287100.557396] __schedule+0x2e3/0x740
[287100.557397] schedule+0x42/0xb0
[287100.557407] schedule_timeout+0x152/0x2f0
[287100.557409] ? __next_timer_interrupt+0xe0/0xe0
[287100.557410] io_schedule_timeout+0x1e/0x50
[287100.557413] __cv_timedwait_common+0x15e/0x1c0 [spl]
[287100.557415] ? wait_woken+0x80/0x80
[287100.557418] __cv_timedwait_io+0x19/0x20 [spl]
[287100.557480] zio_wait+0x11b/0x230 [zfs]
[287100.557515] ? __raw_spin_unlock+0x9/0x10 [zfs]
[287100.557545] dsl_pool_sync+0xbc/0x410 [zfs]
[287100.557579] spa_sync_iterate_to_convergence+0xe0/0x1c0 [zfs]
[287100.557612] spa_sync+0x312/0x5b0 [zfs]
[287100.557646] txg_sync_thread+0x27a/0x310 [zfs]
[287100.557680] ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[287100.557684] thread_generic_wrapper+0x83/0xa0 [spl]
[287100.557685] kthread+0x104/0x140
[287100.557688] ? clear_bit+0x20/0x20 [spl]
[287100.557689] ? kthread_park+0x90/0x90
[287100.557690] ret_from_fork+0x35/0x40
[287100.557692] INFO: task systemd-journal:983 blocked for more than 120 seconds.
[287100.557693] Tainted: P O 5.4.0-29-generic #33-Ubuntu
[287100.557694] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[287100.557694] systemd-journal D 0 983 1 0x00000120
[287100.557695] Call Trace:
[287100.557696] __schedule+0x2e3/0x740
[287100.557697] schedule+0x42/0xb0
[287100.557700] cv_wait_common+0x133/0x180 [spl]
[287100.557701] ? wait_woken+0x80/0x80
[287100.557704] __cv_wait+0x15/0x20 [spl]
[287100.557740] zfs_rangelock_enter_writer+0x41/0x150 [zfs]
[287100.557777] zfs_rangelock_enter+0xb6/0xf0 [zfs]
[287100.557813] zfs_trunc+0x53/0x200 [zfs]
[287100.557845] ? do_raw_spin_unlock+0x9/0x10 [zfs]
[287100.557877] ? __raw_spin_unlock+0x9/0x10 [zfs]
[287100.557908] ? sa_lookup+0x7c/0x90 [zfs]
[287100.557945] zfs_freesp+0xa8/0x2b0 [zfs]
[287100.557980] ? zfs_zaccess_common+0x78/0xa0 [zfs]
[287100.558015] ? zfs_zaccess+0x24b/0x3e0 [zfs]
[287100.558051] zfs_setattr+0xd76/0x2230 [zfs]
[287100.558087] ? zpl_xattr_get_dir+0x5e/0x130 [zfs]
[287100.558090] ? atomic_dec+0xd/0x20 [spl]
[287100.558122] ? __raw_spin_unlock+0x9/0x10 [zfs]
[287100.558153] ? rrw_exit+0x67/0xc0 [zfs]
[287100.558156] ? atomic_dec_and_test+0xd/0x20 [spl]
[287100.558159] ? put_cred+0x1b/0x30 [spl]
[287100.558160] ? _cond_resched+0x19/0x30
[287100.558161] ? __kmalloc_node+0x205/0x320
[287100.558198] ? zpl_xattr_get+0xdb/0x140 [zfs]
[287100.558201] ? spl_kmem_alloc_impl+0xa8/0x100 [spl]
[287100.558237] zpl_setattr+0xf7/0x160 [zfs]
[287100.558239] notify_change+0x357/0x4e0
[287100.558241] do_truncate+0x78/0xd0
[287100.558242] do_sys_ftruncate+0x14a/0x1c0
[287100.558244] __x64_sys_ftruncate+0x1b/0x20
[287100.558246] do_syscall_64+0x57/0x190
[287100.558248] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[287100.558249] RIP: 0033:0x7f03b5813ccb
[287100.558253] Code: Bad RIP value.
[287100.558253] RSP: 002b:00007ffff85220c8 EFLAGS: 00000202 ORIG_RAX: 000000000000004d
[287100.558254] RAX: ffffffffffffffda RBX: 00007ffff8522100 RCX: 00007f03b5813ccb
[287100.558255] RDX: 0000563417553f90 RSI: 0000000004800000 RDI: 0000000000000034
[287100.558255] RBP: 0000563417611240 R08: 0000000000000001 R09: 00005634176112c4
[287100.558256] R10: 0000000000000000 R11: 0000000000000202 R12: 00007ffff85220f8
[287100.558256] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
[287100.558270] INFO: task txg_sync:2005 blocked for more than 120 seconds.
[287100.558271] Tainted: P O 5.4.0-29-generic #33-Ubuntu
[287100.558272] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[287100.558273] txg_sync D 0 2005 2 0x80004000
[287100.558273] Call Trace:
[287100.558275] __schedule+0x2e3/0x740
[287100.558277] ? __switch_to+0x7f/0x470
[287100.558278] schedule+0x42/0xb0
[287100.558281] cv_wait_common+0x133/0x180 [spl]
[287100.558282] ? wait_woken+0x80/0x80
[287100.558285] __cv_wait+0x15/0x20 [spl]
[287100.558318] spa_config_enter+0x101/0x110 [zfs]
[287100.558352] spa_txg_history_init_io+0x5e/0xd0 [zfs]
[287100.558385] txg_sync_thread+0x260/0x310 [zfs]
[287100.558418] ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[287100.558422] thread_generic_wrapper+0x83/0xa0 [spl]
[287100.558423] kthread+0x104/0x140
[287100.558426] ? clear_bit+0x20/0x20 [spl]
[287100.558427] ? kthread_park+0x90/0x90
[287100.558429] ret_from_fork+0x35/0x40
[287100.558430] INFO: task mmp:2006 blocked for more than 120 seconds.
[287100.558431] Tainted: P O 5.4.0-29-generic #33-Ubuntu
[287100.558432] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[287100.558432] mmp D 0 2006 2 0x80004000
[287100.558433] Call Trace:
[287100.558434] __schedule+0x2e3/0x740
[287100.558435] ? __wake_up_common_lock+0x8a/0xc0
[287100.558436] schedule+0x42/0xb0
[287100.558439] cv_wait_common+0x133/0x180 [spl]
[287100.558440] ? wait_woken+0x80/0x80
[287100.558443] __cv_wait+0x15/0x20 [spl]
[287100.558475] spa_config_enter+0x101/0x110 [zfs]
[287100.558509] vdev_count_leaves+0x26/0x60 [zfs]
[287100.558540] mmp_thread+0x2a4/0x5a0 [zfs]
[287100.558571] ? mmp_write_uberblock+0x550/0x550 [zfs]
[287100.558575] thread_generic_wrapper+0x83/0xa0 [spl]
[287100.558576] kthread+0x104/0x140
[287100.558580] ? clear_bit+0x20/0x20 [spl]
[287100.558580] ? kthread_park+0x90/0x90
[287100.558582] ret_from_fork+0x35/0x40
[287100.558627] INFO: task deluged:1671679 blocked for more than 120 seconds.
[287100.558627] Tainted: P O 5.4.0-29-generic #33-Ubuntu
[287100.558628] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[287100.558629] deluged D 0 1671679 1 0x00000000
[287100.558629] Call Trace:
[287100.558630] __schedule+0x2e3/0x740
[287100.558631] schedule+0x42/0xb0
[287100.558634] cv_wait_common+0x133/0x180 [spl]
[287100.558635] ? wait_woken+0x80/0x80
[287100.558638] __cv_wait+0x15/0x20 [spl]
[287100.558672] spa_config_enter+0x101/0x110 [zfs]
[287100.558708] zfs_blkptr_verify+0x267/0x440 [zfs]
[287100.558744] zio_read+0x42/0xc0 [zfs]
[287100.558766] ? arc_buf_destroy+0x100/0x100 [zfs]
[287100.558788] arc_read+0x876/0x1100 [zfs]
[287100.558810] ? dbuf_rele_and_unlock+0x4e0/0x4e0 [zfs]
[287100.558846] ? atomic_cmpxchg+0x16/0x30 [zfs]
[287100.558869] dbuf_read_impl+0x1d2/0x610 [zfs]
[287100.558905] ? zio_null+0x2f/0x40 [zfs]
[287100.558927] dbuf_read+0xcb/0x5f0 [zfs]
[287100.558931] ? spl_kmem_free_impl+0x23/0x30 [spl]
[287100.558954] dbuf_hold_impl_arg+0x47a/0x5e0 [zfs]
[287100.558976] dbuf_hold_impl+0x24/0x40 [zfs]
[287100.558998] dbuf_hold_level+0x32/0x60 [zfs]
[287100.559020] dbuf_hold+0x16/0x20 [zfs]
[287100.559044] dmu_buf_hold_array_by_dnode+0xee/0x490 [zfs]
[287100.559068] dmu_read_uio_dnode+0x49/0xf0 [zfs]
[287100.559104] ? zrl_add_impl+0x31/0xb0 [zfs]
[287100.559128] dmu_read_uio_dbuf+0x47/0x60 [zfs]
[287100.559164] zfs_read+0x117/0x300 [zfs]
[287100.559199] zpl_read_common_iovec+0x99/0xe0 [zfs]
[287100.559235] zpl_iter_read_common+0xa8/0x100 [zfs]
[287100.559271] zpl_iter_read+0x58/0x80 [zfs]
[287100.559272] do_iter_readv_writev+0x1a3/0x1d0
[287100.559273] do_iter_read+0xe2/0x1a0
[287100.559275] vfs_readv+0x6e/0xb0
[287100.559276] ? security_file_free+0x54/0x60
[287100.559278] ? call_rcu+0x10/0x20
[287100.559279] ? __fput+0x150/0x260
[287100.559280] do_preadv+0x93/0xd0
[287100.559281] __x64_sys_preadv+0x21/0x30
[287100.559283] do_syscall_64+0x57/0x190
[287100.559284] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[287100.559285] RIP: 0033:0x7f536403d548
[287100.559286] Code: Bad RIP value.
[287100.559287] RSP: 002b:00007f535b7fd4a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000127
[287100.559288] RAX: ffffffffffffffda RBX: 00007f535b7fd600 RCX: 00007f536403d548
[287100.559288] RDX: 0000000000000020 RSI: 00007f535b7fd600 RDI: 0000000000000026
[287100.559289] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[287100.559289] R10: 000000000becd2d0 R11: 0000000000000246 R12: 0000000000000020
[287100.559290] R13: 0000000000000026 R14: 0000000000000020 R15: 000000000becd2d0
I had to hard reset with the power button. After a reboot, zpool showed everything healthy, and when I ran the same zpool remove dpool sdb command again, it completed instantly.
edit: after a reboot and reloading deluged, I found that a large number of files which had been open in deluge were all missing. That is on me for not stopping deluged before running the operation, but I also do not believe it should have crashed. Good thing for external backups, right? :)
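(If it happens again, it may be worth dumping the blocked-task stacks before the hard reset so they can be attached here; this is the generic kernel sysrq mechanism, nothing ZFS-specific.)
echo 1 > /proc/sys/kernel/sysrq   # enable sysrq if the distro restricts it
echo w > /proc/sysrq-trigger      # write all uninterruptible (D-state) task stacks to the kernel log
dmesg | tail -n 300               # capture the traces for the bug report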
It was fine for the past few months (although I wasn't doing any large IO) until yesterday, when I needed to delete some large files. It froze after a few files; I restarted the server, it booted normally, I tried to delete those files again, and it froze again after a few files. But now I cannot boot, as it fails to mount ZFS, so it's exactly the same situation as a couple of months ago. I have increased system memory to 24GB of RAM, but it didn't help.
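(A possible way to get the box booting again, assuming the hang is in mounting/ZIL replay rather than in the import itself: import the pool without mounting anything, then mount datasets one by one to isolate the one that hangs. Data is used as the pool name.)
zpool import -N Data                 # import the pool but do not mount any datasets
zfs list -r -o name,mounted Data     # list datasets and their mount state
zfs mount Data                       # mount the root dataset, then each child dataset individually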
@Dinth is dedup enabled? In my experience, deduplication can lead to very high delay times while deleting.
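(Checking is quick, and zpool status -D also shows whether a dedup table exists on disk even if the property was later switched off; Data assumed as the pool name.)
zfs get -r dedup Data    # per-dataset dedup property
zpool status -D Data     # prints dedup table (DDT) statistics if one exists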