Disk failure with multihost enabled leads to suspended pool
### System information
| Type | Version/Name |
|---|---|
| Distribution Name | CentOS |
| Distribution Version | 7.9 |
| Kernel Version | 3.10.0-1160.6.1.el7.x86_64 |
| Architecture | x86_64 |
| OpenZFS Version | 0.8.5 |
### Describe the problem you're observing
We're running SAS disks in JBODs connected to two hosts. We've had multiple occurrences where a single disk failure caused its pool to be suspended because MMP writes failed for long enough to hit the timeout. Given the version we're running, I'm assuming the patches from both #7709 and #8495 are already in place. Furthermore, we set zfs_multihost_fail_intervals=30. We'd very much like to see the disk faulted and the pool go into a degraded state rather than getting suspended.
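For context, these tunables are the standard ZFS-on-Linux module parameters; this is roughly how we check and set them (illustrative sketch, with the value persisted via /etc/modprobe.d/zfs.conf):

```sh
# Current MMP tunables (ZFS-on-Linux module parameters)
cat /sys/module/zfs/parameters/zfs_multihost_interval        # base MMP write interval, in ms
cat /sys/module/zfs/parameters/zfs_multihost_fail_intervals  # intervals tolerated before suspending

# Widen the failure window at runtime (setting it to 0 disables MMP-triggered suspension)
echo 30 > /sys/module/zfs/parameters/zfs_multihost_fail_intervals

# Persist across reboots, e.g. in /etc/modprobe.d/zfs.conf:
#   options zfs zfs_multihost_fail_intervals=30
```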
Anecdotally, while running 0.7 with zfs_multihost_fail_intervals=30, we did experience a few disk failures and each one simply left the zpool degraded. Since updating to 0.8, every disk failure has ended in a suspended zpool.
Please let me know if there's more info I can provide. Thanks.
### Include any warning/errors/backtraces from the system logs
```
Jul 20 10:57:33 bss7 zed: eid=85 class=delay pool_guid=0x295229136564F2C7 vdev_path=/dev/disk/by-id/wwn-0x5000cca267051be0-part1
Jul 20 16:45:14 bss7 kernel: sd 0:0:42:0: attempting task abort! scmd(ffff91ce90a1be00)
Jul 20 16:45:14 bss7 kernel: sd 0:0:42:0: [sdaq] tag#0 CDB: Write(10) 2a 00 00 00 08 3f 00 00 01 00
Jul 20 16:45:14 bss7 kernel: scsi target0:0:42: handle(0x0036), sas_address(0x5000cca267051be2), phy(41)
Jul 20 16:45:14 bss7 kernel: scsi target0:0:42: enclosure logical id(0x5000ccab040a2680), slot(50)
Jul 20 16:45:14 bss7 kernel: scsi target0:0:42: enclosure level(0x0000), connector name( )
Jul 20 16:45:14 bss7 kernel: sd 0:0:42:0: task abort: SUCCESS scmd(ffff91ce90a1be00)
Jul 20 16:45:14 bss7 kernel: sd 0:0:42:0: [sdaq] tag#0 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=30s
Jul 20 16:45:14 bss7 kernel: sd 0:0:42:0: [sdaq] tag#0 CDB: Write(10) 2a 00 00 00 08 3f 00 00 01 00
Jul 20 16:45:14 bss7 kernel: blk_update_request: I/O error, dev sdaq, sector 16888
Jul 20 16:45:14 bss7 kernel: zio pool=storage705 vdev=/dev/disk/by-id/wwn-0x5000cca267051be0-part1 error=5 type=2 offset=258048 size=4096 flags=180ac0
Jul 20 16:45:23 bss7 kernel: sd 0:0:42:0: [sdaq] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=5s
Jul 20 16:45:23 bss7 kernel: sd 0:0:42:0: [sdaq] tag#1 Sense Key : Medium Error [current] [descriptor]
Jul 20 16:45:23 bss7 kernel: sd 0:0:42:0: [sdaq] tag#1 Add. Sense: Unrecovered read error
Jul 20 16:45:23 bss7 kernel: sd 0:0:42:0: [sdaq] tag#1 CDB: Read(10) 28 00 91 87 b7 82 00 00 02 00
Jul 20 16:45:23 bss7 kernel: blk_update_request: critical medium error, dev sdaq, sector 19532725264
Jul 20 16:45:23 bss7 kernel: zio pool=storage705 vdev=/dev/disk/by-id/wwn-0x5000cca267051be0-part1 error=61 type=1 offset=10000746946560 size=8192 flags=b08c1
Jul 20 16:45:23 bss7 zed: eid=86 class=io pool_guid=0x295229136564F2C7 vdev_path=/dev/disk/by-id/wwn-0x5000cca267051be0-part1
Jul 20 16:45:28 bss7 zed: eid=87 class=delay pool_guid=0x295229136564F2C7 vdev_path=/dev/disk/by-id/wwn-0x5000cca267051be0-part1
Jul 20 16:45:28 bss7 zed: eid=88 class=io pool_guid=0x295229136564F2C7 vdev_path=/dev/disk/by-id/wwn-0x5000cca267051be0-part1
Jul 20 16:45:36 bss7 kernel: sd 0:0:42:0: [sdaq] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=7s
Jul 20 16:45:36 bss7 kernel: sd 0:0:42:0: [sdaq] tag#1 Sense Key : Medium Error [current] [descriptor]
Jul 20 16:45:36 bss7 kernel: sd 0:0:42:0: [sdaq] tag#1 Add. Sense: Unrecovered read error
Jul 20 16:45:36 bss7 kernel: sd 0:0:42:0: [sdaq] tag#1 CDB: Read(10) 28 00 91 87 b7 82 00 00 02 00
Jul 20 16:45:36 bss7 kernel: blk_update_request: critical medium error, dev sdaq, sector 19532725264
Jul 20 16:45:36 bss7 kernel: zio pool=storage705 vdev=/dev/disk/by-id/wwn-0x5000cca267051be0-part1 error=61 type=1 offset=10000746946560 size=8192 flags=b0ac1
Jul 20 16:45:36 bss7 zed: eid=89 class=io pool_guid=0x295229136564F2C7 vdev_path=/dev/disk/by-id/wwn-0x5000cca267051be0-part1
Jul 20 16:45:54 bss7 kernel: sd 0:0:42:0: [sdaq] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=5s
Jul 20 16:45:54 bss7 kernel: sd 0:0:42:0: [sdaq] tag#0 Sense Key : Medium Error [current] [descriptor]
Jul 20 16:45:54 bss7 kernel: sd 0:0:42:0: [sdaq] tag#0 Add. Sense: Unrecovered read error
Jul 20 16:45:54 bss7 kernel: sd 0:0:42:0: [sdaq] tag#0 CDB: Read(10) 28 00 91 87 b7 84 00 00 1c 00
Jul 20 16:45:54 bss7 kernel: blk_update_request: critical medium error, dev sdaq, sector 19532725280
Jul 20 16:45:54 bss7 kernel: zio pool=storage705 vdev=/dev/disk/by-id/wwn-0x5000cca267051be0-part1 error=61 type=1 offset=10000746954752 size=114688 flags=80bc0
Jul 20 16:46:00 bss7 kernel: sd 0:0:42:0: [sdaq] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=5s
Jul 20 16:46:00 bss7 kernel: sd 0:0:42:0: [sdaq] tag#1 Sense Key : Medium Error [current] [descriptor]
Jul 20 16:46:00 bss7 kernel: sd 0:0:42:0: [sdaq] tag#1 Add. Sense: Unrecovered read error
Jul 20 16:46:00 bss7 kernel: sd 0:0:42:0: [sdaq] tag#1 CDB: Read(10) 28 00 91 87 b7 82 00 00 02 00
Jul 20 16:46:00 bss7 kernel: blk_update_request: critical medium error, dev sdaq, sector 19532725264
Jul 20 16:46:00 bss7 kernel: zio pool=storage705 vdev=/dev/disk/by-id/wwn-0x5000cca267051be0-part1 error=61 type=1 offset=10000746946560 size=8192 flags=b08c1
Jul 20 16:46:00 bss7 zed: eid=90 class=io pool_guid=0x295229136564F2C7 vdev_path=/dev/disk/by-id/wwn-0x5000cca267051be0-part1
Jul 20 16:46:00 bss7 zed: eid=91 class=io pool_guid=0x295229136564F2C7 vdev_path=/dev/disk/by-id/wwn-0x5000cca267051be0-part1
Jul 20 16:46:12 bss7 kernel: WARNING: MMP writes to pool 'storage705' have not succeeded in over 45170 ms; suspending pool. Hrtime 4752884091085041
Jul 20 16:46:12 bss7 kernel: WARNING: Pool 'storage705' has encountered an uncorrectable I/O failure and has been suspended.
Jul 20 16:46:12 bss7 zed: eid=92 class=io_failure pool_guid=0x295229136564F2C7
```
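For completeness, this is roughly what recovery looks like once the pool has been suspended and the devices are reachable again (pool name taken from the log above; sketch only):

```sh
# Pool shows up as SUSPENDED in the status output
zpool status storage705

# Attempt to resume I/O on the suspended pool
zpool clear storage705

# Review the ZED event history; the eid values match the zed lines above
zpool events -v
```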
We also had a similar case recently: a single failing SSD in a ZFS-based Lustre metadata server caused the entire zpool to be SUSPENDED by the MMP timeout, before a spare SSD could take over.
However, when we checked the controller event log (/opt/MegaRAID/storcli/storcli64 /c1 show events), we noticed something interesting: it appears that the Broadcom / LSI MegaRAID SAS-3 3316 [Intruder] controller itself rebooted and, we assume, access to all six SSDs in the zpool was lost for approximately 66 seconds (the duration of the controller reboot). That was too long for MMP with zfs_multihost_fail_intervals=10.
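As a rough sanity check of why ~66 seconds was fatal here: the MMP suspend window is approximately zfs_multihost_fail_intervals multiplied by zfs_multihost_interval (assuming the interval is at its default of 1000 ms):

```sh
# Approximate MMP suspend window (sketch; assumes the standard module parameter paths)
interval=$(cat /sys/module/zfs/parameters/zfs_multihost_interval)              # ms, default 1000
fail_intervals=$(cat /sys/module/zfs/parameters/zfs_multihost_fail_intervals)  # default 10
echo "MMP suspend window ~ $((interval * fail_intervals)) ms"
# With zfs_multihost_fail_intervals=10 and the default interval this is ~10 s,
# far shorter than the ~66 s the controller took to reboot.
```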