zfs zed and udev thrashing with repeated online events

System information

Type	Version/Name
Distribution Name	Gentoo
Distribution Version	(up to date)
Linux Kernel	4.9.76-gentoo-r1
Architecture	x86_64
ZFS Version	0.7.6
SPL Version	0.7.6
udev version	217

Describe the problem you're observing

In this recently built system, I often find the disks thrashing with no load on the system. iotop shows zed and udevd are the only things touching the disk. I haven't yet worked out what triggers it, but stopping and restarting zed stops the thrashing. This sounds like it might be related to #6667 and potentially also #7132, but I don't understand why the udev events are being generated.

udevadm monitor shows a constant stream of changes on the ZFS partition of three of the four disks in my (2x2 mirror) array:

KERNEL[800477.625559] change   /devices/pci0000:00/0000:00:01.0/0000:01:00.0/host8/port-8:2/end_device-8:2/target8:0:2/8:0:2:0/block/sdd/sdd2 (block)
UDEV  [800477.641848] change   /devices/pci0000:00/0000:00:01.0/0000:01:00.0/host8/port-8:2/end_device-8:2/target8:0:2/8:0:2:0/block/sdd/sdd2 (block)
KERNEL[800477.674069] change   /devices/pci0000:00/0000:00:01.0/0000:01:00.0/host8/port-8:0/end_device-8:0/target8:0:0/8:0:0:0/block/sdb/sdb2 (block)
UDEV  [800477.686767] change   /devices/pci0000:00/0000:00:01.0/0000:01:00.0/host8/port-8:0/end_device-8:0/target8:0:0/8:0:0:0/block/sdb/sdb2 (block)
KERNEL[800477.717713] change   /devices/pci0000:00/0000:00:01.0/0000:01:00.0/host8/port-8:3/end_device-8:3/target8:0:3/8:0:3:0/block/sde/sde2 (block)
UDEV  [800477.729357] change   /devices/pci0000:00/0000:00:01.0/0000:01:00.0/host8/port-8:3/end_device-8:3/target8:0:3/8:0:3:0/block/sde/sde2 (block)

Meanwhile zed is logging away about the same devices coming online, eg.:

Mar 29 21:40:53 obtainium zed[6393]: zfsdle_vdev_online: setting device '/dev/sdd2' to ONLINE state in pool 'base'
Mar 29 21:40:53 obtainium zed[6393]: zfs_slm_event: EC_dev_status.dev_dle
Mar 29 21:40:53 obtainium zed[6393]: zfsdle_vdev_online: searching for 'pci-0000:01:00.0-sas-phy2-lun-0' in 'base'
Mar 29 21:40:53 obtainium zed[6393]: zed_disk_event:
Mar 29 21:40:53 obtainium zed[6393]: ^Iclass: EC_dev_status
Mar 29 21:40:53 obtainium zed[6393]: ^Isubclass: dev_dle
Mar 29 21:40:53 obtainium zed[6393]: ^Idev_name: /dev/sdd2
Mar 29 21:40:53 obtainium zed[6393]: ^Ipath: /devices/pci0000:00/0000:00:01.0/0000:01:00.0/host8/port-8:2/end_device-8:2/target8:0:2/8:0:2:0/block/sdd/sdd2
Mar 29 21:40:53 obtainium zed[6393]: ^Idevid: scsi-[redacted]-part2
Mar 29 21:40:53 obtainium zed[6393]: ^Iphys_path: pci-0000:01:00.0-sas-phy4-lun-0
Mar 29 21:40:53 obtainium zed[6393]: ^Idev_size: 5983994712064
Mar 29 21:40:53 obtainium zed[6393]: ^Ipool_guid: [redacted]
Mar 29 21:40:53 obtainium zed[6393]: ^Ivdev_guid: [redacted]

There are no other zed log entries in my syslog during this period (eg config_sync).

Describe how to reproduce the problem

Wait. Does not happen immediately at boot.

Mar 29 '18 10:03 abrasive

It seems to be possible to trigger this by starting and stopping a scrub on the pool.

Mar 29 '18 11:03 abrasive

Another reliable way to trigger this is to issue:

sg_logs -t  /dev/disk/by-path/pci-0000:01:00.0-sas-phy0-lun-0

This then starts spamming events on whichever disk was queried. This action causes a udev change event for the block device and for each of its partitions.

Mar 29 '18 23:03 abrasive

strace shows that zed is doing a BLKFLSBUF ioctl which is causing the udev change event.

Its stack traces seem to indicate that this is due to zpool_relabel_disk being called from zpool_vdev_online.

Disabling autoexpand on the pool prevents the problem from occurring.

Mar 29 '18 23:03 abrasive

@abrasive thank you for identifying the root cause here. If possible would you mind verifying the proposed fix in #7393.

Apr 05 '18 19:04 behlendorf

Thanks for the patch, @behlendorf! Unfortunately it does not work. The change events seem to be generated upon the close(fd).

I note, after some experimentation, that if you close a write filehandle to a disk partition, you get a udev change event even if you didn't write anything. This doesn't happen with read handles. For example sudo dd if=/dev/null of=/dev/sda will do the job.

Apr 06 '18 11:04 abrasive

I believe PR #7629 may resolve this by filtering out all udev change event for partitions.

Jun 19 '18 21:06 behlendorf

Thanks for the update, @behlendorf. I've reenabled autoexpand on my pool but I can no longer reproduce the problem! zfs, udev and kernel all remain at the original versions. No disks have changed either.

I'm happy to dig further if you still want to pursue this one but if so I could do with some help figuring out what triggering factor might have changed.

Jun 20 '18 02:06 abrasive

@behlendorf I can consistently hit that problem by using the reproducer from @abrasive - ie: autoexpand=on and sg_logs -t <disk>. From my understading, PR #7629 was reverted due to corruption issues. Can you confirm that we should not use autoexpand to properly avoid the issue described here? Or do other workarounds exist? Thanks.

Dec 31 '18 11:12 shodanshok

@shenyan1 that's right for the 0.7.x releases. For now, I'd suggest leaving autoexpand=off. The required fix has been integrated to master to resolve this issue. If you're in a position where you can verify the correct behavior for your environment using latest 0.8.0-rc2 release candidate I'd recommend giving it a try.

Jan 03 '19 00:01 behlendorf

strace shows that zed is doing a BLKFLSBUF ioctl which is causing the udev change event.

Its stack traces seem to indicate that this is due to zpool_relabel_disk being called from zpool_vdev_online.

Disabling autoexpand on the pool prevents the problem from occurring.

this is the solution. Setting the autoexpand property to off removes the problems and no more config sync messages

Feb 06 '22 21:02 xgiovio

reopening and self-assigning so I can track this with #7132

Jul 29 '22 19:07 tonyhutter

zfs zfs copied to clipboard

zed and udev thrashing with repeated online events

System information

Describe the problem you're observing

Describe how to reproduce the problem

zfs
zfs copied to clipboard