zfs
zfs copied to clipboard
zed and udev thrashing with repeated online events
System information
| Type | Version/Name |
|---|---|
| Distribution Name | Gentoo |
| Distribution Version | (up to date) |
| Linux Kernel | 4.9.76-gentoo-r1 |
| Architecture | x86_64 |
| ZFS Version | 0.7.6 |
| SPL Version | 0.7.6 |
| udev version | 217 |
Describe the problem you're observing
In this recently built system, I often find the disks thrashing with no load on the system. iotop shows zed and udevd are the only things touching the disk. I haven't yet worked out what triggers it, but stopping and restarting zed stops the thrashing. This sounds like it might be related to #6667 and potentially also #7132, but I don't understand why the udev events are being generated.
udevadm monitor shows a constant stream of changes on the ZFS partition of three of the four disks in my (2x2 mirror) array:
KERNEL[800477.625559] change /devices/pci0000:00/0000:00:01.0/0000:01:00.0/host8/port-8:2/end_device-8:2/target8:0:2/8:0:2:0/block/sdd/sdd2 (block)
UDEV [800477.641848] change /devices/pci0000:00/0000:00:01.0/0000:01:00.0/host8/port-8:2/end_device-8:2/target8:0:2/8:0:2:0/block/sdd/sdd2 (block)
KERNEL[800477.674069] change /devices/pci0000:00/0000:00:01.0/0000:01:00.0/host8/port-8:0/end_device-8:0/target8:0:0/8:0:0:0/block/sdb/sdb2 (block)
UDEV [800477.686767] change /devices/pci0000:00/0000:00:01.0/0000:01:00.0/host8/port-8:0/end_device-8:0/target8:0:0/8:0:0:0/block/sdb/sdb2 (block)
KERNEL[800477.717713] change /devices/pci0000:00/0000:00:01.0/0000:01:00.0/host8/port-8:3/end_device-8:3/target8:0:3/8:0:3:0/block/sde/sde2 (block)
UDEV [800477.729357] change /devices/pci0000:00/0000:00:01.0/0000:01:00.0/host8/port-8:3/end_device-8:3/target8:0:3/8:0:3:0/block/sde/sde2 (block)
Meanwhile zed is logging away about the same devices coming online, eg.:
Mar 29 21:40:53 obtainium zed[6393]: zfsdle_vdev_online: setting device '/dev/sdd2' to ONLINE state in pool 'base'
Mar 29 21:40:53 obtainium zed[6393]: zfs_slm_event: EC_dev_status.dev_dle
Mar 29 21:40:53 obtainium zed[6393]: zfsdle_vdev_online: searching for 'pci-0000:01:00.0-sas-phy2-lun-0' in 'base'
Mar 29 21:40:53 obtainium zed[6393]: zed_disk_event:
Mar 29 21:40:53 obtainium zed[6393]: ^Iclass: EC_dev_status
Mar 29 21:40:53 obtainium zed[6393]: ^Isubclass: dev_dle
Mar 29 21:40:53 obtainium zed[6393]: ^Idev_name: /dev/sdd2
Mar 29 21:40:53 obtainium zed[6393]: ^Ipath: /devices/pci0000:00/0000:00:01.0/0000:01:00.0/host8/port-8:2/end_device-8:2/target8:0:2/8:0:2:0/block/sdd/sdd2
Mar 29 21:40:53 obtainium zed[6393]: ^Idevid: scsi-[redacted]-part2
Mar 29 21:40:53 obtainium zed[6393]: ^Iphys_path: pci-0000:01:00.0-sas-phy4-lun-0
Mar 29 21:40:53 obtainium zed[6393]: ^Idev_size: 5983994712064
Mar 29 21:40:53 obtainium zed[6393]: ^Ipool_guid: [redacted]
Mar 29 21:40:53 obtainium zed[6393]: ^Ivdev_guid: [redacted]
There are no other zed log entries in my syslog during this period (eg config_sync).
Describe how to reproduce the problem
Wait. Does not happen immediately at boot.
It seems to be possible to trigger this by starting and stopping a scrub on the pool.
Another reliable way to trigger this is to issue:
sg_logs -t /dev/disk/by-path/pci-0000:01:00.0-sas-phy0-lun-0
This then starts spamming events on whichever disk was queried. This action causes a udev change event for the block device and for each of its partitions.
strace shows that zed is doing a BLKFLSBUF ioctl which is causing the udev change event.
Its stack traces seem to indicate that this is due to zpool_relabel_disk being called from zpool_vdev_online.
Disabling autoexpand on the pool prevents the problem from occurring.
@abrasive thank you for identifying the root cause here. If possible would you mind verifying the proposed fix in #7393.
Thanks for the patch, @behlendorf! Unfortunately it does not work. The change events seem to be generated upon the close(fd).
I note, after some experimentation, that if you close a write filehandle to a disk partition, you get a udev change event even if you didn't write anything. This doesn't happen with read handles. For example sudo dd if=/dev/null of=/dev/sda will do the job.
I believe PR #7629 may resolve this by filtering out all udev change event for partitions.
Thanks for the update, @behlendorf. I've reenabled autoexpand on my pool but I can no longer reproduce the problem! zfs, udev and kernel all remain at the original versions. No disks have changed either.
I'm happy to dig further if you still want to pursue this one but if so I could do with some help figuring out what triggering factor might have changed.
@behlendorf I can consistently hit that problem by using the reproducer from @abrasive - ie: autoexpand=on and sg_logs -t <disk>. From my understading, PR #7629 was reverted due to corruption issues. Can you confirm that we should not use autoexpand to properly avoid the issue described here? Or do other workarounds exist? Thanks.
@shenyan1 that's right for the 0.7.x releases. For now, I'd suggest leaving autoexpand=off. The required fix has been integrated to master to resolve this issue. If you're in a position where you can verify the correct behavior for your environment using latest 0.8.0-rc2 release candidate I'd recommend giving it a try.
straceshows thatzedis doing aBLKFLSBUFioctl which is causing the udev change event.Its stack traces seem to indicate that this is due to
zpool_relabel_diskbeing called fromzpool_vdev_online.Disabling
autoexpandon the pool prevents the problem from occurring.
this is the solution. Setting the autoexpand property to off removes the problems and no more config sync messages
reopening and self-assigning so I can track this with #7132