zfs_exporter

Feature request: publish detailed health metrics

Open: pavilalopes opened this issue 3 years ago • 3 comments

Currently, only the overall zpool health status is published. It would be useful if the individual vdev and disk status were also published.

For example, this status

$ zpool status
...
config:
        NAME                        STATE     READ WRITE CKSUM
        zpool1                      ONLINE       0     0     0
          raidz1-0                  ONLINE       0     0     0
            wwn-0x5000c500b3b2f8c0  ONLINE       0     0     0
            wwn-0x5000c500b3b53463  ONLINE       0     0     0
            wwn-0x5000c500b3b33354  ONLINE       0     0     0

could lead to these metrics being published:

zfs_pool_health{pool="zpool1"} 0
zfs_pool_vdev_health{pool="zpool1", vdev="raidz1-0"} 0
zfs_pool_disk_health{pool="zpool1", vdev="raidz1-0", disk="wwn-0x5000c500b3b2f8c0"} 0
zfs_pool_disk_health{pool="zpool1", vdev="raidz1-0", disk="wwn-0x5000c500b3b53463"} 0
zfs_pool_disk_health{pool="zpool1", vdev="raidz1-0", disk="wwn-0x5000c500b3b33354"} 0

This would make it possible to build more informative dashboards. With only the pool health status, the operator still has to log into the server to find out which/how many devices are faulted.
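To make the proposed metric shape concrete, here is a rough Go sketch that turns a device state into a sample line like the ones above. It is only an illustration: `healthCode` and `metricLine` are hypothetical helpers, and the only value taken from the example is ONLINE = 0; the other numeric codes are assumptions. A real exporter would most likely register gauges through a Prometheus client library rather than format text by hand.

```go
package main

import (
	"fmt"
	"strings"
)

// healthCodes maps a zpool state string to a numeric value.
// Only ONLINE = 0 comes from the example metrics; the rest are assumed.
var healthCodes = map[string]int{
	"ONLINE":   0,
	"DEGRADED": 1,
	"FAULTED":  2,
	"OFFLINE":  3,
	"UNAVAIL":  4,
	"REMOVED":  5,
}

// healthCode returns the numeric code for a state, or -1 if it is unknown.
func healthCode(state string) int {
	if code, ok := healthCodes[state]; ok {
		return code
	}
	return -1
}

// metricLine formats one sample in the Prometheus text format.
// labels is a flat list of key, value pairs, kept in the order given.
func metricLine(name string, value int, labels ...string) string {
	pairs := make([]string, 0, len(labels)/2)
	for i := 0; i+1 < len(labels); i += 2 {
		// %q quotes the value and escapes backslashes and double quotes.
		pairs = append(pairs, fmt.Sprintf("%s=%q", labels[i], labels[i+1]))
	}
	return fmt.Sprintf("%s{%s} %d", name, strings.Join(pairs, ","), value)
}

func main() {
	fmt.Println(metricLine("zfs_pool_health", healthCode("ONLINE"), "pool", "zpool1"))
	fmt.Println(metricLine("zfs_pool_vdev_health", healthCode("ONLINE"),
		"pool", "zpool1", "vdev", "raidz1-0"))
	fmt.Println(metricLine("zfs_pool_disk_health", healthCode("ONLINE"),
		"pool", "zpool1", "vdev", "raidz1-0", "disk", "wwn-0x5000c500b3b2f8c0"))
}
```

With labels like these, a dashboard could count faulted disks per pool or per vdev directly from the metrics.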

pavilalopes, Jul 20 '21 20:07

Unfortunately, the ZFS library we use does not provide access to this information, most likely because the zpool command does not provide a flag to enable machine-parseable output. I'd consider exporting this data if it were available upstream; however, I don't expect to contribute this functionality myself in the foreseeable future.

pdf, Jul 20 '21 23:07

OpenZFS has now added a "zpool_influxdb" command to provide machine-parseable output, which may be useful for this: https://github.com/openzfs/zfs/pull/10786

HubbeKing, Sep 20 '21 13:09

We no longer rely on an upstream ZFS library. However, I'm not certain that we can rely on the influxdb output, as that is ZFS version-dependent, and I'd like to maintain good host version compatibility. The alternative, though, is ugly text parsing, and I'm not eager to tackle this any time soon, though it's certainly doable.
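For a sense of what the text-parsing route might involve, here is a minimal Go sketch that reads the config table from `zpool status` output and derives each row's nesting level from its indentation. It is not the exporter's code: `device` and `parseConfig` are hypothetical names, and the sketch assumes two spaces of indentation per level and the simple layout from the example in this issue. Section rows such as `logs`, `cache`, or `spares` are skipped without special handling, so devices beneath them would be reported at the wrong level.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// device is one row of the config table in `zpool status` output.
type device struct {
	Name  string
	State string
	Depth int // 0 = pool, 1 = vdev, 2 = disk in the simple layout assumed here
}

// parseConfig collects the rows that follow the "NAME STATE ..." header.
// Depth is the indentation relative to the first row, two spaces per level.
func parseConfig(status string) []device {
	var devices []device
	inTable := false
	base := -1
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		// zpool usually indents the table with a tab; normalise it to spaces.
		line := strings.ReplaceAll(sc.Text(), "\t", "        ")
		fields := strings.Fields(line)
		if !inTable {
			if len(fields) >= 2 && fields[0] == "NAME" && fields[1] == "STATE" {
				inTable = true
			}
			continue
		}
		if len(fields) == 0 {
			if len(devices) > 0 {
				break // a blank line ends the table
			}
			continue
		}
		if len(fields) < 2 {
			continue // rows without a state column are ignored
		}
		indent := len(line) - len(strings.TrimLeft(line, " "))
		if base < 0 {
			base = indent
		}
		devices = append(devices, device{fields[0], fields[1], (indent - base) / 2})
	}
	return devices
}

const sample = `config:
        NAME                        STATE     READ WRITE CKSUM
        zpool1                      ONLINE       0     0     0
          raidz1-0                  ONLINE       0     0     0
            wwn-0x5000c500b3b2f8c0  ONLINE       0     0     0
`

func main() {
	for _, d := range parseConfig(sample) {
		fmt.Printf("depth=%d name=%s state=%s\n", d.Depth, d.Name, d.State)
	}
}
```

The fragile part is exactly what makes this approach unattractive: the column layout and indentation are meant for humans and are not a stable interface across ZFS versions.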

pdf avatar Nov 16 '21 10:11 pdf
