zfs_exporter
Feature request: publish detailed health metrics
Currently, only the overall zpool health status is published. It would be useful if the individual vdev and disk status were also published.
For example, this status
$ zpool status
...
config:
        NAME                        STATE     READ WRITE CKSUM
        zpool1                      ONLINE       0     0     0
          raidz1-0                  ONLINE       0     0     0
            wwn-0x5000c500b3b2f8c0  ONLINE       0     0     0
            wwn-0x5000c500b3b53463  ONLINE       0     0     0
            wwn-0x5000c500b3b33354  ONLINE       0     0     0
could lead to these metrics being published:
zfs_pool_health{pool="zpool1"} 0
zfs_pool_vdev_health{pool="zpool1", vdev="raidz1-0"} 0
zfs_pool_disk_health{pool="zpool1", vdev="raidz1-0", disk="wwn-0x5000c500b3b2f8c0"} 0
zfs_pool_disk_health{pool="zpool1", vdev="raidz1-0", disk="wwn-0x5000c500b3b53463"} 0
zfs_pool_disk_health{pool="zpool1", vdev="raidz1-0", disk="wwn-0x5000c500b3b33354"} 0
This would make it possible to build more informative dashboards. With only the pool health status, the operator still has to log into the server to find out which devices are faulted, and how many.
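As a rough illustration of the mapping proposed above, here is a minimal sketch that turns a pool/vdev/disk tree into those metric lines. Python is used only for illustration. The function name, the input shape, and every state code other than ONLINE = 0 (which is taken from the example) are assumptions, not the exporter's actual implementation.

```python
# Hypothetical state-to-code table. Only ONLINE -> 0 comes from the example
# above; the remaining codes are assumed for illustration.
HEALTH_CODES = {
    "ONLINE": 0,
    "DEGRADED": 1,
    "FAULTED": 2,
    "OFFLINE": 3,
    "UNAVAIL": 4,
    "REMOVED": 5,
}


def render_health_metrics(pool, pool_state, vdevs):
    """Render health metric lines for one pool.

    vdevs maps each vdev name to a (vdev_state, disks) tuple, where disks
    maps each disk name to its state string.
    """
    lines = [f'zfs_pool_health{{pool="{pool}"}} {HEALTH_CODES[pool_state]}']
    for vdev, (vdev_state, disks) in vdevs.items():
        lines.append(
            f'zfs_pool_vdev_health{{pool="{pool}", vdev="{vdev}"}} '
            f"{HEALTH_CODES[vdev_state]}"
        )
        for disk, disk_state in disks.items():
            lines.append(
                f'zfs_pool_disk_health{{pool="{pool}", vdev="{vdev}", '
                f'disk="{disk}"}} {HEALTH_CODES[disk_state]}'
            )
    return lines
```

With per-device series like these, a dashboard could count or list the disks whose value is non-zero instead of requiring a login to the host.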
Unfortunately, the ZFS library we use does not provide access to this information, most likely because the zpool command does not provide a flag to enable machine-parseable output. I'd consider exporting this data if it were available upstream; however, I don't expect to contribute this functionality myself in the foreseeable future.
OpenZFS has now added an "influxdb" command to provide machine-parseable output, which may be useful for this. https://github.com/openzfs/zfs/pull/10786
We no longer rely on an upstream ZFS library; however, I'm not certain that we can rely on the influxdb output, as that is ZFS version-dependent and I'd like to maintain good host version compatibility. The alternative, though, is ugly text parsing, and I'm not eager to tackle this any time soon, though it's certainly doable.
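For what it's worth, the text-parsing route could look roughly like the sketch below. It assumes the usual `zpool status` layout, where rows in the `config:` table are nested by two spaces per level under the `NAME` header; the function name and the returned tuple shape are hypothetical. It deliberately ignores complications such as pools whose disks sit directly under the pool, `logs`/`cache`/`spares` sections, and trailing status notes, so treat it as a starting point rather than a robust parser.

```python
def parse_zpool_config(status_text):
    """Return (depth, name, state) tuples from the config table.

    Depth is derived from indentation relative to the NAME header:
    0 = pool, 1 = vdev, 2 = disk (assuming two spaces per nesting level).
    """
    rows = []
    base_indent = None
    in_config = False
    for raw in status_text.splitlines():
        line = raw.expandtabs(8)  # zpool indents the table with a leading tab
        stripped = line.strip()
        if not in_config:
            in_config = stripped == "config:"
            continue
        if not stripped:
            if rows:
                break  # a blank line after the first rows ends the table
            continue
        if stripped.startswith("errors:"):
            break
        indent = len(line) - len(line.lstrip())
        fields = stripped.split()
        if base_indent is None:
            # Skip everything until the header row, then remember its indent.
            if fields[0] == "NAME":
                base_indent = indent
            continue
        if len(fields) < 2:
            continue  # section labels such as "logs" carry no state column
        rows.append(((indent - base_indent) // 2, fields[0], fields[1]))
    return rows
```

Because this depends only on indentation and the first two columns, it should be less sensitive to ZFS version differences than a parser that matches the full row layout, though that would need checking against real output from several releases.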