
ZFS Monitoring

Open hndrewaall opened this issue 8 years ago • 17 comments

It would be great to have monitoring/alerting for ZFS pools. Set alerts for degraded state, watch scrub/repair status, etc.

hndrewaall avatar Jun 14 '16 17:06 hndrewaall

Testbed for Linux users (or for a developer without a ZFS pool):

$ sudo dd if=/dev/zero of=/file1 count=100000 bs=1024
$ sudo zpool create zsfpool /file1
$ sudo zpool list
NAME      SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
testzfs      -      -      -      -      -  FAULTED  -
zsfpool    93M  95,5K  92,9M     0%  1.00x  ONLINE  -

Also be sure to uncomment the following line in the Glances conf file:

[fs]
allow=zfs
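
When testing is done, the file-backed pool can be torn down again (a quick sketch, assuming nothing of value was written to it):

$ sudo zpool destroy zsfpool
$ sudo rm /file1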

nicolargo avatar Jun 25 '16 13:06 nicolargo

First issue, only root could grab ZFS pools status:

$ zpool status zsfpool
connect: Permission non accordée (permission denied)
internal error: failed to initialize ZFS library
$ sudo zpool status zsfpool
  pool: zsfpool
 state: ONLINE
 scrub: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    zsfpool     ONLINE       0     0     0
      /file1    ONLINE       0     0     0

errors: No known data errors
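
For monitoring/alerting, zpool also offers a scripted form of this output (a sketch using documented zpool flags; still run through sudo here because of the permission issue above):

$ sudo zpool list -H -o name,health
zsfpool	ONLINE
$ sudo zpool status -x

The second command prints "all pools are healthy" when there is nothing to report, and only the faulty pools otherwise.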

nicolargo avatar Jun 25 '16 13:06 nicolargo

There are other examples of data that require root (sensor info), no?

hndrewaall avatar Jan 02 '17 05:01 hndrewaall

Nope, sensors do not need root rights...

One workaround is to configure the sudoers file not to ask for a password when the sudo zpool status zsfpool command is run. Not a very big fan of that...

nicolargo avatar Jan 02 '17 07:01 nicolargo

A decent workaround on this one is a sudoers.d conf file with no password required for just that command, zpool status.
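
A minimal sketch of such an entry (the user name and the /sbin/zpool path are placeholders, adjust them for your system; the second pattern allows a pool name argument):

# /etc/sudoers.d/zpool-status
glancesuser ALL=(root) NOPASSWD: /sbin/zpool status, /sbin/zpool status *

With that in place, sudo zpool status no longer prompts that user for a password.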

Anyway, I just found out about this project, and from a quick look it seems like a keeper.

bjornstromberg avatar Aug 21 '18 12:08 bjornstromberg

Salutations!

I've managed to edit my /etc/sudoers.d/zfs to allow sudo zpool status without requiring a password, but I'm still not sure how to verify that glances is able to do what it needs.

Should I see a thing saying my pool is OK if there's nothing wrong? Or will I see a thing only if something is wrong? I have set allow=zfs in my /etc/glances/glances.conf

Thanks for such an awesome project!

-Travis

travnewmatic avatar Oct 18 '18 10:10 travnewmatic

Hello, I am running a recent Linux Mint and setting up ZFS. I notice I am able to run zpool status and other ZFS enumeration commands as a non-root user. Does this imply that Glances now has an opportunity to better monitor ZFS status?

kr4z33 avatar May 14 '20 20:05 kr4z33

I confirm that on Ubuntu 20.04 (out of the box) the zpool and zfs commands can be run as a regular user.

senorsmile avatar Jun 09 '20 23:06 senorsmile

Is there a follow up to this?

fusionstream avatar Aug 21 '20 16:08 fusionstream

OK, tested on Ubuntu 20.04: the zpool status zsfpool command can be executed as a regular user.

I need to understand what kind of additional information (specific to ZFS) you want to display in Glances.

For the moment the pool is displayed as a standard mount point:

[Screenshot from 2020-08-22 12-14-22]

nicolargo avatar Aug 22 '20 10:08 nicolargo

Perhaps for me the most critical would be, on a per-pool basis:

  • general status (ONLINE | DEGRADED | etc)
  • scrub status (running and scrubbed xxB; completed WHEN and repaired xxB)
  • which drive in the pool has read/write/cksum errors

Then perhaps the cool informational stuff would be:

  • pool/vdev config
  • disk usage via zfs list (the way I mounted my drives means the usual disk usage methods show incorrect values)
  • zfs get compressratio
  • dedup table info zpool status -D
  • dedup ratio zpool get dedup
  • The previous two depend on zfs get dedup, because apparently at some point I turned off dedup on all my pools. I don't recall doing this, so maybe it was from an update. Or maybe it was indeed me.
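
For reference, most of the items above can be read with the parseable output flags (a sketch; zsfpool is the testbed pool from earlier, the pool-level ratio property is named dedupratio, and the values shown are the defaults for an empty pool):

$ zfs get -H -o value compressratio zsfpool
1.00x
$ zfs get -H -o value dedup zsfpool
off
$ zpool get -H -o value dedupratio zsfpool
1.00x
$ zpool status -D zsfpool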

fusionstream avatar Aug 24 '20 03:08 fusionstream

zpool list also has some useful information in a single line per pool, which covers points 2 and 5 of the informational list above, but critically it doesn't tell you whether dedup is on or off (I have a 1.00x dedup ratio because dedup is off). It also has info for FRAG and general health status.
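
A sketch of that one-line form with explicit columns (property names as in the zpool man page; the values echo the testbed pool from earlier):

$ zpool list -H -o name,size,allocated,free,fragmentation,capacity,dedupratio,health
zsfpool	93M	95.5K	92.9M	0%	0%	1.00x	ONLINE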

fusionstream avatar Aug 24 '20 03:08 fusionstream

Thanks @fusionstream. It is a lot of information for the space available in the sidebar... We need to make a mockup with all the states.

nicolargo avatar Aug 27 '20 08:08 nicolargo

No problem. I'm using it solely in Home Assistant at this time, so full disclosure: the space issue is something I will not yet experience fully. What's a mockup and how can I help?

fusionstream avatar Aug 29 '20 05:08 fusionstream

@fusionstream can you make a mockup using a basic text editor?

nicolargo avatar Oct 10 '20 11:10 nicolargo

I'll take a stab at it. The following will be incomplete, but perhaps a useful start for discussion:

FILE SYS      Used  Total
_ocker/aufs   182G   227G
-BEGIN ZFS MOCK-UP STUFF-
Zpool   _truncatezpoolnam

That last line would be the zpool name (truncated as required to fit); the following would be repeated for each zpool on the system.

ONE of the following lines, containing the status state for the pool, would follow; the xxxx's would be replaced with the actual numbers.

ONLINE       xxxxG  xxxxG
DEGRADED     xxxxG  xxxxG
SUSPENDED    xxxxG  xxxxG
FAULTED
UNAVAILABLE
OFFLINE

The zpool may be in various states of scrubbing or resilvering; one of the following groups would follow:

SCRUB RUNNING      __.__%
REPAIRED            xxxxB

SCRUB COMPLETE--no errors
yy-mm-ddThh:mm:ss

SCRUB COMPLETE --- errors
yy-mm-ddThh:mm:ss
REPAIRED            xxxxB

RESILVER COMPLETE--No Err
yy-mm-ddThh:mm:ss

RESILVER COMPLETE--errors
yy-mm-ddThh:mm:ss
REPAIRED            xxxxB

RESILVER RUNNING   __.__%
REPAIRED            xxxxB

Following this would be the configuration of the zpool. This can be presented in several ways, and there should be some mechanism to toggle among the options here, or others. The following samples assume a pool of two mirrored VDEVs; some thought would be required to accommodate other configurations and vdev types (logs, RAIDZ, etc.). I am less familiar with those, so I am going to keep the scope limited in this mock-up. With large pools this may consume vertical real estate; how to handle that is a problem for later.

Another command available to non-root users is zpool iostat -v. Some of this would come from there, and it could be refreshed for a live view of data moving around, which would be of interest.
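
For example, with an interval argument it keeps refreshing on its own (the pool name and the 2-second interval are just examples):

$ zpool iostat -v zsfpool 2

The -v flag breaks the numbers down per vdev and per device, which is the level of detail the config blocks below assume.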

In these samples "_xxx..." is a truncated drive name. These may be in the form of sda, sdb, etc., or longer disk ID strings that will need truncation to fit.

"nnn" is a number

With no errors, capacity may be of primary concern for someone mucking about configuring a zpool:

config_cap.  alloc   free
  Mirror-0   xxxxG  xxxxG
    _xxxxxx      -      -
    _xxxxxx      -      -
   Mirror-1  xxxxG  xxxxG
    _xxxxxx      -      -
    _xxxxxx      -      - 

Someone keeping track of a pool in production might be interested in seeing I/O performance, presented like the following, displaying operations:

config_ops.   read  write
  Mirror-0     nnn    nnn
    _xxxxxxxx  nnn    nnn
    _xxxxxxxx  nnn    nnn
   Mirror-1    nnn    nnn
    _xxxxxxxx  nnn    nnn
    _xxxxxxxx  nnn    nnn

Or the following, as bandwidth:

config_BW.    read  write
  Mirror-0     nnn    nnn
    _xxxxxxxx  nnn    nnn
    _xxxxxxxx  nnn    nnn
   Mirror-1    nnn    nnn
    _xxxxxxxx  nnn    nnn
    _xxxxxxxx  nnn    nnn 

If the pool status is not "ONLINE", a config view with states would likely be of interest:

Config              State        
mirror-0
  _xxxxxxxx   UNAVAILABLE
  _xxxxxxxx   UNAVAILABLE
mirror-n
  _xxxxxxxx
  replacing      DEGRADED
    _xxxxxx       OFFLINE
    _xxxxxx        ONLINE

Or the following error counts on Read/Write/Checksum:

Config_Err.   R   W   CHK    
  Mirror-0   nn  nn   nnn  
    _xxxxxx  nn  nn   nnn
    _xxxxxx  nn  nn   nnn
   Mirror-1  nn  nn   nnn
    _xxxxxx  nn  nn   nnn
    _xxxxxx  nn  nn   nnn

Following the display of one of the above configuration presentations, the ZFS file systems would be displayed. Each one may or may not be mounted, which might be indicated by a color change or something. This information comes from the zfs list command; the 'AVAIL' part of that output is not of interest here, because it would be displayed at the pool level. The filesystem names will need to be truncated to fit:

FILE SYS      Used  Refer
_zfsFileSys  xxxxG  xxxxG
_zfsFileSys  xxxxG  xxxxG
_zfsFileSys  xxxxG  xxxxG
...
...
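
For reference, the columns in that block map directly onto zfs list output (a sketch; -r recurses into child datasets, and the placeholders follow the mock-up convention above):

$ zfs list -r -o name,used,refer zsfpool
NAME      USED  REFER
zsfpool  xxxxG  xxxxG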

kr4z33 avatar Nov 14 '20 13:11 kr4z33

Postponed because the information needed could not be integrated into the current FS plugin. The proposal is to stay with the current feature in Glances v3.x:

  • Edit the /etc/sudoers.d/zfs file to allow a standard user to run the ZFS monitoring commands
  • Edit the Glances configuration file [fs] section with allow=zfs
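
A quick way to check the result from a shell (assuming Glances 3.x, which can dump a single plugin's stats to stdout):

$ glances --stdout fs

The zpool's mount point should appear in the fs stats once allow=zfs is set.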

The result is the following in the current develop branch:

[screenshot]

In Glances version 4 a dedicated plugin should be created (see branch https://github.com/nicolargo/glances/tree/glancesv4).

Contributors are welcome.

nicolargo avatar Sep 25 '22 17:09 nicolargo