
Feature request: Find & report disk errors

Open miketheman opened this issue 9 years ago • 1 comments

We have a server whose disk was set up incorrectly when it was created, and we didn't find out until we poked at the server.

A Nagios check catches this by walking the mount list, calling stat on each entry, and reporting any error:

$ /usr/lib/nagios/plugins/check_disk -vvv -e
calling stat on /
For /, used_pct=27 free_pct=73 used_units=2527 free_units=6973 total_units=9936 used_inodes_pct=25 free_inodes_pct=75 fsp.fsu_blocksize=4096 mult=1048576
Freespace_units result=0
...
calling stat on /data
stat failed on /data
DISK CRITICAL - /data is not accessible: Input/output error

This should probably go in disk.py somewhere, but I am not certain of the best way to do it. A simple example of the failure as seen from Python:

>>> os.stat("/")
posix.stat_result(st_mode=16877, st_ino=2, st_dev=51713L, st_nlink=22, st_uid=0, st_gid=0, st_size=4096, st_atime=1458812283, st_mtime=1458023647, st_ctime=1458023647)
>>> os.stat("/data")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 5] Input/output error: '/data'
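
As a rough sketch of how the same idea could look in Python: read the mount list, stat each mount point, and collect the ones that fail. The names `find_inaccessible_mounts` and `read_mount_points` are hypothetical and not part of dd-agent, and nothing here assumes how disk.py is actually structured. Reading `/proc/mounts` is a Linux-only assumption.

```python
import errno
import os


def read_mount_points(mounts_file="/proc/mounts"):
    """Return the mount points listed in a /proc/mounts-style file (Linux only).

    The mount point is the second whitespace-separated field of each line.
    Note: the kernel escapes spaces in paths as \\040; that is not decoded here.
    """
    mount_points = []
    with open(mounts_file) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 2:
                mount_points.append(fields[1])
    return mount_points


def find_inaccessible_mounts(mount_points, stat=os.stat):
    """Call stat on each mount point and return those that raise OSError.

    Each failure is a (path, errno_name, message) tuple, for example
    ("/data", "EIO", "Input/output error"). The `stat` parameter exists only
    so the function can be exercised without a broken disk.
    """
    failures = []
    for path in mount_points:
        try:
            stat(path)
        except OSError as exc:
            name = errno.errorcode.get(exc.errno, "UNKNOWN")
            failures.append((path, name, exc.strerror or str(exc)))
    return failures


if __name__ == "__main__":
    for path, name, message in find_inaccessible_mounts(read_mount_points()):
        print("DISK CRITICAL - %s is not accessible: %s (%s)" % (path, message, name))
```

One caveat with this approach: stat on a hung network mount can block, so a real check would probably want a timeout around the call.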

refs:
https://github.com/monitoring-plugins/monitoring-plugins/blob/master/plugins/check_disk.c#L634-L636
https://github.com/monitoring-plugins/monitoring-plugins/blob/master/plugins/check_disk.c#L908-L909
https://github.com/monitoring-plugins/monitoring-plugins/blob/master/plugins/check_disk.c#L969-L980

miketheman, Mar 24 '16 22:03