nhc
Check Lustre filesystem health
How can I use NHC to check the health of my Lustre file system?
When I configure it with
* || check_cmd_output -t 5 -m '135T' -e '/usr/bin/lfs df -h|grep filesystem|grep T'
it shows:
ERROR: nhc: Health check failed: check_cmd_output: 4 returned by "/usr/bin/lfs df -h|grep filesystem|grep T".
When executing a pipeline, the overall return code is based on the exit status of the last process. I'm not sure what would cause grep to return a 4; the documentation for GNU grep's return codes is here: https://www.gnu.org/software/grep/manual/grep.html#Exit-Status
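For reference, the pipeline rule mentioned above can be checked with a minimal sketch like the following. It assumes a default shell without the pipefail option set; the variable names are only illustrative.

```shell
# Without pipefail, a pipeline's exit status is that of its last command.
false | true
first=$?     # last command is "true"

true | false
second=$?    # last command is "false"

echo "false | true  -> $first"    # prints: false | true  -> 0
echo "true  | false -> $second"   # prints: true  | false -> 1
```

So if the pipeline in the check were really run by a shell, the reported code would come from the final grep, which GNU grep documents as 0, 1 or 2.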
If it helps, this is what we have in our Lustre health check script on the clients:
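function check_lfs_servers() {
    # Capture only stderr of "lfs check servers"; stdout is discarded.
    lfs_check=$(/usr/bin/lfs check servers 2>&1 >/dev/null)
    if [[ -z $lfs_check ]]; then
        return 0
    else
        # Any stderr output is treated as an unreachable target.
        die 1 "Could not reach at least one MDT or OST"
        return 1
    fi
}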
I can actually reproduce this behavior: it's not grep that returns 4, it's /usr/bin/lfs, because the pipe and following commands are interpreted as arguments to the lfs command.
This can be reproduced with something more verbose, like ls:
$ nhc -e "check_cmd_output -m '/foo/' -e 'ls -al /tmp/a | grep bar'"
ls: cannot access |: No such file or directory
ls: cannot access grep: No such file or directory
ls: cannot access bar: No such file or directory
ERROR: nhc: Health check failed: check_cmd_output: 2 returned by "ls -al /tmp/a | grep bar"
It shows that ls tries to list files named |, grep and bar.
So piping doesn't seem to work easily with nhc_check_cmd.
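The behavior above is what you would expect if the command string is only split into words and executed directly, rather than handed to a shell for parsing. That is an assumption about how the check runs its command, not something confirmed from the NHC source; the sketch below only demonstrates plain word splitting on the same string.

```shell
cmd='ls -al /tmp/a | grep bar'

set -f        # disable globbing so only word splitting applies
set -- $cmd   # unquoted expansion: split on whitespace, no shell parsing
set +f

argc=$#
fourth=$4

echo "argument count: $argc"      # prints: argument count: 6
for arg in "$@"; do
    echo "arg: $arg"              # "|" shows up as an ordinary argument
done
```

After splitting, | is just the fourth word, so ls receives |, grep and bar as file names, matching the errors shown above.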
Is there a workaround, other than defining a whole separate check in a script file?