marie-ai
marie-ai copied to clipboard
Implement health check registry
Implement health check registry.
Registry needs to support following types:
- http
- script
- tcp
- sql
Example configuration:
monitor.json
{
"shell" :"/bin/bash",
"args": [
"./monitor/memory.sh"
],
"id": "pan.memory.0",
"name": "pan.memory",
"interval": "PT1M",
"ttl": "PT1M",
"timeout": "PT10S",
"tags": [
"agent",
"system",
"pan",
"memory"
],
"webhooks": [
{
"name": "",
"uri": "",
"payload": "",
}
],
"__type__": "script"
}
Webhooks: If any health check returns a Failure result, this collections will be used to notify the error status. (Payload is the json payload and must be escaped.)
memory.sh
#!/bin/bash
process="SAMPLE"
# percentage
failureval=95
critivalval=90
total_memory=$(free | grep Mem: | awk '{print $2}')
used_memory=$(free | grep buffers/cache: | awk '{print $3}')
memory_use=`echo $used_memory $total_memory | awk '{print $1 / $2 * 100}'`
memory_use=${memory_use%%.*}
printf "Checking $process memory on $(hostname -i)... "
if [ "$memory_use" -ge "$failureval" ]; then
echo "FAILED. $memory_use%."
exit 2
elif [ "$memory_use" -ge "$critivalval" ]; then
echo "CRITICAL. $memory_use%."
exit 1
elif [ "$memory_use" -le "$critivalval" ]; then
echo "PASSED. $memory_use%."
exit 0
fi
echo "Memory use could not be determined."
exit 2
Shell scripts will have following error codes: 0 = Passed 1 = Critical 2 = Failed
Example:
if [ "$memory_use" -ge "$failureval" ]; then
echo "FAILED. $memory_use%."
exit 2
elif [ "$memory_use" -ge "$critivalval" ]; then
echo "CRITICAL. $memory_use%."
exit 1
elif [ "$memory_use" -le "$critivalval" ]; then
echo "PASSED. $memory_use%."
exit 0
fi