nagios-plugin-check_borg
Do not "STATE_UNKNOWN" on unfinished backups
Hi there,
I've made a hack so that check_borg sends "STATE_OK" when the backups aren't finished and there is a running process that has the same BORG_REPO as the one we're checking.
The date handling used to convert the ps output might be overly complicated, but it works for us.
G
What about borg with-lock true? If there's an operation running on the borg repository blocking our commands, borg with-lock will give an exact result.
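A minimal sketch of the with-lock idea, in Python for illustration. The function name lock_is_free and the injectable run parameter are hypothetical. It assumes that borg with-lock exits with 0 only when it obtained the lock and ran the trivial command, and that --lock-wait limits how long it waits. A non-zero exit code is not proof of a running backup, since other errors (unreachable repository, wrong passphrase) also fail.

```python
import subprocess


def lock_is_free(repo, lock_wait=1, run=subprocess.run):
    """Return True if `borg with-lock` acquired the lock and ran `true`.

    `run` is injectable so the logic can be exercised without borg installed.
    Assumption: exit code 0 means the lock was acquired; anything else means
    the repository is locked or another error occurred.
    """
    cmd = ["borg", "with-lock", "--lock-wait", str(lock_wait), repo, "true"]
    result = run(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0
```

To tell a lock timeout apart from other failures, a real check would also have to inspect borg's error output.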
I tried that first, but I couldn't get with-lock to work (on borg 1.1.8). Could anyone point me to a command that would get the status of a locked backup?
I've at least documented the complexity, and realized the code had issues on finished backups (crashing).
Thanks a lot!
It's still not the best approach, but better than the first for sure. The best approach IMHO would be to use borg itself to check for locks, as @bebehei already mentioned. Due to lack of time, I have not been able to check borg with-lock true, but the documentation sounds promising.
The current implementation of this PR is just checking if there is a process running on your local machine.
It doesn't catch if there is a process running on another machine.
But let's take a step back first and think about the design of the plugin. I've got a few thoughts on this:
- It's fine if there is a current snapshot (borg list succeeds)
- It's problematic if there is **no current snapshot** (borg list succeeds)
- It's fine if there is currently a backup running (borg list fails)
- It gets problematic if the backup is **running too long** (borg list fails)
So we've got a matrix of possibilities: we should check either that a backup is running and still within its time limit, or that borg list succeeds and shows a current snapshot.
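The matrix above can be sketched as a small decision function. This is only an illustration in Python, not the plugin's code: the names check_state and classify, the parameters, and the default thresholds are all hypothetical. It assumes the caller already knows whether borg list succeeded, the age of the newest archive, and how long the current backup has been running.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(age_seconds, warn, crit):
    """Map an age in seconds onto a Nagios state using two thresholds."""
    if age_seconds is None:
        return UNKNOWN
    if age_seconds >= crit:
        return CRITICAL
    if age_seconds >= warn:
        return WARNING
    return OK


def check_state(list_succeeded, archive_age=None, running_for=None,
                archive_limits=(26 * 3600, 50 * 3600),
                running_limits=(6 * 3600, 12 * 3600)):
    """Pick the state for the four cases of the matrix.

    borg list succeeded -> judge the age of the newest archive.
    borg list failed    -> assume a backup holds the lock and judge how
                           long it has been running.
    The default limits are made-up example values.
    """
    if list_succeeded:
        return classify(archive_age, *archive_limits)
    return classify(running_for, *running_limits)
```

For example, check_state(False, running_for=3600) yields OK for a backup that has been running for one hour, while the same call with 13 hours yields CRITICAL under these example limits.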
We could implement this either in a Nagios-specific way or in the plugin itself.
In Nagios this would be an easy quick fix: explicitly use the UNKNOWN state in the plugin and emit it while a backup is running. Additionally, you could use a "scheduled flexible downtime" that is scheduled to start together with the cronjob. When the plugin switches to UNKNOWN, the check is automatically put into downtime for a specified amount of time, so the backup is ignored while it runs.
If we implement this in the plugin, we have to check the timestamp of the lock. According to my research, there is no borg interface right now that gives you lock information. There is only lock.roster in the repository during a backup. It's a JSON file with a timestamp. Here's a current example:
{"exclusive": [["falafel@158445875416263", 22722, 0]]}#
You have to remove the last 5 characters of the timestamp to get a Unix timestamp.
So whenever a backup is running, you read the roster and check whether its timestamp is within range. Depending on the range, the plugin then reports OK/WARN/CRIT.
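The roster check described above could look roughly like this in Python. It follows this comment's reading that the digits after the @ encode a start time once the last 5 characters are dropped. That reading is not confirmed by borg's documentation, and the field may be a host identifier instead, so verify it before relying on it. The function name lock_age_seconds is hypothetical, and the sample is passed without the trailing #, which is assumed to be shell prompt residue rather than part of the file.

```python
import json
import time


def lock_age_seconds(roster_text, now=None):
    """Return the apparent age of the exclusive lock, or None if there is none.

    Assumption (from the discussion, unverified): the value after '@' in the
    first field is a timestamp with 5 extra trailing digits.
    """
    roster = json.loads(roster_text)
    holders = roster.get("exclusive") or []
    if not holders:
        return None
    host_id = holders[0][0]              # e.g. "falafel@158445875416263"
    raw = host_id.rsplit("@", 1)[1]      # digits after the '@'
    started = int(raw[:-5])              # drop the last 5 characters
    if now is None:
        now = time.time()
    return now - started


sample = '{"exclusive": [["falafel@158445875416263", 22722, 0]]}'
# Ten minutes after the assumed start time of the sample lock:
print(lock_age_seconds(sample, now=1584458754 + 600))  # 600
```

The resulting age can then be compared against warning and critical thresholds to pick the plugin state.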
I like the second solution. It bloats up the plugin, but it works correctly. And with an interface from the actual borg executable (e.g. borg lock-info), this would be awesome.
Hi there,
Thanks for being so responsive!
You are right. This patch works for us, as we run it as a local check on the host that pushes the backups. I was sending it to y'all as a courtesy in case it helps. I'll read the code for with-lock on the borg side to figure out how it works, and maybe make a feature request to get the "running since" value from borg.
G