node_exporter
expose scheduled shutdown times
I've been struggling with porting a monitoring check from Nagios to Prometheus. What it does is raise a flag if there's a shutdown scheduled on a server. It does this through this horrendous NRPE check:
command[dsa2_shutdown]=if /usr/lib/nagios/plugins/check_procs -w 1: -u root -C shutdown > /dev/null || /usr/lib/nagios/plugins/check_procs -w 1: -u root -a /lib/systemd/systemd-shutdownd > /dev/null || ( busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager ScheduledShutdown 2> /dev/null | sed 's/[^"]*"//;s/".*//' | grep -v dry- | grep . ); then echo 'system-in-shutdown'; else echo 'no shutdown running' ; fi
i hope you can unsee this one day.
we can probably get rid of all the check_procs stuff there and assume systemd (at least that's the assumption we're making), which turns this into something like:
busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager ScheduledShutdown
and in fact, I wrote a Python script that would extract a metric out of that nicely:
#!/usr/bin/python3

import logging
import shlex
from subprocess import CalledProcessError, PIPE, run


def test_parse_dbus():
    no_sched = '(st) "" 18446744073709551615'
    assert parse_dbus(no_sched) == ("", 0)
    sched_reboot = '(st) "reboot" 1725477267406843'
    assert parse_dbus(sched_reboot) == ("reboot", 1725477267.406843)
    sched_reboot_round = '(st) "reboot" 1725477267506843'
    assert parse_dbus(sched_reboot_round) == ("reboot", 1725477267.506843)
    # theoretical: i've seen the metric "0" with the label "suspend"
    # before adding this test. i couldn't reproduce by suspending my
    # laptop, so i'm not sure wtf happened there.
    sched_suspend = '(st) "suspend" 0'
    assert parse_dbus(sched_suspend) == ("", 0)
    garbage = '(st) "reboot" 1725477267506843 jfdklafjds'
    assert parse_dbus(garbage) == ("", 0)
    assert parse_dbus("(st) ...") == ("", 0)
    assert parse_dbus("") == ("", 0)


def parse_dbus(output: str) -> tuple[str, float]:
    logging.debug("parsing DBus output: %s", output)
    # expected shape: '(st) "<kind>" <timestamp in microseconds>'
    try:
        _, kind, timestamp_str = output.split(maxsplit=2)
    except ValueError as exc:
        logging.warning("could not parse DBus output: %r (%s)", output, exc)
        return "", 0
    kind = kind.replace('"', "")
    try:
        # microseconds to seconds
        timestamp = int(timestamp_str) / 1000000
    except ValueError as exc:
        logging.warning(
            "could not parse DBus timestamp: %r (%s)",
            timestamp_str,
            exc,
        )
        return "", 0
    logging.debug("found kind %r, timestamp %r", kind, timestamp)
    if kind and timestamp:
        return kind, timestamp
    else:
        return "", 0


def main():
    cmd = shlex.split(
        "busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager ScheduledShutdown"  # noqa: E501
    )
    try:
        proc = run(cmd, check=True, stdout=PIPE, encoding="ascii")
    except CalledProcessError as exc:
        logging.warning("could not call command %r: %s", shlex.join(cmd), exc)
        kind, timestamp = "", 0
    else:
        kind, timestamp = parse_dbus(proc.stdout)
    print("# HELP node_shutdown_scheduled_timestamp_seconds time of the next scheduled reboot, or zero")
    print("# TYPE node_shutdown_scheduled_timestamp_seconds gauge")
    if timestamp:
        # label values must be double-quoted in the exposition format
        print(
            'node_shutdown_scheduled_timestamp_seconds{kind="%s"} %s'
            % (kind, timestamp)
        )
    else:
        print("node_shutdown_scheduled_timestamp_seconds 0")


if __name__ == "__main__":
    main()
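To make the sample values in test_parse_dbus a bit more concrete, here is a small sketch that turns the raw microsecond value into a readable UTC time. The helper name usec_to_datetime and the UINT64_MAX constant are made up for this example. Treating 0 and 2**64 - 1 as "nothing scheduled" is an assumption taken from the test cases above, not from any systemd documentation.

```python
from datetime import datetime, timezone
from typing import Optional

# 18446744073709551615, the value in the "no shutdown scheduled" test case
UINT64_MAX = 2**64 - 1


def usec_to_datetime(usec: int) -> Optional[datetime]:
    """Convert a microsecond Unix timestamp to an aware UTC datetime.

    Returns None for 0 and UINT64_MAX, which this sketch assumes both
    mean "no shutdown scheduled".
    """
    if usec in (0, UINT64_MAX):
        return None
    # integer arithmetic keeps the microseconds exact (no float rounding)
    seconds, micros = divmod(usec, 1_000_000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.replace(microsecond=micros)


# values copied from the tests above
print(usec_to_datetime(1725477267406843))
print(usec_to_datetime(18446744073709551615))
```

This is only for eyeballing values while debugging; the metric itself should stay a plain Unix timestamp in seconds.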
the problem is there's nowhere to call this thing from: shutdown(8) doesn't have any post hooks, and i don't think systemd will fire any specific service when a shutdown is scheduled... logind does expose some state over dbus though, namely the ScheduledShutdown property, which we can get with:
busctl get-property org.freedesktop.login1 /org/freedesktop/login1 org.freedesktop.login1.Manager ScheduledShutdown
... which is essentially what we're doing above.
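One possible stopgap, assuming the script gets run periodically from something like a cron job or a systemd timer, is to write its output into the node exporter's textfile collector directory. Below is a minimal sketch of the atomic write that would need. The function name write_prom_file, the file name shutdown.prom and the directory are all hypothetical; the directory would be whatever --collector.textfile.directory points at on the host.

```python
import contextlib
import os
import tempfile


def write_prom_file(directory: str, name: str, body: str) -> str:
    """Atomically write `body` to `<directory>/<name>` and return the path.

    The data goes to a temporary file in the same directory first and is
    then renamed over the target, so a reader never sees a partial file.
    """
    target = os.path.join(directory, name)
    # the ".tmp" suffix keeps the temp file from matching "*.prom"
    fd, tmp_path = tempfile.mkstemp(
        dir=directory, prefix=name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="ascii") as tmp:
            tmp.write(body)
        # mkstemp creates the file as 0600; make it world-readable
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, target)
    except BaseException:
        # clean up the temp file if anything went wrong before the rename
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
    return target


if __name__ == "__main__":
    # demo against a scratch directory instead of the real collector path
    with tempfile.TemporaryDirectory() as scratch:
        path = write_prom_file(
            scratch,
            "shutdown.prom",
            "node_shutdown_scheduled_timestamp_seconds 0\n",
        )
        print(open(path, encoding="ascii").read(), end="")
```

The obvious downside is that the metric is only as fresh as the last timer run, so a shutdown scheduled between two runs goes unnoticed until the next one.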
But i figured a better place to do this would be in the node exporter itself, since it's already a daemon just sitting there.
Getting that property should be reasonably easy to do in the systemd collector.
I created https://github.com/prometheus/node_exporter/pull/3111 as a draft. It doesn't work. I don't think the dbus API we have supports that generic call.
awesome work, thanks! i've followed up there.