process-exporter
Alerting on missing processes upon startup
Hello,
First of all, thank you for creating and maintaining this project! It's been a big help for us.
We have an alert set up like this to fire if any expected processes are not running:
alert: ProcessNotRunning
expr: namedprocess_namegroup_num_procs < 1
for: 1m
labels:
  severity: page
annotations:
  description: '{{ $labels.groupname }} process missing on {{ $labels.instance }}'
  summary: '{{ $labels.groupname }} process on {{ $labels.instance }} has not been running for 1 min.'
One thing I just noticed is that if some expected processes are not running at the time when I start process-exporter then I get no metrics about them. This means that if an instance restarts, we need to manually check that all expected processes are running.
I'm wondering whether I'm doing something wrong, or whether there is another way to go about this. Everything works great if the process is already running when I start process-exporter.
Here is how I'm starting process-exporter (using dummy processes top and tail for demo purposes):
docker run \
  -d \
  --privileged \
  --name process_exporter \
  -v /proc:/host/proc \
  -p 9256:9256 \
  ncabatoff/process-exporter:0.3.9 \
  --procfs /host/proc \
  -procnames top,tail
Thanks! Dave
Very happy to hear that you find it useful. I don't think you're doing anything wrong. If I understand your problem correctly it's a classic issue with Prometheus-style monitoring. Unless Prometheus has seen a given metric recently, it doesn't know about it. Thus the alert rule never fires initially, because namedprocess_namegroup_num_procs is not < 1: it's undefined.
If you know the names of the namegroups you care about, you could also check for their absence, e.g.
expr: namedprocess_namegroup_num_procs{groupname="X"} < 1 or absent(namedprocess_namegroup_num_procs{groupname="X"})
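That may still not be quite what you want, because the two sides of the or produce different labelsets: the left side carries every label of the original series, while absent() returns only the labels given in the selector. One easy way to make the two sides more uniform (by eliminating labels) is to use an aggregation like sum(), e.g.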
expr: sum(namedprocess_namegroup_num_procs{groupname="X"}) < 1 or absent(namedprocess_namegroup_num_procs{groupname="X"})
There are other, less hacky options, such as the ignoring or on modifiers; see the operators page for details.
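Put together as a full alerting rule, the suggestion above might look like the sketch below. It assumes the Prometheus 2.x rule-file layout (a groups wrapper), a hypothetical rule group name, and that the demo process top ends up with groupname="top". Because sum() drops all labels and absent() keeps only the labels from the selector, $labels.instance is not available here, so the annotations hardcode the process name instead of templating it.

```yaml
groups:
  - name: process-exporter-alerts   # hypothetical group name
    rules:
      - alert: ProcessNotRunning
        # Fires when the "top" namegroup reports fewer than one process,
        # or when no series for that namegroup exists at all (for example,
        # the process was not running when process-exporter started).
        expr: >-
          sum(namedprocess_namegroup_num_procs{groupname="top"}) < 1
          or
          absent(namedprocess_namegroup_num_procs{groupname="top"})
        for: 1m
        labels:
          severity: page
        annotations:
          # Label templating is avoided on purpose: the instance label
          # does not survive sum() and is never produced by absent().
          description: 'top process missing'
          summary: 'top process has not been running for 1 min.'
```

One rule like this is needed per expected namegroup, which is the hardcoding trade-off of this approach.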
If you don't want to hardcode the names of the namegroups you care about, I don't have any good suggestions with the existing process-exporter, but if you think of any I'd love to hear them.
One possibility might be to extend process-exporter so that any names given explicitly (as -procnames or with untemplated "name" values in the config file) would get num_procs pre-populated with a value of zero even if no processes are found. I'm going to leave this issue open; I might come back and implement that at some point.
Thanks for the detailed response @ncabatoff - makes total sense.
"num_procs pre-populated with a value of zero even if no processes are found."
This would be the ideal solution for me, but I understand it's extra work. I'd be grateful if you'd consider this enhancement for a future release. Thanks!
Understood @geekdave, I'll see what I can do. I'm very open to adding this feature; I just have other, more pressing issues at the moment.
Prepopulating with zero using a runtime flag or config option would be great. Any progress on that yet?
We could use an up-like metric for processes.
Any update on this?