Alerting on missing processes upon startup

Open geekdave opened this issue 7 years ago • 6 comments

Hello,

First of all, thank you for creating and maintaining this project! It's been a big help for us.

We have an alert set up like this to fire if any expected processes are not running:

alert: ProcessNotRunning
expr: namedprocess_namegroup_num_procs
  < 1
for: 1m
labels:
  severity: page
annotations:
  description: '{{ $labels.groupname }} process missing on {{ $labels.instance }}'
  summary: '{{ $labels.groupname }} process on {{ $labels.instance }} has not been
    running for 1 min.'

One thing I just noticed is that if some expected processes are not running at the time when I start process-exporter then I get no metrics about them. This means that if an instance restarts, we need to manually check that all expected processes are running.

I'm wondering if I'm doing something wrong, or if there is another way to go about this. Everything works great if the process is already running at the time when I start process-exporter.

Here is how I'm starting process-exporter (using dummy processes top and tail for demo purposes):

  docker run \
   -d \
   --privileged \
   --name process_exporter \
   -v /proc:/host/proc \
   -p 9256:9256 \
   ncabatoff/process-exporter:0.3.9 \
   --procfs /host/proc \
   -procnames top,tail

Thanks! Dave

geekdave avatar Aug 29 '18 16:08 geekdave

Very happy to hear that you find it useful. I don't think you're doing anything wrong. If I understand your problem correctly it's a classic issue with Prometheus-style monitoring. Unless Prometheus has seen a given metric recently, it doesn't know about it. Thus the alert rule never fires initially, because namedprocess_namegroup_num_procs is not < 1: it's undefined.
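The behavior described above can be reproduced with a promtool rule unit test. This is a minimal sketch, assuming the alert from the first post has been saved (wrapped in a `groups:`/`rules:` block) in a file named alerts.yml; that file name and the `instance` value are hypothetical. Only tail reports a series, top has never been seen, and the test expects no alert at all:

```yaml
# test.yml, run with: promtool test rules test.yml
# Assumption: alerts.yml holds the ProcessNotRunning rule from the first post.
rule_files:
  - alerts.yml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # tail is running; there is no series at all for groupname="top".
      - series: 'namedprocess_namegroup_num_procs{groupname="tail", instance="host1"}'
        values: '1+0x5'
    alert_rule_test:
      - eval_time: 5m
        alertname: ProcessNotRunning
        # A missing series is not "< 1", so nothing is expected to fire for top.
        exp_alerts: []
```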

If you know the names of the namegroups you care about, you could also check for their absence, e.g.

expr: namedprocess_namegroup_num_procs{groupname="X"} < 1 or absent(namedprocess_namegroup_num_procs{groupname="X"})

That's still not quite enough, because for a binary operation like or to work the labelsets must match exactly on either side. One easy way to make that work (by eliminating labels) is to use an aggregation like sum(), e.g.

expr: sum(namedprocess_namegroup_num_procs{groupname="X"}) < 1 or absent(namedprocess_namegroup_num_procs{groupname="X"})
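Put together as a complete rule, the expression above might look like the following sketch. It assumes the group name top from the demo command; the rule group name is made up. Because sum() drops all labels, the annotations hardcode the process name instead of using {{ $labels.groupname }}. Note that sum() also adds up every instance, so this form suits a single host best:

```yaml
groups:
  - name: process-exporter-alerts   # hypothetical rule group name
    rules:
      - alert: ProcessNotRunning
        # First branch: the series exists but reports zero processes.
        # Second branch: the series does not exist at all (e.g. right after startup).
        expr: |
          sum(namedprocess_namegroup_num_procs{groupname="top"}) < 1
            or
          absent(namedprocess_namegroup_num_procs{groupname="top"})
        for: 1m
        labels:
          severity: page
        annotations:
          # Hardcoded because the sum() branch carries no groupname label.
          summary: 'top process has not been running for 1 min.'
```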

There are other, less hacky options, such as the ignoring and on vector-matching modifiers; see the Prometheus operators page for details.
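As an untested sketch of the on variant, the rule fragment below matches the two sides on groupname only, so the left-hand side keeps its per-instance labels. The group name X is the placeholder used earlier in this comment:

```yaml
- alert: ProcessNotRunning
  # "or on(groupname)" compares the two sides by groupname only;
  # the absent() result is added when no left-hand series shares that groupname.
  expr: |
    namedprocess_namegroup_num_procs{groupname="X"} < 1
      or on(groupname)
    absent(namedprocess_namegroup_num_procs{groupname="X"})
  for: 1m
  labels:
    severity: page
```

absent() still returns a result only when no matching series exists on any instance, so this does not detect a process missing from one host while another host reports it.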

If you don't want to hardcode the names of the namegroups you care about, I don't have any good suggestions with the existing process-exporter, but if you think of any I'd love to hear them.

One possibility might be to extend process-exporter so that any names given explicitly (as -procnames or with untemplated "name" values in the config file) would get num_procs pre-populated with a value of zero even if no processes are found. I'm going to leave this issue open; I might come back and implement that at some point.
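To illustrate the distinction between untemplated and templated names, here is a sketch of a process-exporter config file. The matchers are illustrative only, and the zero pre-population mentioned in the comments is the proposal from this thread, not existing behavior:

```yaml
process_names:
  # Untemplated name: the group name "top" is known before any process is seen,
  # so under the proposal num_procs could be exported as 0 from startup.
  - name: top
    comm:
      - top

  # Templated name: the group name depends on the matched process,
  # so there is nothing to pre-populate until a process actually appears.
  - name: "{{.Comm}}"
    cmdline:
      - '.+'
```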

ncabatoff avatar Aug 30 '18 23:08 ncabatoff

Thanks for the detailed response @ncabatoff - makes total sense.

"num_procs pre-populated with a value of zero even if no processes are found."

This would be the ideal solution for me, but I understand it's extra work. I'd be grateful if you'd consider this enhancement for a future release. Thanks!

geekdave avatar Aug 30 '18 23:08 geekdave

Understood @geekdave, I'll see what I can do. I'm very open to adding this feature, just have other more pressing issues at the moment.

ncabatoff avatar Aug 31 '18 00:08 ncabatoff

Prepopulating with zero using a runtime flag or config option would be great. Any progress on that yet?

infernix avatar Jan 17 '19 13:01 infernix

We could use an up-like metric for processes.

TarekAS avatar Nov 26 '19 10:11 TarekAS

Any update on this?

AdityaNaresh avatar Oct 26 '22 06:10 AdityaNaresh