pushgateway icon indicating copy to clipboard operation
pushgateway copied to clipboard

Specify the instance grouping key in the message body instead of the URL

Open xkilian opened this issue 5 years ago • 5 comments

updated with a clearer suggestion of a solution

Feature request

Permit specifying the instance in the body of the message and that it can be used as a grouping key in addition to the job name.

NxLog(for example) which can send http output to pushgateway, cannot have dynamic variables in the URL itself. This would provide a more open and flexible method of sending data to the pushgateway.

The message body would contain a label which would be considered as a grouping key in addition to the job. This label should have strict rules in how it can be recognized as an additionnal grouping key. (label prefix name, particular label name. or other method.

Bug Report

What did you do?

Configure NxLog to send data:

http://10.10.0.1:9091/metrics/job/job_name/ metricname{instance="1234",label2="value2"} 50

What did you expect to see?

Grouping key job_name, instance.

What did you see instead? Under which circumstances? Grouping key is only: job_name Instance in this case is considered a regular label and not a grouping key.

If specifying a label with a particular name or semantic (KEY_name, or instance) in the body, I would expect it to be used as an additionnal grouping key.

Ubuntu 18.04LTS, pushgateway version 1.2

xkilian avatar May 25 '20 22:05 xkilian

It's more complicated than that. It is very common that the body contains labels that should not be part of the grouping key. This needed a different format in the body or some other trick, possibly involving a separate endpoint for this kind of push.

We actually considered other ways of encoding the grouping labels when we ran into encoding problems with the RESTful style. Back then, the decision was to stick with the encoding in the URL path (but introduce base64 encoding for problematic characters). That worked out quite well.

The motivation behind the decision was to keep things simple and consistent. It would be confusing to have multiple ways of doing the same thing. I don't want to reject this request outright, but you should know that there is a high bar because this might be just overhead for a large majority of users to satisfy a (presumably) very niche use case. And also because this will be a bit tricky to get done in a correct and easily usable way.

beorn7 avatar May 26 '20 15:05 beorn7

Thank you for the explanation. I very much agree that it opens up confusion with two ways to declare grouping keys. if perhaps it is the first key and that it is named groupingkey_instance or some such to make it clear that it is not just a label. I think, the pushgateway as it is, is acceptable. It should be kept simple and there are alternatives.

In the end to solve the problem, we actually went with a Redis workflow:

  • A Redis store that acts for all intents as a pushgateway with the following bonuses that it supports expiring keys and it can ingest using a basic TCP stream(push). No need to deal with URLs and grouping keys.
  • A custom exporter was written that proxys requests from Prometheus to Redis, reformats the result and returns it to Prometheus in the Prometheus exposition format with all the right labels, metric name, etc. About 40 lines of python code using the Prometheus library and a redis client library with all the instrumentation! Yé. Works just like a blackbox exporter, but for Redis.

As a side note, there are valid cases for expiring keys. In this case, it is to follow the data from a vehicule. When the vehicule returns to the depot and is shutdown, the last value should NOT be persisted for X hours until the vehicule wakes up again. Anyway, Redis has all the bells, whistles and performance to handle our number of key/values and could be considered a good alternative to the pushgateway for data needing to be persisted.

xkilian avatar May 28 '20 04:05 xkilian

I updated the issue description with a better suggestion of a solution if others find a need to go forward with this request.

xkilian avatar May 28 '20 05:05 xkilian

As a side note, there are valid cases for expiring keys. In this case, it is to follow the data from a vehicule. When the vehicule returns to the depot and is shutdown, the last value should NOT be persisted for X hours until the vehicule wakes up again.

The vehicle should delete its own metric from the PGW when it shuts down. Just letting it disappear after a while would mean that a vehicle failure would look the same as a vehicle shutting down regularly.

This is the old debate about a timeout for pushed metrics. See README.md.

beorn7 avatar May 28 '20 19:05 beorn7

Good point. Striked out my previous comment.

xkilian avatar May 30 '20 02:05 xkilian