client_rust
Implement timestamp encoder
The OpenMetrics spec allows for an optional timestamp field to be included along with metrics. This is generally discouraged, as Prometheus effectively determines the real measurement time when it scrapes for metrics, but adding it does make sense in combination with the experimental backfilling feature.
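For reference, the timestamp in the text format is an optional field placed after the sample value, expressed in seconds since the Unix epoch (fractional seconds are allowed). A made-up example:

```
# TYPE requests counter
# HELP requests Total requests served.
requests_total 1027 1700000000.123
# EOF
```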
This PR introduces an alternative text encoder that reuses most of the existing logic and simply injects an additional timestamp value to the appropriate MetricPoints.
Ok, I just realised this overlaps with #129, but I'm not sure adding timestamps to const metrics solves the backfill use case fully, unless we're supposed to create const types from the normal ones?
To provide some additional context, my end goal is to be able to capture metrics while disconnected from a Prometheus instance. Producing the OpenMetrics format with timestamps is how Prometheus supports backfilling, so my initial idea was simply to make this crate support encoding in that format and use it for this use case.
I eventually realised that the OpenMetrics format doesn't quite work as I needed it to, though, at least not directly. In order to collect a time-series of metric points for backfilling, it's not sufficient to keep appending timestamped expositions to a file, because of a few constraints the spec imposes:
- The file has to end with `# EOF` (and there can only be one such line)
- The descriptors have to immediately precede each cluster of points from the same metric family
- There can be no interleaving of metric families (or points in the same label group)
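To illustrate, a (made-up) exposition that satisfies these constraints looks like the following: each family's descriptors come first, all of its points follow in one contiguous block grouped per label set, and a single `# EOF` terminates the file. Appending a second timestamped exposition to such a file would immediately violate all three rules.

```
# HELP http_requests Total HTTP requests handled.
# TYPE http_requests counter
http_requests_total{path="/"} 10 1700000000
http_requests_total{path="/"} 12 1700000060
http_requests_total{path="/metrics"} 3 1700000000
# HELP queue_depth Current queue depth.
# TYPE queue_depth gauge
queue_depth 7 1700000000
# EOF
```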
So in order to use OpenMetrics directly, one would need to first decode the metrics and then produce a combined exposition containing both the old and the new points, effectively regenerating the entire file with each update. That's a fairly convoluted and inefficient way to operate.
What I ended up doing instead is using a slightly relaxed version of the OpenMetrics grammar that allows violating the above constraints. This way I can keep naively appending to the same file, and then run a post-processor to convert the output to strict OpenMetrics.
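Roughly, such a post-processor could look like the sketch below (illustrative only, not part of this PR). It assumes that within each appended chunk the descriptor lines precede the samples of their family and that descriptors are identical across chunks; a complete tool would additionally need to group points by label set and order them by timestamp.

```rust
use std::collections::BTreeMap;
use std::fs;

// Illustrative sketch: merge a "relaxed" file of repeatedly appended
// expositions into a single strict OpenMetrics exposition.
fn merge_relaxed(input: &str) -> String {
    // family name -> (descriptor lines, sample lines)
    let mut families: BTreeMap<String, (Vec<String>, Vec<String>)> = BTreeMap::new();
    let mut current_family: Option<String> = None;

    for line in input.lines() {
        if line.is_empty() || line == "# EOF" {
            continue; // drop blank lines and intermediate EOF markers
        }
        if let Some(descriptor) = line.strip_prefix("# ") {
            // e.g. "TYPE http_requests counter" -> family name is the second token
            let mut parts = descriptor.split_whitespace();
            let _kind = parts.next(); // HELP / TYPE / UNIT
            if let Some(family) = parts.next() {
                let entry = families.entry(family.to_string()).or_default();
                if !entry.0.iter().any(|existing| existing.as_str() == line) {
                    entry.0.push(line.to_string());
                }
                current_family = Some(family.to_string());
            }
        } else if let Some(family) = &current_family {
            // Sample line: attach it to the family whose descriptors preceded it.
            families.get_mut(family).unwrap().1.push(line.to_string());
        }
    }

    let mut output = String::new();
    for (_family, (descriptors, samples)) in families {
        for line in descriptors.into_iter().chain(samples) {
            output.push_str(&line);
            output.push('\n');
        }
    }
    output.push_str("# EOF\n");
    output
}

fn main() -> std::io::Result<()> {
    let relaxed = fs::read_to_string("metrics-relaxed.txt")?;
    fs::write("metrics-strict.om", merge_relaxed(&relaxed))?;
    Ok(())
}
```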
Unfortunately, this approach is not possible with prometheus_client because it doesn't expose a gather() API like the older prometheus crate does. We could potentially use a different intermediate representation, however, where we keep appending full expositions (descriptors, EOF line and all) each update. That's even more inefficient, but technically works.
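As a sketch of that fallback (again illustrative, not part of this PR), something like the function below could append a complete exposition on every flush. It assumes the String-based text::encode signature of current prometheus_client releases (older releases wrote to an io::Write); with the encoder proposed in this PR, each sample line would additionally carry its timestamp.

```rust
use prometheus_client::encoding::text::encode;
use prometheus_client::registry::Registry;
use std::fs::OpenOptions;
use std::io::Write;

// Append one full exposition (descriptors, samples and the trailing `# EOF`)
// to a log file; a post-processor later merges the chunks into a strict file.
fn append_snapshot(registry: &Registry, path: &str) -> std::io::Result<()> {
    let mut exposition = String::new();
    encode(&mut exposition, registry).expect("encoding into a String cannot fail");
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(exposition.as_bytes())
}
```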
By the way, what is the reasoning behind the decision to make access to the metrics in the registry internal? It seems like you intentionally limit it so only this crate can implement encoders. If something like a gather() method on the Registry existed, perhaps this timestamp-aware encoder wouldn't be necessary on this side.
The reason I expose as little as possible is twofold:
1. It keeps the crate simple; a small surface area makes the crate easier to understand for the average user.
2. I am the only maintainer, maintaining the crate in my spare time. The more I expose, the more I have to maintain, and the harder it is to make changes, since each change to something exposed is a breaking change.

Unfortunately (2) is the most important here.
This seems perfectly fair, but it also seems to me that giving more power to library users would probably reduce your workload as a maintainer, since there would be less need for changes to the library core to account for niche use cases. To my understanding, the OpenMetrics data model is pretty stable, so exposing that part doesn't seem like a bad deal.
Just food for thought. I'm no longer blocking on this PR. Happy to close it if you don't think it's headed in a productive direction, but also happy to make any changes you think might get it over the merging bar.
I absolutely understand that this is not a priority and just wanted to add our use case to the discussion.
We have an IoT edge device that can be offline for hours or even days. In such cases we write the metrics to an SQLite database so that we can backfill them into our cloud later. So our use case is more about converting an SQLite table to an OpenMetrics file, and we would need to attach the timestamp from the database to every metric.
With the rust-prometheus library this is possible with a workaround similar to the one from asmello: https://github.com/tikv/rust-prometheus/issues/423
Not sure if our approach is ideal; we are still in the phase of determining the best strategy for our device metrics.
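For illustration only, the conversion step could look roughly like the sketch below, with a made-up samples(name TEXT, value REAL, ts REAL) table; a real exporter would also need to emit the family descriptors and handle labels.

```rust
use rusqlite::Connection;
use std::fmt::Write as _;

// Hypothetical sketch: dump a SQLite table of (metric name, value, unix
// timestamp) rows as timestamped OpenMetrics sample lines for backfilling.
fn export(db_path: &str) -> rusqlite::Result<String> {
    let conn = Connection::open(db_path)?;
    let mut stmt = conn.prepare("SELECT name, value, ts FROM samples ORDER BY name, ts")?;
    let rows = stmt.query_map([], |row| {
        Ok((
            row.get::<_, String>(0)?,
            row.get::<_, f64>(1)?,
            row.get::<_, f64>(2)?,
        ))
    })?;

    let mut out = String::new();
    for row in rows {
        let (name, value, ts) = row?;
        // OpenMetrics sample line: name, value, then the timestamp in seconds.
        writeln!(out, "{name} {value} {ts}").unwrap();
    }
    out.push_str("# EOF\n");
    Ok(out)
}
```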
Why don't you use the Prometheus Push Gateway?
https://prometheus.io/docs/practices/pushing/
The use case for that is different: it's for getting around network limitations rather than backfilling. In fact, since Prometheus commits head blocks every 3 hours, the push gateway would fail to push any data older than that (which is likely to be the case if you need to backfill edge devices). I think the push gateway would only be useful for getting around transient connectivity issues (and possibly not even that; I'm not sure whether sends can be made reliable or whether they're fire-and-forget).