opentelemetry-collector-contrib
[exporter/prometheus] Support Prometheus Created Timestamp feature
Component(s)
exporter/prometheus
Is your feature request related to a problem? Please describe.
Hello, Prometheus has an experimental feature called "Created Timestamp" which solves the problems outlined in this document
Describe the solution you'd like
I think using this involves passing StartTimeUnixNano from the OTEL data model as the Created Timestamp in the prometheus data model. I've got it working locally with just these changes to prometheusexporter/collector.go:
```go
func (c *collector) convertSum(metric pmetric.Metric, resourceAttrs pcommon.Map) (prometheus.Metric, error) {
	ip := metric.Sum().DataPoints().At(0)
	...
	// Pass the data point's start timestamp through as the Prometheus created timestamp.
	m, err := prometheus.NewConstMetricWithCreatedTimestamp(desc, metricType, value, ip.StartTimestamp().AsTime(), attributes...)
	...
}
```
Describe alternatives you've considered
No response
Additional context
Not sure if the prometheus maintainers think the feature is stable enough to enable by default in the OTEL collector. Or maybe a feature flag makes sense? If not, I can just add this in a fork of the collector for now
cc machine424
Pinging code owners:
- exporter/prometheus: @Aneurysm9
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Hi @swar8080, I saw that you use exporter/prometheus. Could you tell me whether you use an OTEL receiver without the prometheus receiver? I want to use a configuration with only an OTEL receiver and the prometheus exporter for metrics, but it does not work as I expected: I don't see any metrics in Prometheus, though I do see them in the OTEL logs. Have you had success with such a configuration, and could you maybe help me? :)
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Hey 👋
I'm one of the folks who implemented created timestamp in Prometheus. Happy to tackle this one if nobody is working on it already :)
Linking https://github.com/prometheus/client_golang/issues/1535 as a dependency, since the OTEL collector uses const histograms and const summaries from the prometheus client library, which currently don't support created timestamps.
@Aneurysm9 @ArthurSens I want to call out a possible challenge.
I'm not sure StartTimeUnixNano can be used for the created timestamp. That's because the java OTEL SDK (and others, I assume) doesn't set StartTimeUnixNano to the time the metric series was created. Instead, it seems to use the creation time of the OTEL Meter that was used to create the series. The Meter may be created at application start-up, with series generated dynamically later on, so the StartTimeUnixNano may be far in the past. That causes problems with the created timestamp feature, since the zero sample is injected far in the past and often isn't picked up by increase and rate.
I'm getting around this in our OTEL collector fork by setting StartTimeUnixNano to 10 seconds in the past when prometheusexporter gets a cache miss in collector.go#accumulateSum. Not sure that's a good solution though
Coming from the Prometheus ecosystem, I'm still catching up on OTel implementations and specifications, so take what I say with a bit of salt 😬
From the Specification, I understand that the StartTimeUnixNano is related to a single data point. For cumulative data points, it's the time of the first observation of each data point. If the java SDK is correlating StartTimeUnixNano with the Meter creation, instead of data point observation, could it be that the java implementation is simply wrong?
Start timestamp is currently under-specified in SDKs, and is currently primarily used to correctly calculate rates in the presence of restarts: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/data-model.md#temporality. SDKs don't currently have the ability to delete timeseries today, so the process start time (or instrument creation time) is "technically correct" as the earliest time a point could have been recorded. It prevents SDKs from needing to pay the cost of tracking per-series start times.
From my reading of the spec, the hard requirement for SDKs comes from here: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#metricreader:
For instruments with Cumulative aggregation temporality, successive data points received by successive calls to MetricReader.Collect MUST repeat the same starting timestamps (e.g. (T0, T1], (T0, T2], (T0, T3]).
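That requirement can be pictured with a small sketch (illustrative only, not SDK code): every point exported by successive collects repeats the same start time, and only the end time advances.

```go
package main

import "fmt"

// collect simulates successive MetricReader.Collect calls for a cumulative
// instrument: each exported point is a (start, end) interval where the start
// time t0 is repeated verbatim and only the end time advances.
func collect(t0 int64, ends []int64) [][2]int64 {
	points := make([][2]int64, 0, len(ends))
	for _, t := range ends {
		points = append(points, [2]int64{t0, t})
	}
	return points
}

func main() {
	// (T0, T1], (T0, T2], (T0, T3] from the spec quote above.
	fmt.Println(collect(0, []int64{1, 2, 3})) // [[0 1] [0 2] [0 3]]
}
```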
I think you misread the spec slightly, it says:
The StartTimeUnixNano of each point matches the StartTimeUnixNano of the initial observation.
Note that this is not saying that the StartTimeUnixNano of each point matches the (end) TimeUnixNano of the initial observation. In other words, the start time needs to be repeated across observations, but it doesn't need to be close to the initial end time.
@ArthurSens what did you have in mind / assume for what the start time should be set to? The instrument creation time, or the timeseries creation time?
Thanks for the clarification!
@ArthurSens what did you have in mind / assume for what the start time should be set to? The instrument creation time, or the timeseries creation time?
With each time series creation time.
The StartTimeUnixNano of each point matches the StartTimeUnixNano of the initial observation.
A point in OTel is what we call a sample in Prometheus, right? I guess what is unclear is "initial observation of what?".
Looking at the Proto, I can see a few things different when comparing to OpenMetrics indeed.
Just to make sure I'm understanding correctly the prom-otel translation:
- OTel NumberDataPoint <-> Prometheus Sample
- OTel Sum <-> Prometheus Counter
If the above comparison is correct, I can see how in OpenMetrics the created timestamp is correlated with the timeseries; the comment there even says "The time values began being collected for this counter." In OTel, StartTimeUnixNano is correlated with the sample, not the counter itself, which kinda proves that they are different things...?
I wonder how "correct" each approach is 🤔
Let's say an application starts at a time T = 0, some HTTP requests are happening and they are all successful (i.e. status code 200). After 30s (T+30), the first request with code 404 happens and it continues to happen every 1s.
One minute after T, 30 HTTP requests with status code 404 will have happened; let's see how different the measurement would be if we use T or T+30 as the start time.
- If StartTimeUnixNano = T, $rate = \frac{30} {60 - 0} = 0.5$ req/sec
- If StartTimeUnixNano = T+30, $rate = \frac{30} {60-30} = 1$ req/sec
The second one looks more accurate to me 🤨
The correctness that OTel wanted to achieve with StartTimeUnixNano, was it something different than what I've described here? What was the thing that OTel wanted to do correctly when adding StartTimeUnixNano to the spec?
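The arithmetic above can be checked with a tiny sketch (names made up for the example):

```go
package main

import "fmt"

// perSecondRate computes a Prometheus-style rate: the counter increase
// divided by the elapsed window between the chosen start time and the end
// time, both in seconds.
func perSecondRate(increase, startSec, endSec float64) float64 {
	return increase / (endSec - startSec)
}

func main() {
	// 30 requests with status code 404 observed by T+60s.
	fmt.Println(perSecondRate(30, 0, 60))  // StartTimeUnixNano = T:    0.5 req/sec
	fmt.Println(perSecondRate(30, 30, 60)) // StartTimeUnixNano = T+30: 1 req/sec
}
```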
A point in OTel is what we call a sample in Prometheus, right? I guess what is unclear is "initial observation of what?".
I think "initial observation" here is meant to mean "initial timeseries" or "initial sample". If it is push, it would mean the first time the timeseries was pushed, and if it is pull, it would mean the first time it is scraped.
Your comparison is mostly correct (NumberDataPoint is only for gauges and sums, and there are other datapoint types for Histogram, ExponentialHistogram, and Summary).
proves that they are different things...?
Structurally, I think the data models have the same information. The important thing for both is that the start/created timestamp is associated with a single value, and a single set of attributes.
- An OpenTelemetry Sum can contain one or more NumberDataPoints, each with attributes, a value, and a start/end timestamp.
- A Prometheus Counter MetricFamily contains one or more Samples, each with labels, a value, and a created/end timestamp.
The protobufs are structured quite differently, though. The OpenMetrics proto format, for example, assumes that all points have the same timestamp (which makes sense for scraped metrics), whereas OTLP metrics allows each point to have a different timestamp. But I don't think that changes things for this discussion.
The correctness that OTel wanted to achieve with StartTimeUnixNano, was it something different than what I've described here? What was the thing that OTel wanted to do correctly when adding StartTimeUnixNano to the spec?
That's exactly the question. The start timestamp today is a tool to detect resets. A different start timestamp means your cumulative values reset.
Note that Prometheus' implementation of creation time satisfies OpenTelemetry's definition of start time, since the start time changes when cumulatives are reset, and does not change if it does not reset. OTel SDK authors would be allowed to implement Prometheus' definition of start time if they wanted to. But none have chosen to do so since it isn't required, and tracking per-series start time is more work.
Basically, we could propose a change to the OTel SDK spec which recommends that start times be set differently, but we just need to make the case (as you do in https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/32521#issuecomment-2197445532).
The other question we need to figure out is: what should the start time be set to when the first observation is made? The current time (which would produce data points with identical start + end)? The previous export time?
Thanks a lot for all the info, David! It helps a lot :)
To move forward here, I guess it should be fine to translate OTLP's StartTimeUnixNano to Prometheus' Created Timestamps as long as we document this difference for now.
Long term plan would be to bring this discussion with the wider community and suggest that OTel SDKs implement Prometheus definition, by default or through extra configuration.
Does that sound good? :)
That sounds good to me.
Pinging code owners:
- exporter/prometheus: @Aneurysm9 @dashpole
See Adding Labels via Comments if you do not have permissions to add labels yourself.
Not inactive. Related PR is #34596