agent Add `track_timestamps_staleness` to prometheus.scrape

Request

The track_timestamps_staleness feature was added to Prometheus recently. To pick it up, we would need to upgrade to Prometheus v2.48 or above.

track_timestamps_staleness is false by default in Prometheus, because it is considered a breaking change. We should consider whether it could be set to true in the Agent by default. For example, does setting this to false have any benefits? Are there any implications to alerts which rely on these metrics, and would that be s serious breaking change to users?

Also, we need to go into more detail in the docs about what metrics with explicit timestamps are, and what staleness markers are, so that users can make a decision how to configure track_timestamps_staleness. This is what the Prometheus docs say for track_timestamps_staleness:

# track_timestamps_staleness controls whether Prometheus tracks staleness of
# the metrics that have an explicit timestamps present in scraped data.
#
# If track_timestamps_staleness is set to "true", a staleness marker will be
# inserted in the TSDB when a metric is no longer present or the target
# is down.
[ track_timestamps_staleness: <boolean>  | default = false ]

I personally didn't even know there could be "metrics that have an explicit timestamps present in scraped data". It is also not clear to me what the implications of adding a staleness marker are.

Use case

Setting track_timestamps_staleness to true could fix a bug where container metrics linger for 5 minutes after a container dies.

Dec 06 '23 10:12 ptodev

Note that there is also an open PR for the Agent to add a Prometheus feature which is not available in Prometheus 2.48. Maybe we could upgrade the Agent straight to Prometheus 2.49 so that we can resolve both #5921 and #5905?

Dec 06 '23 10:12 ptodev

cAdvisor puts a timestamp on its metrics; it does this because it sources the data asynchronously so it doesn't want the Prometheus scrape timestamp.

Adding a staleness marker just makes the series stop. This is what happens for every other metric that uses Prometheus' timestamp, when it goes away.

Dec 18 '23 18:12 bboreham

Adding a staleness marker just makes the series stop. This is what happens for every other metric that uses Prometheus' timestamp, when it goes away.

Yes, but unlike non-timestamped metrics, such a target could expose a sample with an old timestamp in a scrape or two. Wouldn't this be a problem if we had already inserted a staleness marker by then?

Dec 19 '23 08:12 ptodev

No, there is no problem going from {1, 2, stale} to {1, 2, 3, stale}. Or to {1, 2, stale, 3}.

But there is a knob to turn it off, in case you run into trouble. My contention is that 99% of the world wants it on, even though Prometheus v2 will default it off for compatibility reasons.

Jan 05 '24 11:01 bboreham

#5972 was merged, but we will need to raise a new PR for this, since #5972 intentionally did not expose a track_timestamps_staleness config entry.

Jan 05 '24 14:01 ptodev

This issue has not had any activity in the past 30 days, so the needs-attention label has been added to it. If the opened issue is a bug, check to see if a newer release fixed your issue. If it is no longer relevant, please feel free to close this issue. The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your issue will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity. Thank you for your contributions!

Feb 06 '24 00:02 github-actions[bot]

cc @ptodev did we actually end up concluding the discussion on #5972?

Do we plan to wire in this new field, and what should the default be after all?

Feb 20 '24 13:02 tpaschalis

Yes, @bboreham added it in #6317. So far on the main branch, track_timestamps_staleness is set to false by default. I'll close this issue and if there is a need we could reopen it or open another one.

Feb 21 '24 10:02 ptodev

agent agent copied to clipboard

Add `track_timestamps_staleness` to prometheus.scrape

Request

Use case

agent
agent copied to clipboard