Add `track_timestamps_staleness` to prometheus.scrape
Request
The `track_timestamps_staleness` feature was added to Prometheus recently. To pick it up, we would need to upgrade to Prometheus v2.48 or above.

`track_timestamps_staleness` is `false` by default in Prometheus, because enabling it is considered a breaking change. We should consider whether it could be set to `true` in the Agent by default. For example, does setting this to `false` have any benefits? Are there any implications for alerts that rely on these metrics, and would that be a serious breaking change for users?

We also need to go into more detail in the docs about what metrics with explicit timestamps are, and what staleness markers are, so that users can make an informed decision on how to configure `track_timestamps_staleness`. This is what the Prometheus docs say for `track_timestamps_staleness`:
```yaml
# track_timestamps_staleness controls whether Prometheus tracks staleness of
# the metrics that have an explicit timestamps present in scraped data.
#
# If track_timestamps_staleness is set to "true", a staleness marker will be
# inserted in the TSDB when a metric is no longer present or the target
# is down.
[ track_timestamps_staleness: <boolean> | default = false ]
```
I personally didn't even know there could be "metrics that have an explicit timestamps present in scraped data". It is also not clear to me what the implications of adding a staleness marker are.
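To make this concrete, here is what explicit timestamps look like in the Prometheus text exposition format (a made-up sample; the metric names and values are only illustrative). A scraped line may optionally end with a timestamp in milliseconds since the epoch; when it is absent, Prometheus assigns the scrape time:

```
# No explicit timestamp: Prometheus records this sample at the scrape time.
node_boot_time_seconds 1.6935e+09

# Explicit timestamp (milliseconds since epoch) appended after the value,
# which is what a timestamping exporter such as cAdvisor emits.
container_cpu_usage_seconds_total{id="/docker/abc123"} 12345.6 1700000000000
```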
Use case
Setting `track_timestamps_staleness` to `true` could fix a bug where container metrics linger for 5 minutes after a container dies: without staleness markers, PromQL keeps returning the last ingested sample for the duration of the lookback delta, which defaults to 5 minutes.
Note that there is also an open PR for the Agent to add a Prometheus feature which is not available in Prometheus 2.48. Maybe we could upgrade the Agent straight to Prometheus 2.49 so that we can resolve both #5921 and #5905?
cAdvisor puts a timestamp on its metrics; it does this because it sources the data asynchronously so it doesn't want the Prometheus scrape timestamp.
Adding a staleness marker just makes the series stop. This is what happens for every other metric that uses Prometheus' timestamp, when it goes away.
> Adding a staleness marker just makes the series stop. This is what happens for every other metric that uses Prometheus' timestamp, when it goes away.
Yes, but unlike non-timestamped metrics, such a target could expose a sample with an old timestamp in a scrape or two. Wouldn't this be a problem if we had already inserted a staleness marker by then?
No, there is no problem going from {1, 2, stale} to {1, 2, 3, stale}. Or to {1, 2, stale, 3}.
But there is a knob to turn it off, in case you run into trouble. My contention is that 99% of the world wants it on, even though Prometheus v2 will default it off for compatibility reasons.
#5972 was merged, but we will need to raise a new PR for this, since #5972 intentionally did not expose a `track_timestamps_staleness` config entry.
cc @ptodev did we actually end up concluding the discussion on #5972?
Do we plan to wire in this new field, and what should the default be after all?
Yes, @bboreham added it in #6317. So far on the main branch, `track_timestamps_staleness` is set to `false` by default. I'll close this issue; if the need arises, we can reopen it or open another one.
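For anyone landing here later, enabling the new attribute in a Flow pipeline might look like the sketch below (assumptions: the target address and the `prometheus.remote_write.default` component are placeholders, and the attribute name follows what #6317 merged):

```river
prometheus.scrape "cadvisor" {
  // Placeholder target; point this at your real cAdvisor endpoint.
  targets    = [{"__address__" = "localhost:8080"}]
  forward_to = [prometheus.remote_write.default.receiver]

  // Defaults to false, matching upstream Prometheus; opt in explicitly
  // to get staleness markers for explicitly timestamped metrics.
  track_timestamps_staleness = true
}
```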