pelorus
Consider dropping / not accepting events that are above the threshold
Problem:
- Currently, some metrics are registered with a timestamp and some without.
- Some metrics are not registered by Prometheus because the event's timestamp falls outside the accepted threshold.
- For a new Pelorus exporter deployment, an event registered without a timestamp gets the time the exporter was created rather than the actual time of the event. A failure that occurred a year ago is therefore shown as a current failure, which makes it hard to calculate Mean Time to Restore or Change Failure Rate correctly for a given time range.
For some metrics the timestamp is carried as the value. Here is the list:
Failure (Create / Resolution events) - timestamp as a value
- failure_creation_timestamp (app, issue_number, timestamp)
- failure_resolution_timestamp (app, issue_number, timestamp)

Commit Time
- commit_timestamp (namespace, app, commit_hash, image_sha) timestamp

Deploy Time
- deploy_timestamp (namespace, app, image_sha) timestamp
- deployment_active (namespace, app, image_sha)
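
To make the "timestamp as a value" distinction concrete, here is a minimal sketch of the two styles in the Prometheus text exposition format, where a sample line is `name{labels} value [timestamp_ms]` and the trailing timestamp is optional. The helper `format_sample` and the label values are hypothetical and only for illustration; this is not the Pelorus implementation.

```python
from typing import Mapping, Optional


def format_sample(name: str, labels: Mapping[str, str], value: float,
                  timestamp_s: Optional[float] = None) -> str:
    """Render one sample line in the Prometheus text exposition format.

    Hypothetical helper: label values are not escaped, so it is only
    suitable for simple illustrative inputs.
    """
    label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
    line = f"{name}{{{label_str}}} {value}"
    if timestamp_s is not None:
        # The optional sample timestamp is in milliseconds since the epoch.
        line += f" {int(timestamp_s * 1000)}"
    return line


# Timestamp carried only as the value: Prometheus stamps the sample at scrape time.
as_value = format_sample(
    "failure_creation_timestamp",
    {"app": "demo", "issue_number": "1"},  # made-up label values
    1600000000.0,
)

# Same event time also attached as the sample timestamp.
with_ts = format_sample(
    "deploy_timestamp",
    {"namespace": "demo-ns", "app": "demo"},  # made-up label values
    1600000000.0,
    timestamp_s=1600000000.0,
)
```

In the first form the event time survives only inside the value, so any query over a time range sees the sample at scrape time, which is the behaviour described in the problem list above.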
Prometheus accepts samples that either have no timestamp or have a timestamp within its threshold. Samples outside that threshold are rejected with the error:
"Error on ingesting samples that are too old or are too far into the future"
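
One way the proposal in the title could look is sketched below: the exporter drops events whose timestamp falls outside an accepted window before exposing them, instead of letting Prometheus reject them. The `Event` type, the `drop_out_of_window` function, and the window parameters are assumptions for illustration; the actual limits depend on how Prometheus is configured and are not taken from this issue.

```python
import time
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Event:
    """Hypothetical event record; real Pelorus events carry more fields."""
    app: str
    timestamp: float  # seconds since the epoch


def drop_out_of_window(events: Iterable[Event], max_age_s: float,
                       max_future_s: float = 0.0,
                       now: Optional[float] = None) -> List[Event]:
    """Keep only events whose timestamp lies in [now - max_age_s, now + max_future_s].

    max_age_s and max_future_s are caller-supplied assumptions, not
    values read from Prometheus.
    """
    now = time.time() if now is None else now
    oldest = now - max_age_s
    newest = now + max_future_s
    return [e for e in events if oldest <= e.timestamp <= newest]


# Usage with a fixed "now" so the result is deterministic.
now = 1_000_000.0
events = [
    Event("recent", now - 60),        # inside the window: kept
    Event("too-old", now - 7200),     # older than max_age_s: dropped
    Event("in-future", now + 100),    # beyond max_future_s: dropped
]
kept = drop_out_of_window(events, max_age_s=3600, now=now)
```

Dropping silently hides old events, so a real implementation would probably also log or count the discarded ones; that choice is left open here.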
Corresponding PR for the webhook deploytime: #943