Consider dropping / not accepting events that are above threshold

Open mpryc opened this issue 1 year ago • 2 comments

Problem:

  1. Currently some metrics are registered with a timestamp and some without.
  2. Some metrics are not ingested by Prometheus, because the event timestamp falls outside the accepted threshold.
  3. For a new Pelorus exporter deployment, an event without a timestamp is recorded at the time the exporter was created, not at the time the event actually occurred. A failure that occurred a year ago therefore shows up as a current failure, which can make it hard to calculate Mean Time to Restore or Change Failure Rate correctly for a given time range.
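To make point 3 concrete, here is a minimal sketch of how a time-range count gets skewed when events lose their real time. The dates, the `failures_in_range` helper and the one-week window are all made up for illustration and are not taken from Pelorus.

```python
from datetime import datetime, timezone


def failures_in_range(event_times, start, end):
    """Count events whose time falls inside [start, end)."""
    return sum(1 for t in event_times if start <= t < end)


# Hypothetical data: one failure from a year ago, one from yesterday.
exporter_started = datetime(2023, 4, 14, tzinfo=timezone.utc)
actual_times = [
    datetime(2022, 4, 10, tzinfo=timezone.utc),
    datetime(2023, 4, 13, tzinfo=timezone.utc),
]

# Without a timestamp, every event is seen at exporter start time.
observed_times = [exporter_started for _ in actual_times]

window_start = datetime(2023, 4, 8, tzinfo=timezone.utc)
window_end = datetime(2023, 4, 15, tzinfo=timezone.utc)

real_count = failures_in_range(actual_times, window_start, window_end)
skewed_count = failures_in_range(observed_times, window_start, window_end)
print(real_count, skewed_count)
```

With real event times only the recent failure lands in the window. With exporter start time both do, so any rate computed over that window is inflated.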

For some metrics the timestamp is stored as the metric's value. Here is the list:

Failure (Create / Resolution events) - timestamp as a value

    failure_creation_timestamp (app, issue_number, timestamp)
    failure_resolution_timestamp (app, issue_number, timestamp)

Commit Time

    commit_timestamp (namespace, app, commit_hash, image_sha) timestamp

Deploy Time

    deploy_timestamp (namespace, app, image_sha) timestamp
    deployment_active (namespace, app, image_sha)
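The difference between "timestamp as a value" and a timestamp attached to the sample itself can be sketched with the Prometheus text exposition format, where an optional millisecond timestamp follows the value. The `Sample` class, `render` helper and label values below are hypothetical and only illustrate the two shapes. They do not show which Pelorus metrics use which shape.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sample:
    name: str
    labels: dict
    value: float
    timestamp_ms: Optional[int] = None  # sample timestamp, if any


def render(sample: Sample) -> str:
    """Render one sample as a text exposition line."""
    labels = ",".join(f'{k}="{v}"' for k, v in sample.labels.items())
    line = f"{sample.name}{{{labels}}} {sample.value}"
    if sample.timestamp_ms is not None:
        line += f" {sample.timestamp_ms}"
    return line


# Timestamp carried only as the value; Prometheus assigns scrape time.
as_value = Sample(
    "failure_creation_timestamp",
    {"app": "todolist", "issue_number": "42"},  # made-up labels
    1681470240.0,
)

# Same event time also attached as the sample timestamp (milliseconds).
with_timestamp = Sample(
    "failure_creation_timestamp",
    {"app": "todolist", "issue_number": "42"},
    1681470240.0,
    timestamp_ms=1681470240000,
)

print(render(as_value))
print(render(with_timestamp))
```

Only the second shape is subject to the ingestion threshold, because only it asks Prometheus to store the sample at a time other than the scrape time.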

Prometheus accepts samples that either carry no timestamp or carry a timestamp within its accepted threshold. Otherwise it rejects them with the error:

"Error on ingesting samples that are too old or are too far into the future"
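One way the proposal in the title could look on the exporter side is a check that drops events before they are exposed. This is only a sketch: `MAX_PAST`, `MAX_FUTURE`, `within_threshold` and `partition_events` are hypothetical names, and the window sizes are placeholders, not the limits Prometheus actually enforces.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Placeholder limits (assumptions); the real window depends on the
# Prometheus configuration.
MAX_PAST = timedelta(hours=1)
MAX_FUTURE = timedelta(minutes=5)


def within_threshold(
    event_time: Optional[datetime],
    now: Optional[datetime] = None,
    max_past: timedelta = MAX_PAST,
    max_future: timedelta = MAX_FUTURE,
) -> bool:
    """Return True if an event should be exposed to Prometheus."""
    if event_time is None:
        # No timestamp: Prometheus uses scrape time, so it is accepted.
        return True
    now = now or datetime.now(timezone.utc)
    return now - max_past <= event_time <= now + max_future


def partition_events(event_times, now=None):
    """Split event times into (accepted, dropped) lists."""
    accepted, dropped = [], []
    for t in event_times:
        (accepted if within_threshold(t, now) else dropped).append(t)
    return accepted, dropped


now = datetime(2023, 4, 14, 11, 4, tzinfo=timezone.utc)
events = [
    None,                      # no timestamp
    now - timedelta(minutes=30),
    now - timedelta(days=365),  # a failure from a year ago
    now + timedelta(hours=2),   # too far in the future
]
accepted, dropped = partition_events(events, now)
print(len(accepted), len(dropped))
```

Dropping explicitly, and logging what was dropped, would at least make the behavior visible instead of relying on the ingestion error.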

mpryc avatar Apr 14 '23 11:04 mpryc

Corresponding PR for the webhook deploytime: #943

mpryc avatar Apr 27 '23 11:04 mpryc

+1

weshayutin avatar May 02 '23 18:05 weshayutin