pelorus Consider dropping / not accepting events that are above threshold

Open mpryc opened this issue 1 year ago • 2 comments

Problem:

Currently, some metrics are registered with a timestamp and some without.
Prometheus does not register some of the metrics, because the event timestamp falls outside the accepted threshold.
For a new Pelorus exporter deployment, an event registered without a timestamp gets the time when the exporter was created rather than the actual time of the event, so a failure that occurred a year ago shows up as a current failure. This makes it hard to calculate Mean Time to Restore or Change Failure Rate correctly for a given time range.

For some of the metrics the timestamp is stored as the metric value. Here is the list:

Failure (Create / Resolution events) - timestamp as a value

    failure_creation_timestamp (app, issue_number, timestamp)
    failure_resolution_timestamp (app, issue_number, timestamp)

Commit Time

    commit_timestamp (namespace, app, commit_hash, image_sha) timestamp

Deploy Time

    deploy_timestamp (namespace, app, image_sha) timestamp
    deployment_active (namespace, app, image_sha)

Prometheus accepts metrics that have no timestamp or whose timestamp is within its threshold; otherwise it reports the error:

"Error on ingesting samples that are too old or are too far into the future"

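One possible way to drop such events on the exporter side is sketched below. It is only an illustration, not Pelorus code: the `Event` class, the `filter_events` helper and both limits (`MAX_AGE_SECONDS`, `MAX_FUTURE_SECONDS`) are hypothetical, and the window Prometheus really accepts depends on its configuration.

```python
import time
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical limits in seconds; placeholders, not Pelorus or Prometheus settings.
MAX_AGE_SECONDS = 60 * 60
MAX_FUTURE_SECONDS = 5 * 60


@dataclass(frozen=True)
class Event:
    """Hypothetical event; timestamp is Unix epoch seconds of the real event."""
    app: str
    timestamp: float


def is_within_threshold(
    event_ts: float,
    now: float,
    max_age: float = MAX_AGE_SECONDS,
    max_future: float = MAX_FUTURE_SECONDS,
) -> bool:
    # Accept only timestamps inside [now - max_age, now + max_future].
    return now - max_age <= event_ts <= now + max_future


def filter_events(
    events: Iterable[Event],
    now: Optional[float] = None,
    max_age: float = MAX_AGE_SECONDS,
    max_future: float = MAX_FUTURE_SECONDS,
) -> List[Event]:
    # Drop events that are too old or too far into the future,
    # instead of registering them with the current time.
    if now is None:
        now = time.time()
    return [
        e for e in events
        if is_within_threshold(e.timestamp, now, max_age, max_future)
    ]
```

With these placeholder limits, a failure created a year ago would be dropped rather than reported as a current failure, while an event from a few minutes ago would still be registered.
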
Apr 14 '23 11:04 mpryc

Corresponding PR for the webhook deploytime: #943

Apr 27 '23 11:04 mpryc

May 02 '23 18:05 weshayutin