cortex-jsonnet icon indicating copy to clipboard operation
cortex-jsonnet copied to clipboard

Use new compaction interval metric in compaction failed alert.

Open cstyan opened this issue 4 years ago • 6 comments
trafficstars

Signed-off-by: Callum Styan [email protected]

What this PR does: Changes compaction failed alert to use new cortex_compactor_compaction_interval_seconds metric, so that we can more accurately alert when two compactions in a row have failed, regardless of whether the compaction interval is 2h (the value currently used within the increase function of the alert) or not.

Not sure if there should be a changelog entry here.

Checklist

  • [ ] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

cstyan avatar Apr 20 '21 23:04 cstyan

Can we go back to using increase(cortex_compactor_runs_failed_total[2h]) but with interval defined in the metric? I don't think that's possible though :(

pstibrany avatar Apr 21 '21 07:04 pstibrany

Can we go back to using increase(cortex_compactor_runs_failed_total[2h]) but with interval defined in the metric? I don't think that's possible though :(

Yeah, my understanding is that we can't do this in PromQL right now. @beorn7 would be able to confirm.

cstyan avatar Apr 21 '21 20:04 cstyan

Are you talking about using the value of a metric rather than the duration literal 2h in [2h]?

If that's the case, then the answer is: no, that's currently not possible in PromQL. I do think we should make that possible, but it's one of those "not as easy as it looks" problems. Right now, the PromQL engine can find out what time range the query will access in the TSDB, just from static analysis of the query. With allowing the value coming from another expression, you now have to evaluate that "inner query" first to find out what time range the "outer query" will need to access.

For reference: My brainstorming doc about timestamps and durations: https://docs.google.com/document/d/1jMeDsLvDfO92Qnry_JLAXalvMRzMSB1sBr9V7LolpYM/edit#heading=h.vmb7pe7hp12

beorn7 avatar Apr 21 '21 21:04 beorn7

May you confirm we can't query the last time (seconds) a metric has increased, right?

On Wed, Apr 21, 2021 at 11:19 PM Björn Rabenstein @.***> wrote:

Are you talking about using the value of a metric rather than the duration literal 2h in [2h]?

If that's the case, then the answer is: no, that's currently not possible in PromQL. I do think we should make that possible, but it's one of those "not as easy as it looks" problems. Right now, the PromQL engine can find out what time range the query will access in the TSDB, just from static analysis of the query. With allowing the value coming from another expression, you now have to evaluate that "inner query" first to find out what time range the "outer query" will need to access.

For reference: My brainstorming doc about timestamps and durations: https://docs.google.com/document/d/1jMeDsLvDfO92Qnry_JLAXalvMRzMSB1sBr9V7LolpYM/edit#heading=h.vmb7pe7hp12

— You are receiving this because your review was requested. Reply to this email directly, view it on GitHub https://github.com/grafana/cortex-jsonnet/pull/293#issuecomment-824364466, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAM7QEDKH24XWAEMR6UBJOTTJ46N3ANCNFSM43JGNJQQ .

pracucci avatar Apr 22 '21 07:04 pracucci

I believe there is no straightforward way to do this within PromQL. But perhaps an expert can prove me wrong here?

There is the timestamp() function to access the actual timestamp, but it only works on an instant vector. So whenever you apply it to an expression (e.g. to filter for the last sample before an increase or the first after an increase), it will only yield the evaluation timestamp, not the timestamp of any samples that fed into the expression.

beorn7 avatar Apr 22 '21 17:04 beorn7

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Jun 15 '22 17:06 CLAassistant