Aligning the start and end time of the interval with the wall clock interval boundary for Ruler rules.

Open dilnawaj1 opened this issue 6 months ago • 1 comments

Is your feature request related to a problem? Please describe. When a Ruler recording rule is evaluated with an interval of, say, 5 minutes, the execution time of the query (e.g. 10:07) is effectively arbitrary. The queried interval (10:02 - 10:07) is derived from that execution time and is not aligned to the wall clock.

It is very important for recording rules to produce output aligned to the wall clock, based on data from an aligned interval. This is the standard expectation, and it also gives the metrics produced by all rules the same interval boundaries (unlike the current implementation, where each recorded metric has its own interval start and end time).

Describe the solution you'd like The expectation is that the interval is independent of the query execution time and always aligned to the wall clock. For example:

Query for interval 10:00 - 10:05 with a recorded time of 10:05. It can execute any time between 10:05 and 10:10.

Query for interval 10:05 - 10:10 with a recorded time of 10:10. It can execute any time between 10:10 and 10:15.
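The alignment requested here can be sketched in a few lines. This is only an illustration of the arithmetic, not Ruler code: the function name `aligned_window` is made up for this example, and it assumes the interval divides the day evenly so that boundaries fall on round wall-clock times.

```python
from datetime import datetime, timedelta, timezone


def aligned_window(exec_time: datetime, interval: timedelta) -> tuple[datetime, datetime]:
    """Return (start, end) of the last complete wall-clock-aligned interval
    that ended at or before exec_time. Hypothetical helper, not a Cortex API."""
    step = int(interval.total_seconds())
    epoch_seconds = int(exec_time.timestamp())
    # Round the execution time down to the nearest interval boundary.
    end_seconds = epoch_seconds - (epoch_seconds % step)
    end = datetime.fromtimestamp(end_seconds, tz=timezone.utc)
    return end - interval, end


if __name__ == "__main__":
    five_min = timedelta(minutes=5)
    for hh, mm in [(10, 5), (10, 7), (10, 10)]:
        exec_time = datetime(2025, 7, 9, hh, mm, tzinfo=timezone.utc)
        start, end = aligned_window(exec_time, five_min)
        print(f"executed {exec_time:%H:%M} -> interval {start:%H:%M} - {end:%H:%M}")
```

Any execution time from 10:05 up to (but not including) 10:10 maps to the interval 10:00 - 10:05, and the recorded timestamp would be the interval end.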

Describe alternatives you've considered Willing to consider any alternative. This seems to be a fundamental feature that cannot be worked around.


Jul 09 '25 07:07 dilnawaj1

Prometheus metrics have an associated staleness.

To respect the default 5-minute staleness in your example, the rule would need to run at exactly 10:05. It looks like you want to (1) extend the staleness by 5 more minutes, so that the rule can run until 10:10, while also (2) keeping the starting point at 10:05. Both can be implemented, but they require a change in Prometheus; I feel it is not something we can do in Cortex per se. You would need to set this starting offset at the rule group level, because the rules in a group run sequentially in a single processor. See here. So your suggestion could only apply if the group has only one rule (3); it makes no sense for a group of rules in Prometheus.

For Cortex, there is a feature that lets us break this Prometheus limitation:

# Max concurrency for a single rule group to evaluate independent rules.
# CLI flag: -ruler.max-concurrent-evals
[max_concurrent_evals: <int> | default = 1]

I am counting 3 features already. You only have a way to do one of them in Cortex (3). For the other 2, I think you need a Prometheus feature. Those changes would add complexity, and I am not sure such a feature would be easily accepted upstream.

I think this could be implemented separately, without touching the Cortex implementation. You can read metrics from Cortex and send metrics back to Cortex with explicit timestamps via remote write. It's not something we can put in Cortex OSS, but it's a nice hack project to do.
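A rough sketch of the read and re-stamp half of such a hack project, under several assumptions: the base URL and the `/prometheus` prefix are placeholders for whatever your Cortex query endpoint is, `build_instant_query_url` and `restamp` are made-up helper names, and `end_ts` is an aligned interval end expressed in Unix seconds. It relies only on the standard Prometheus HTTP API shape (`/api/v1/query` with `query` and `time` parameters, and instant-vector results carrying `metric` and `value`). The write half is left out, since remote write needs a snappy-compressed protobuf request.

```python
from urllib.parse import urlencode


def build_instant_query_url(base_url: str, expr: str, end_ts: int) -> str:
    """Build an instant-query URL evaluated at the aligned interval end.
    base_url is an assumed query endpoint, not a confirmed Cortex path."""
    params = urlencode({"query": expr, "time": str(end_ts)})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def restamp(vector: list, recorded_name: str, end_ts: int) -> list:
    """Turn an instant-vector result into samples stamped with the aligned
    interval end, ready to be handed to a remote-write client."""
    samples = []
    for series in vector:
        labels = dict(series["metric"])
        labels["__name__"] = recorded_name  # name of the recorded series
        samples.append({
            "labels": labels,
            "value": float(series["value"][1]),  # API returns values as strings
            "timestamp_ms": end_ts * 1000,       # remote write uses milliseconds
        })
    return samples


if __name__ == "__main__":
    url = build_instant_query_url(
        "http://cortex.example:9009/prometheus",  # placeholder host and prefix
        "sum by (job) (rate(http_requests_total[5m]))",
        1752055500,
    )
    print(url)
```

Because the query is evaluated at the aligned end time with a range equal to the interval, the job can run any time after that boundary and still record the same window.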

This PR might be interesting for this: https://github.com/cortexproject/cortex/pull/4808

Jul 10 '25 01:07 friedrichg