carbon icon indicating copy to clipboard operation
carbon copied to clipboard

Allow the aggregator to consistently accept past data

Open bucko909 opened this issue 1 year ago • 4 comments

The discovery of #956 was due to replaying a huge swathe of past data into our aggregator instance and finding the data full of holes. I'd assumed it was due to early-expiry of old IntervalBuffer objects, but after writing this fix I found this wasn't the case. However, the behaviour is still likely incorrect when data is played over an aggregation interval. Suppose the aggregator config is:

a (60) = sum a

And the input is:

a 0 1
...
a 1 19
<aggregation timeout triggers here at time (MAX_AGGREGATION_INTERVALS + 1) * 60 or higher>
a 1 20
...
a 1 59

Then the aggregation will aggregate seconds 0-29 and output this (value 20) to the cache, then expire the interval as it's too old, then in the next timeout, aggregate seconds 31-60, output this to the cache and expire the interval again. The final value is the aggregate of seconds 20-59, which is 40 instead of the expected 60. There is no way to control when aggregation occurs, so replayed data will be inconsistent, and in this case could be any value between 1 and 60, though if the replay is faster than real-time, the most likely will be 60.

After these changes, we allow the system to keep the newest interval prior to the current time-frame, provided it's been submitted within the current time-frame. This means if old data is submitted in-order, as would be expected from a replay, the intervals will always be aggregated in a complete state at least once before expiry. This is somewhat robust to curiosities in the timestamps of the real-time submitting process (for example, if that process happens to round time down, or be pushing a prediction of future data or similar), as the number of kept intervals is based on submission time rather than interval time.

bucko909 avatar Aug 21 '24 16:08 bucko909

Push just reset to preferred email.

bucko909 avatar Aug 21 '24 16:08 bucko909

@bucko909 : linter says that current_interval is not in use, could you please check? Ignore Deepsource, it should be removed

deniszh avatar Aug 24 '24 08:08 deniszh

Yep, I think it was just a copy/paste error. Apologies for that.

bucko909 avatar Aug 29 '24 15:08 bucko909

:warning: Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 50.68%. Comparing base (fdc56f6) to head (7ea6ce9). Report is 25 commits behind head on master.

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #958      +/-   ##
==========================================
+ Coverage   50.63%   50.68%   +0.04%     
==========================================
  Files          36       36              
  Lines        3446     3453       +7     
  Branches      535      527       -8     
==========================================
+ Hits         1745     1750       +5     
- Misses       1574     1576       +2     
  Partials      127      127              

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Aug 29 '24 15:08 codecov-commenter

Another great addition, thanks, @bucko909 !

deniszh avatar Feb 02 '25 16:02 deniszh