timescaledb-toolkit
Add counter_agg_partial
What's the functionality you would like to add?
counter_agg(counter_agg_partial)
(implicitly performing the combine function on the aggregate, for re-aggregation purposes)
See #9 for some background (and maybe also #4 and #8).
The partial we create would be an intermediate form that stores:
- The first 2 and last 2 adjusted values within a time range (two values so that we can do the instantaneous calculations)
- the sum of any counter reset values within the range (where `reset_value = val_before_reset + val_after_reset`, which is also equal to the adjusted counter value at the last reset in the range)
i.e. for the example above:
first: 2020-12-02 21:47:50+00, 10
second: 2020-12-02 21:57:50+00, 120
penultimate: 2020-12-02 22:37:50+00, 8950
last: 2020-12-02 22:47:50+00 , 11960
reset_sum: 8950 // 3045 + 205 + 5200 + 500
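For concreteness, the partial described above might look something like the following Rust sketch (all type and field names here are hypothetical, not the toolkit's actual definitions):

```rust
// Hypothetical sketch of the counter_agg partial state described above.

#[derive(Debug, Clone, Copy, PartialEq)]
struct TimeVal {
    ts: i64,  // timestamp (e.g. microseconds since epoch)
    val: f64, // adjusted counter value
}

#[derive(Debug, Clone)]
struct CounterPartial {
    first: TimeVal,       // first adjusted point in the range
    second: TimeVal,      // second point, for the instantaneous rate at the start
    penultimate: TimeVal, // second-to-last point, for the rate at the end
    last: TimeVal,        // last adjusted point in the range
    reset_sum: f64,       // sum of the counter reset values within the range
}
```

With the example values above, `last.val - reset_sum` gives 11960 - 8950 = 3010, the raw (unadjusted) last value used when combining with the next partial.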
We need the reset sum stored in order to determine whether there was a counter reset when combining adjacent partials. If we were to combine this partial with another, we would use `last - reset_sum` as the `raw_last` value to compare against the `first` value of the next partial, in order to determine whether we had a counter reset at the partial boundary.
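That boundary check could be sketched roughly as follows (the function name and signature are illustrative, not toolkit API):

```rust
// Sketch: detecting a counter reset at the boundary between two adjacent
// partials. `last` and `reset_sum` come from the earlier partial;
// `next_first` is the first adjusted value of the later partial.
fn reset_at_boundary(last: f64, reset_sum: f64, next_first: f64) -> bool {
    // Recover the raw (unadjusted) last value of the earlier partial.
    let raw_last = last - reset_sum;
    // If the next partial starts below it, the counter reset at the boundary.
    next_first < raw_last
}
```

Using the example values: `raw_last` is 11960 - 8950 = 3010, so a next partial starting at 100 would count as a reset, while one starting at 5000 would not.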
Counter partials, and interaction with PG Aggregates
There are several items that need to be addressed as we define the aggregates for counters here:
- These will have combine, serialize, deserialize, and final functions but will not be marked parallel safe, which is a bit unusual.
- We need to keep the full list of values in the combine state until we have an entire contiguous region, which could get somewhat memory intensive. We also need to determine how this interacts with PG aggregation conventions and where we want to reduce the state to our partial form. Normally in PG you might get away with doing this only in the final function, but I think that's not a great idea here: for continuous aggregates we'll need to do it on serialization (otherwise we're storing the entire data set); for partitionwise aggregates we'll need to do it at the combine stage if there's a combination; and if the combine function doesn't get called (I don't know if there are codepaths that go straight from partial to final), we'll need to do it in the finalfunc.
- The FINALFUNC_MODIFY flag will need to be set to SHAREABLE, I think (see: https://www.postgresql.org/docs/current/sql-createaggregate.html#SQL-CREATEAGGREGATE-NOTES), because the finalfunc may modify the transition state (by reducing it).
- I think the proper solution here is an internal state that is either in expanded or reduced form, where the first thing we do in the serialize, combine, or final function is call a reduce operation that only acts if the state isn't already reduced. The finalfunc could then potentially produce a defined type, rather than just internal/bytes.
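A minimal sketch of that expanded/reduced internal state, with hypothetical names, using the conventional "add the pre-reset value" adjustment for brevity (the reset bookkeeping described earlier would slot in at the marked line), and keeping only the last value and reset sum in the reduced form:

```rust
// Illustrative sketch: an aggregate state that is either "expanded"
// (full ordered list of points) or "reduced" (compact partial form).
// serialize/combine/final would all start by calling `reduce`, which is
// a no-op if the state is already reduced.

#[derive(Debug, Clone)]
enum CounterState {
    // Full, time-ordered list of (timestamp, raw value) points seen so far.
    Expanded(Vec<(i64, f64)>),
    // Compact partial form (full version would also keep first/second/penultimate).
    Reduced { last: f64, reset_sum: f64 },
}

impl CounterState {
    fn reduce(&mut self) {
        if let CounterState::Expanded(points) = self {
            let mut reset_sum = 0.0;
            let mut prev = f64::NEG_INFINITY;
            let mut last = 0.0;
            for &(_, v) in points.iter() {
                if v < prev {
                    // Counter reset detected: accumulate the adjustment here.
                    reset_sum += prev;
                }
                prev = v;
                last = v + reset_sum; // adjusted value
            }
            *self = CounterState::Reduced { last, reset_sum };
        }
        // Already reduced: nothing to do, so repeated calls are safe.
    }
}
```

Since `reduce` is idempotent, it is safe to call it unconditionally at the top of serialize, combine, and the finalfunc.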
- Setting FINALFUNC_MODIFY to SHAREABLE will mean that this won't work in a window function context. We could create a separate window function with similar behavior (and only working on sets ordered by the time column, i.e. counter_window), which would be significantly more efficient since it can keep one running list; it might be useful here, but I'm not 100% sure. I think we would need to keep the entire list of tuples, which would not be great memory-wise, though that's mainly if we want to support the inverse modes. It could do cumulative diffs, which would be a really nice feature. This might be something to save until we can do it on a timeseries, but I'm not 100% sure.
Hmmm... this may be something that was left over from before; we changed the name, it was somewhat confusing, and we never updated the old issue. We decided to use `rollup` as the syntax for this, unless this is different from `rollup` in some way that I'm missing?