timescaledb-toolkit
`delta(counter_agg())` drops 1 delta per `time_bucket`
Relevant system information:
- OS: timescale forge & wsl2
- PostgreSQL version: Ubuntu 14.2-1.pgdg21.04+1
- TimescaleDB Toolkit version:
        Name         | Version |   Schema
---------------------+---------+------------
 plpgsql             | 1.0     | pg_catalog
 timescaledb         | 2.6.0   | public
 timescaledb_toolkit | 1.5.1   | public
- Installation method: Timescale Forge & docker
Describe the bug
delta(counter_agg(time, value)) under-reports data when used with time_bucket. Found via inference, it appears that the first or last delta per bucket is dropped. This means that counters, when graphed via time_bucket + delta(counter_agg()), always report incorrect data.
To Reproduce
Steps to reproduce the behavior:
create extension if not exists timescaledb_toolkit;
drop table if exists repro;
create table repro(time timestamptz, n int);
insert into repro values
(now(), 100),
(now() - interval '10 seconds', 99),
(now() - interval '20 seconds', 98),
(now() - interval '30 seconds', 97),
(now() - interval '40 seconds', 96),
(now() - interval '50 seconds', 95),
(now() - interval '60 seconds', 94),
(now() - interval '70 seconds', 93),
(now() - interval '80 seconds', 92),
(now() - interval '90 seconds', 91),
(now() - interval '100 seconds', 90),
(now() - interval '110 seconds', 89),
(now() - interval '120 seconds', 88),
(now() - interval '130 seconds', 87),
(now() - interval '140 seconds', 86),
(now() - interval '150 seconds', 85),
(now() - interval '160 seconds', 84),
(now() - interval '170 seconds', 83),
(now() - interval '180 seconds', 82),
(now() - interval '190 seconds', 81),
(now() - interval '200 seconds', 80),
(now() - interval '210 seconds', 79),
(now() - interval '220 seconds', 78),
(now() - interval '230 seconds', 77),
(now() - interval '240 seconds', 76),
(now() - interval '250 seconds', 75),
(now() - interval '260 seconds', 74),
(now() - interval '270 seconds', 73),
(now() - interval '280 seconds', 72),
(now() - interval '290 seconds', 71),
(now() - interval '300 seconds', 70);
with win_delta as (
-- get each real row delta as simply as possible
select
time,
n - lag(n) over (order by time) as change
from repro
),
win as (
-- combine the simple row deltas into 1 minute buckets and sum the deltas within the minute
-- obviously we expect sum of "6" for each minute
select time_bucket('1m', time) as time, sum(change) as window_change from win_delta group by 1
),
delt as (
select
-- More convenient timescale function; however the real delta per bucket is 1 higher than whatever this reports
time_bucket('1m', time) as time,
delta(counter_agg(time, n))
from repro
group by 1
)
select win.time, window_change, delta
from win join delt on win.time = delt.time
order by 1;
Expected behavior
Either strategy reports the real delta:
time | window_change | delta
------------------------+---------------+-------
2022-03-05 21:07:00+00 | 2 | 2
2022-03-05 21:08:00+00 | 6 | 6
2022-03-05 21:09:00+00 | 6 | 6
2022-03-05 21:10:00+00 | 6 | 6
2022-03-05 21:11:00+00 | 6 | 6
2022-03-05 21:12:00+00 | 4 | 4
Actual behavior
delta() misses data:
time | window_change | delta
------------------------+---------------+-------
2022-03-05 21:07:00+00 | 2 | 2
2022-03-05 21:08:00+00 | 6 | 5
2022-03-05 21:09:00+00 | 6 | 5
2022-03-05 21:10:00+00 | 6 | 5
2022-03-05 21:11:00+00 | 6 | 5
2022-03-05 21:12:00+00 | 4 | 3
@WarriorOfWire, really appreciate you providing a concrete example here.
Here's my best guess as to what's happening. Unfortunately it's not a solution yet, but I figure it'd be a good idea to checkpoint this for everyone to see.
While the two methods look similar, they have vastly different semantics, and I think this explains the difference in outputs. The first strategy
with diff as (
SELECT
time,
n - lag(n) over(order by time) as change
FROM repro
)
SELECT
time_bucket('1 minute', time),
sum(change) as delta
FROM diff
GROUP BY time_bucket('1 minute', time);
does:
- Calculates the instantaneous rate of change between every pair of adjacent points in the dataset.
- Truncates the time value to a granularity of 1 minute.
- Buckets the dataset such that all values with the same truncated timestamp go into the same bucket.
- Calculates the overall rate of change in each bucket as the sum of the instantaneous rates of change within said bucket.
Meanwhile the second strategy
SELECT
time_bucket('1 minute', time),
delta(counter_agg(time, n))
FROM repro
GROUP BY time_bucket('1 minute', time);
does something very different:
- Truncates the time value to a granularity of 1 minute.
- Buckets the dataset such that all values with the same truncated timestamp go into the same bucket.
- Gets the earliest and latest value within each bucket.
- Calculates the overall rate of change in each bucket as the difference between the earliest and latest elements within the bucket.
The difference in what they’re calculating turns out to be benign in this instance, but the difference in order leads to a subtle issue: if you have a timeseries like
time | n
-------------+-----
00:00:00+00 | 70
00:00:10+00 | 71
00:00:20+00 | 72
00:00:30+00 | 73
00:00:40+00 | 74
00:00:50+00 | 75
00:01:00+00 | 76
00:01:10+00 | 77
00:01:20+00 | 78
00:01:30+00 | 79
00:01:40+00 | 80
00:01:50+00 | 81
if you calculate the deltas first you will end up with
time | n
-------------+-----
00:00:00+00 | 0
00:00:10+00 | 1
00:00:20+00 | 1
00:00:30+00 | 1
00:00:40+00 | 1
00:00:50+00 | 1
00:01:00+00 | 1
00:01:10+00 | 1
00:01:20+00 | 1
00:01:30+00 | 1
00:01:40+00 | 1
00:01:50+00 | 1
which bucketing will turn into
time | n
-------------+-----
00:00:00+00 | 0
00:00:10+00 | 1
00:00:20+00 | 1
00:00:30+00 | 1
00:00:40+00 | 1
00:00:50+00 | 1
-- bucket change
00:01:00+00 | 1
00:01:10+00 | 1
00:01:20+00 | 1
00:01:30+00 | 1
00:01:40+00 | 1
00:01:50+00 | 1
and summation will turn into
time | sum
-------------+-----
00:00:00+00 | 5
-- bucket change
00:01:00+00 | 6
if instead you bucket first, the bucketing will transform the original dataset into
time | n
-------------+-----
00:00:00+00 | 70
00:00:10+00 | 71
00:00:20+00 | 72
00:00:30+00 | 73
00:00:40+00 | 74
00:00:50+00 | 75
-- bucket change
00:01:00+00 | 76
00:01:10+00 | 77
00:01:20+00 | 78
00:01:30+00 | 79
00:01:40+00 | 80
00:01:50+00 | 81
and the deltas will calculate
time | n
-------------+-----
00:00:00+00 | 0
00:00:10+00 | 1
00:00:20+00 | 1
00:00:30+00 | 1
00:00:40+00 | 1
00:00:50+00 | 1
-- bucket change
00:01:00+00 | 0
00:01:10+00 | 1
00:01:20+00 | 1
00:01:30+00 | 1
00:01:40+00 | 1
00:01:50+00 | 1
(note the leading 0 in the second bucket) for final values
time | sum
-------------+-----
00:00:00+00 | 5
-- bucket change
00:01:00+00 | 5
I think this is fundamental to the time_bucket() + GROUP BY pattern: it loses information. We have with_bounds(), which can make up for it in some cases, but I don't know yet if it can solve this one. There also might be potential solutions with window functions.
TLDR: I think the two queries are bucketing the data differently, and that's what's causing the difference in output, but I'm not yet sure how to get the desired output with counter_agg().
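For example, the window-function idea could look something like the sketch below (untested against the repro data above, and not an official recommendation): take the last raw value in each bucket with TimescaleDB's last() aggregate, then diff consecutive buckets with lag(), so the change that crosses a bucket boundary isn't dropped.
-- Sketch only: recover the cross-bucket change by diffing bucketed last values.
-- Attributes each boundary-crossing change to the later bucket.
with per_bucket as (
    select
        time_bucket('1m', time) as bucket,
        last(n, time) as last_n
    from repro
    group by 1
)
select
    bucket,
    last_n - lag(last_n) over (order by bucket) as delta
from per_bucket
order by bucket;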
@JLockerman just to check my assumptions, this is in fact an issue, right? I notice this is exactly the same query pattern as the docs show, specifically this query from that doc:
SELECT measure_id,
time_bucket('15 min'::interval, ts) as bucket,
delta(
counter_agg(ts, val)
)
FROM example
GROUP BY measure_id, time_bucket('15 min'::interval, ts);
I expect this query also under-reports the actual value by 1 delta in each time bucket.
I have the same issue; I can't understand how to make the counter_agg work (or maybe I didn't understand what this function is supposed to do).
CREATE TABLE example (
measure_id BIGINT,
ts TIMESTAMPTZ ,
val DOUBLE PRECISION,
PRIMARY KEY (measure_id, ts)
);
INSERT INTO example (measure_id, ts, val) VALUES
(1, '2022-01-01T00:00:00.000Z', 1),
(1, '2022-01-01T00:10:00.000Z', 2),
(1, '2022-01-01T00:15:00.000Z', 3),
(1, '2022-01-01T00:30:00.000Z', 4),
(1, '2022-01-01T00:45:00.000Z', 5),
(1, '2022-01-01T01:00:00.000Z', 6)
;
SELECT measure_id,
time_bucket('15 min'::interval, ts) as bucket,
delta(
counter_agg(ts, val)
)
FROM example
WHERE ts >= '2022-01-01' and ts < '2022-01-02'
GROUP BY measure_id, time_bucket('15 min'::interval, ts);
The SELECT query returns:
measure_id | bucket | delta |
---|---|---|
1 | "2022-01-01 00:00:00+00" | 1 |
1 | "2022-01-01 00:15:00+00" | 0 |
1 | "2022-01-01 00:30:00+00" | 0 |
1 | "2022-01-01 00:45:00+00" | 0 |
1 | "2022-01-01 01:00:00+00" | 0 |
I thought that the delta was taking into account the delta between the first value of a bucket and the last value of the previous one. I don't know if it's the intended behaviour?
> I thought that the delta was taking into account the delta between the first value of a bucket, and the last value of the previous one.

delta will subtract the first value from the last value. When used with GROUP BY, the delta calculation uses the first and last values within the same group.
Doing GROUP BY measure_id, time_bucket('15 min'::interval, ts) will give you something like
| measure_id | ts | val |
|------------+----------------------------+-----|
| 1 | '2022-01-01T00:00:00.000Z' | 1 | <-- first value in group
| 1 | '2022-01-01T00:10:00.000Z' | 2 | <-- last value in group
---------------- bucket boundary --------------
| 1 | '2022-01-01T00:15:00.000Z' | 3 | <-- first and last value in group
---------------- bucket boundary --------------
| 1 | '2022-01-01T00:30:00.000Z' | 4 | <-- first and last value in group
---------------- bucket boundary --------------
| 1 | '2022-01-01T00:45:00.000Z' | 5 | <-- first and last value in group
---------------- bucket boundary --------------
| 1 | '2022-01-01T01:00:00.000Z' | 6 | <-- first and last value in group
and so calling delta() on this, you are calculating last value - first value for each group
| delta |
|-----------|
| 2 - 1 = 1 |
| 3 - 3 = 0 |
| 4 - 4 = 0 |
| 5 - 5 = 0 |
| 6 - 6 = 0 |
which matches what your select query returned and suggests delta is behaving as intended here.
When you GROUP BY using time_bucket(), the functions called in your SELECT statement operate on the groups/time buckets individually and don't have information about the other time buckets while doing so.
Hopefully that helps some. I'm sorry to hear making counter_agg work has been difficult for you. What is the output you are hoping to get from your query?
Note that having one value per bucket is not that useful.
@rtwalker Look at the official documentation for this feature: https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/counter-aggregation/counter-aggs/#run-a-counter-aggregate-query-using-a-delta-function
Pasted here for convenience:
SELECT measure_id,
time_bucket('15 min'::interval, ts) as bucket,
delta(
counter_agg(ts, val)
)
FROM example
GROUP BY measure_id, time_bucket('15 min'::interval, ts);
This pattern is broken, though, if you want to use it to monitor a system.
@inselbuch This is very normal in Grafana (read about the way it sets up $__interval for $__timeGroup()). With OpenTelemetry you might set up a 10-second reporting cadence for your metrics, and if you graph at 10-second resolution (you're zoomed in to an hour or three) your gauges will all work fine, but your counters will have perhaps 1 value per bucket. So then you're asking folks to limit their bucket widths in their graphs to a minimum of 2x the reporting interval, but they will lose 50% of their deltas if they only use 2x. So if you are promoting the documented query pattern, you're really asking for enough intervals to be included that missing 1 of them won't matter - and I don't know about you, but all of my intervals matter!
It seems clear that 2, 1, 1, 1, 0 is expected, because those are the deltas, and this query pattern is what the docs say to use.
@rtwalker you can probably force this to work via a CTE:
with workaround as (
select
measure_id,
delta(counter_agg(ts, val)) as actual_delta
from example
where ts >= '2022-01-01' and ts < '2022-01-02'
)
select measure_id,
time_bucket('15 min'::interval, ts) as bucket,
sum(actual_delta)
from workaround
group by 1, 2
Untested. Good luck...
Note that every single counter in Promscale + SQL is subject to this painful workaround too.
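A runnable variant of that idea is sketched below; note it swaps counter_agg for a plain lag() window so that each row's change can be attributed to a bucket and then summed (it doesn't handle counter resets, which counter_agg normally takes care of, and it attributes each change to the bucket containing the later sample):
with per_row as (
    select
        measure_id,
        ts,
        -- per-row change; NULL for the first row, which sum() ignores
        val - lag(val) over (partition by measure_id order by ts) as change
    from example
    where ts >= '2022-01-01' and ts < '2022-01-02'
)
select
    measure_id,
    time_bucket('15 min'::interval, ts) as bucket,
    sum(change) as delta
from per_row
group by 1, 2
order by bucket;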
@rtwalker Thanks for the explanation! I misunderstood the docs then. I think this function doesn't meet my use case, but since then I've seen another guide in the docs that might help me more: https://docs.timescale.com/timescaledb/latest/how-to-guides/query-data/advanced-analytic-queries/#increase
We have a workaround for this issue that we're looking at implementing soon. We've captured the details in issue #440
If anyone has any feedback on whether this would be a workable solution for them, or any suggestions to make this more usable, please chime in!
Thanks everyone for your initial feedback. We believe we have a solution (see #440) in the latest 1.8.0 release for the issues that you were experiencing. We'd love to have you all try it out and give us more feedback. 🙏
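For anyone landing here later, the 1.8.0 approach looks roughly like the sketch below. It uses the interpolated_delta hyperfunction added for #440 (experimental in early releases, so it may live in the toolkit_experimental schema; check the current docs for the exact signature). Each bucket's CounterSummary is combined with its neighbours via lag()/lead() so the change that crosses bucket boundaries is accounted for.
-- Sketch: two-step pattern with interpolated_delta (toolkit >= 1.8.0).
SELECT
    measure_id,
    bucket,
    interpolated_delta(
        summary,
        bucket,
        '15 min'::interval,
        lag(summary) OVER (PARTITION BY measure_id ORDER BY bucket),
        lead(summary) OVER (PARTITION BY measure_id ORDER BY bucket)
    ) AS delta
FROM (
    SELECT
        measure_id,
        time_bucket('15 min'::interval, ts) AS bucket,
        counter_agg(ts, val) AS summary
    FROM example
    GROUP BY measure_id, bucket
) AS buckets
ORDER BY measure_id, bucket;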