timescaledb-toolkit
`delta(counter_agg())` drops 1 delta per `time_bucket`
Relevant system information:
- OS: timescale forge & wsl2
- PostgreSQL version: Ubuntu 14.2-1.pgdg21.04+1
- TimescaleDB Toolkit version:
        Name         | Version |   Schema
---------------------+---------+------------
 plpgsql             | 1.0     | pg_catalog
 timescaledb         | 2.6.0   | public
 timescaledb_toolkit | 1.5.1   | public
- Installation method: Timescale Forge & docker
Describe the bug
delta(counter_agg(time, value)) under-reports data when used with time_bucket. Found via inference, it appears that the first or last delta per bucket is dropped. This means that counters, when graphed via time_bucket + delta(counter_agg()), always report incorrect data.
To Reproduce
Steps to reproduce the behavior:
create extension if not exists timescaledb_toolkit;
drop table if exists repro;
create table repro(time timestamptz, n int);
insert into repro values
(now(), 100),
(now() - interval '10 seconds', 99),
(now() - interval '20 seconds', 98),
(now() - interval '30 seconds', 97),
(now() - interval '40 seconds', 96),
(now() - interval '50 seconds', 95),
(now() - interval '60 seconds', 94),
(now() - interval '70 seconds', 93),
(now() - interval '80 seconds', 92),
(now() - interval '90 seconds', 91),
(now() - interval '100 seconds', 90),
(now() - interval '110 seconds', 89),
(now() - interval '120 seconds', 88),
(now() - interval '130 seconds', 87),
(now() - interval '140 seconds', 86),
(now() - interval '150 seconds', 85),
(now() - interval '160 seconds', 84),
(now() - interval '170 seconds', 83),
(now() - interval '180 seconds', 82),
(now() - interval '190 seconds', 81),
(now() - interval '200 seconds', 80),
(now() - interval '210 seconds', 79),
(now() - interval '220 seconds', 78),
(now() - interval '230 seconds', 77),
(now() - interval '240 seconds', 76),
(now() - interval '250 seconds', 75),
(now() - interval '260 seconds', 74),
(now() - interval '270 seconds', 73),
(now() - interval '280 seconds', 72),
(now() - interval '290 seconds', 71),
(now() - interval '300 seconds', 70);
with win_delta as (
-- get each real row delta as simply as possible
select
time,
n - lag(n) over (order by time) as change
from repro
),
win as (
-- combine the simple row deltas into 1 minute buckets and sum the deltas within the minute
-- obviously we expect sum of "6" for each minute
select time_bucket('1m', time) as time, sum(change) as window_change from win_delta group by 1
),
delt as (
select
-- More convenient timescale function; however the real delta per bucket is 1 higher than whatever this reports
time_bucket('1m', time) as time,
delta(counter_agg(time, n))
from repro
group by 1
)
select win.time, window_change, delta
from win join delt on win.time = delt.time
order by 1;
Expected behavior
Either strategy reports the real delta:
time | window_change | delta
------------------------+---------------+-------
2022-03-05 21:07:00+00 | 2 | 2
2022-03-05 21:08:00+00 | 6 | 6
2022-03-05 21:09:00+00 | 6 | 6
2022-03-05 21:10:00+00 | 6 | 6
2022-03-05 21:11:00+00 | 6 | 6
2022-03-05 21:12:00+00 | 4 | 4
Actual behavior
delta() misses data:
time | window_change | delta
------------------------+---------------+-------
2022-03-05 21:07:00+00 | 2 | 2
2022-03-05 21:08:00+00 | 6 | 5
2022-03-05 21:09:00+00 | 6 | 5
2022-03-05 21:10:00+00 | 6 | 5
2022-03-05 21:11:00+00 | 6 | 5
2022-03-05 21:12:00+00 | 4 | 3
@WarriorOfWire, really appreciate you providing a concrete example here.
Here's my best guess as to what's happening. Unfortunately it's not a solution yet, but I figure it'd be a good idea to checkpoint this for everyone to see.
While the two methods look similar, they have vastly different semantics, and I think this explains the difference in outputs. The first strategy
with diff as (
SELECT
time,
n - lag(n) over(order by time) as change
FROM repro
)
SELECT
time_bucket('1 minute', time),
sum(change) as delta
FROM diff
GROUP BY time_bucket('1 minute', time);
does:
- Calculates the instantaneous rate of change between every pair of adjacent points in the dataset.
- Truncates the time value to a granularity of 1 minute.
- Buckets the dataset such that all values with the same truncated timestamp go into the same bucket.
- Calculates the overall rate of change in each bucket as the sum of the instantaneous rates of change within said bucket.
Meanwhile the second strategy
SELECT
time_bucket('1 minute', time),
delta(counter_agg(time, n))
FROM repro
GROUP BY time_bucket('1 minute', time);
does something very different:
- Truncates the time value to a granularity of 1 minute.
- Buckets the dataset such that all values with the same truncated timestamp go into the same bucket.
- Gets the earliest and latest value within each bucket.
- Calculates the overall rate of change in each bucket as the difference between the earliest and latest elements within the bucket.
The difference in what they’re calculating turns out to be benign in this instance, but the difference in order leads to a subtle issue: if you have a timeseries like
time | n
-------------+-----
00:00:00+00 | 70
00:00:10+00 | 71
00:00:20+00 | 72
00:00:30+00 | 73
00:00:40+00 | 74
00:00:50+00 | 75
00:01:00+00 | 76
00:01:10+00 | 77
00:01:20+00 | 78
00:01:30+00 | 79
00:01:40+00 | 80
00:01:50+00 | 81
if you calculate the deltas first you will end up with
time | n
-------------+-----
00:00:00+00 | 0
00:00:10+00 | 1
00:00:20+00 | 1
00:00:30+00 | 1
00:00:40+00 | 1
00:00:50+00 | 1
00:01:00+00 | 1
00:01:10+00 | 1
00:01:20+00 | 1
00:01:30+00 | 1
00:01:40+00 | 1
00:01:50+00 | 1
which bucketing will turn into
time | n
-------------+-----
00:00:00+00 | 0
00:00:10+00 | 1
00:00:20+00 | 1
00:00:30+00 | 1
00:00:40+00 | 1
00:00:50+00 | 1
-- bucket change
00:01:00+00 | 1
00:01:10+00 | 1
00:01:20+00 | 1
00:01:30+00 | 1
00:01:40+00 | 1
00:01:50+00 | 1
and summation will turn into
time | sum
-------------+-----
00:00:00+00 | 5
-- bucket change
00:01:00+00 | 6
if instead you bucket first, the bucketing will transform the original dataset into
time | n
-------------+-----
00:00:00+00 | 70
00:00:10+00 | 71
00:00:20+00 | 72
00:00:30+00 | 73
00:00:40+00 | 74
00:00:50+00 | 75
-- bucket change
00:01:00+00 | 76
00:01:10+00 | 77
00:01:20+00 | 78
00:01:30+00 | 79
00:01:40+00 | 80
00:01:50+00 | 81
and the deltas will calculate
time | n
-------------+-----
00:00:00+00 | 0
00:00:10+00 | 1
00:00:20+00 | 1
00:00:30+00 | 1
00:00:40+00 | 1
00:00:50+00 | 1
-- bucket change
00:01:00+00 | 0
00:01:10+00 | 1
00:01:20+00 | 1
00:01:30+00 | 1
00:01:40+00 | 1
00:01:50+00 | 1
(note the leading 0 in the second bucket) for final values
time | sum
-------------+-----
00:00:00+00 | 5
-- bucket change
00:01:00+00 | 5
I think this is fundamental to the time_bucket() + GROUP BY pattern: it loses information. We have with_bounds(), which can make up for it in some cases, but I don't know yet if it can solve this one. There also might be potential solutions with window functions.
TLDR: I think the two queries are bucketing the data differently, and that's what's causing the difference in output, but I'm not yet sure how to get the desired output with counter_agg().
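For example, the window-function idea could look something like the sketch below (untested against the repro data above, and not an official recommendation): take the last raw value in each bucket with TimescaleDB's last() aggregate, then diff consecutive buckets with lag(), so the change that crosses a bucket boundary isn't dropped.
-- Sketch only: recover the cross-bucket change by diffing bucketed last values.
-- Attributes each boundary-crossing change to the later bucket.
with per_bucket as (
    select
        time_bucket('1m', time) as bucket,
        last(n, time) as last_n
    from repro
    group by 1
)
select
    bucket,
    last_n - lag(last_n) over (order by bucket) as delta
from per_bucket
order by bucket;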
@JLockerman just to check my assumptions, this is in fact an issue, right? I notice this is exactly the same query pattern as the docs show, specifically this query from that doc:
SELECT measure_id,
time_bucket('15 min'::interval, ts) as bucket,
delta(
counter_agg(ts, val)
)
FROM example
GROUP BY measure_id, time_bucket('15 min'::interval, ts);
I expect this query also under-reports the actual value by 1 delta in each time bucket.
I have the same issue; I can't understand how to make the counter_agg work (or maybe I didn't understand what this function is supposed to do).
CREATE TABLE example (
measure_id BIGINT,
ts TIMESTAMPTZ ,
val DOUBLE PRECISION,
PRIMARY KEY (measure_id, ts)
);
INSERT INTO example (measure_id, ts, val) VALUES
(1, '2022-01-01T00:00:00.000Z', 1),
(1, '2022-01-01T00:10:00.000Z', 2),
(1, '2022-01-01T00:15:00.000Z', 3),
(1, '2022-01-01T00:30:00.000Z', 4),
(1, '2022-01-01T00:45:00.000Z', 5),
(1, '2022-01-01T01:00:00.000Z', 6)
;
SELECT measure_id,
time_bucket('15 min'::interval, ts) as bucket,
delta(
counter_agg(ts, val)
)
FROM example
WHERE ts >= '2022-01-01' and ts < '2022-01-02'
GROUP BY measure_id, time_bucket('15 min'::interval, ts);
The SELECT query returns:
measure_id | bucket | delta |
---|---|---|
1 | "2022-01-01 00:00:00+00" | 1 |
1 | "2022-01-01 00:15:00+00" | 0 |
1 | "2022-01-01 00:30:00+00" | 0 |
1 | "2022-01-01 00:45:00+00" | 0 |
1 | "2022-01-01 01:00:00+00" | 0 |
I thought that the delta was taking into account the delta between the first value of a bucket and the last value of the previous one. I don't know if it's the intended behaviour?
> I thought that the delta was taking into account the delta between the first value of a bucket, and the last value of the previous one.

delta will subtract the first value from the last value. When used with GROUP BY, the delta calculation uses the first and last values within the same group.
Doing GROUP BY measure_id, time_bucket('15 min'::interval, ts) will give you something like
| measure_id | ts | val |
|------------+----------------------------+-----|
| 1 | '2022-01-01T00:00:00.000Z' | 1 | <-- first value in group
| 1 | '2022-01-01T00:10:00.000Z' | 2 | <-- last value in group
---------------- bucket boundary --------------
| 1 | '2022-01-01T00:15:00.000Z' | 3 | <-- first and last value in group
---------------- bucket boundary --------------
| 1 | '2022-01-01T00:30:00.000Z' | 4 | <-- first and last value in group
---------------- bucket boundary --------------
| 1 | '2022-01-01T00:45:00.000Z' | 5 | <-- first and last value in group
---------------- bucket boundary --------------
| 1 | '2022-01-01T01:00:00.000Z' | 6 | <-- first and last value in group
and so calling delta() on this, you are calculating last value - first value for each group
| delta |
|-----------|
| 2 - 1 = 1 |
| 3 - 3 = 0 |
| 4 - 4 = 0 |
| 5 - 5 = 0 |
| 6 - 6 = 0 |
which matches what your select query returned and suggests delta is behaving as intended here.
When you GROUP BY using time_bucket(), the functions called in your SELECT statement operate on the groups/time buckets individually and don't have information about the other time buckets while doing so.
Hopefully that helps some. I'm sorry to hear making counter_agg work has been difficult for you. What is the output you are hoping to get from your query?
Note that having one value per bucket is not that useful.
@rtwalker Look at the official documentation for this feature: https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/counter-aggregation/counter-aggs/#run-a-counter-aggregate-query-using-a-delta-function
Pasted here for convenience:
SELECT measure_id,
time_bucket('15 min'::interval, ts) as bucket,
delta(
counter_agg(ts, val)
)
FROM example
GROUP BY measure_id, time_bucket('15 min'::interval, ts);
This pattern is broken, though, if you want to use it to monitor a system.
@inselbuch This is very normal in Grafana (read about the way it sets up $__interval for $__timeGroup()). With OpenTelemetry you might set up a 10-second reporting cadence for your metrics, and if you graph at 10-second resolution (you're zoomed in to an hour or three) your gauges will all work fine, but your counters will have perhaps 1 value per bucket. So then you're asking folks to limit their bucket widths in their graphs to a minimum of 2x the reporting interval, but they will lose 50% of their deltas if they only use 2x. So if you are promoting the documented query pattern, you're really asking for enough intervals to be included that missing 1 of them won't matter - and I don't know about you, but all of my intervals matter!
It seems clear that 2, 1, 1, 1, 0 is expected, because those are the deltas, and this query pattern is what the docs say to use.
@rtwalker you can probably force this to work via a CTE:
with workaround as (
select
measure_id,
delta(counter_agg(ts, val)) as actual_delta
from example
where ts >= '2022-01-01' and ts < '2022-01-02'
)
select measure_id,
time_bucket('15 min'::interval, ts) as bucket,
sum(actual_delta)
from workaround
group by 1, 2
Untested. Good luck...
Note that every single counter in Promscale + SQL is subject to this painful workaround too.
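A runnable variant of that idea is sketched below; note it swaps counter_agg for a plain lag() window so that each row's change can be attributed to a bucket and then summed (it doesn't handle counter resets, which counter_agg normally takes care of, and it attributes each change to the bucket containing the later sample):
with per_row as (
    select
        measure_id,
        ts,
        -- per-row change; NULL for the first row, which sum() ignores
        val - lag(val) over (partition by measure_id order by ts) as change
    from example
    where ts >= '2022-01-01' and ts < '2022-01-02'
)
select
    measure_id,
    time_bucket('15 min'::interval, ts) as bucket,
    sum(change) as delta
from per_row
group by 1, 2
order by bucket;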
@rtwalker Thanks for the explanation! I misunderstood the docs then. I think this function doesn't meet my use case, but since then I've seen another guide in the docs that might help me more: https://docs.timescale.com/timescaledb/latest/how-to-guides/query-data/advanced-analytic-queries/#increase
We have a workaround for this issue that we're looking at implementing soon. We've captured the details in issue #440
If anyone has any feedback on whether this would be a workable solution for them, or any suggestions to make this more usable, please chime in!
Thanks everyone for your initial feedback. We believe we have a solution (see #440) in the latest 1.8.0 release for the issues that you were experiencing. We'd love to have you all try it out and give us more feedback. 🙏
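For anyone landing here later, the 1.8.0 approach looks roughly like the sketch below. It uses the interpolated_delta hyperfunction added for #440 (experimental in early releases, so it may live in the toolkit_experimental schema; check the current docs for the exact signature). Each bucket's CounterSummary is combined with its neighbours via lag()/lead() so the change that crosses bucket boundaries is accounted for.
-- Sketch: two-step pattern with interpolated_delta (toolkit >= 1.8.0).
SELECT
    measure_id,
    bucket,
    interpolated_delta(
        summary,
        bucket,
        '15 min'::interval,
        lag(summary) OVER (PARTITION BY measure_id ORDER BY bucket),
        lead(summary) OVER (PARTITION BY measure_id ORDER BY bucket)
    ) AS delta
FROM (
    SELECT
        measure_id,
        time_bucket('15 min'::interval, ts) AS bucket,
        counter_agg(ts, val) AS summary
    FROM example
    GROUP BY measure_id, bucket
) AS buckets
ORDER BY measure_id, bucket;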