
Simpler deltas and aggregates over them

Open jonatas opened this issue 2 years ago • 3 comments

Is your feature request related to a problem? Please describe. While helping the community, we often see cases like this, where people want to use the previous row's value in an expression.

Transposing the content here:

I've got a load of time series data about a fleet of batteries stored in TimescaleDB, which records the 'state of charge' of each battery at each time. I don't have measurements of the in- and out-flow, only the instantaneous state of charge.

From this data, I want to find the change in state of charge at each time, which I will later bucket to consumption across hours (after doing some battery-specific maths).

I've written an SQL query which achieves my goal:

SELECT time, charge - LAG(charge) OVER (ORDER BY time) AS delta_soc FROM charge_data;
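For a fleet, the window presumably needs to be partitioned per battery so that deltas aren't computed across devices. A sketch, assuming the `batt_uid` column that appears later in the thread:

```sql
SELECT
    time,
    batt_uid,
    charge - LAG(charge) OVER (PARTITION BY batt_uid ORDER BY time) AS delta_soc
FROM charge_data;
```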

Putting that in a Postgres generated column:

ADD COLUMN delta_soc smallint GENERATED ALWAYS AS (charge - LAG(charge) OVER (ORDER BY time)) STORED

Fails, as promised in the docs, because it references another row.

So, I (successfully) made a materialized view:

CREATE MATERIALIZED VIEW delta_soc AS
SELECT
  time,
  batt_uid,
  charge,
  (charge - LAG(charge) OVER (PARTITION BY batt_uid ORDER BY time)) AS delta_charge,
  EXTRACT(EPOCH FROM time - LAG(time) OVER (PARTITION BY batt_uid ORDER BY time)) AS delta_time
FROM charge_data 
ORDER BY time;

But it would be nice to have this data in near-realtime. After all, it's a "simple" operation to just provide the change from last value. So, I looked at Timescale's continuous aggregates. But, as in the docs, you're not allowed a window function in a continuous aggregate, so the continuous aggregate is invalid.
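For reference, the rejected attempt would look roughly like the following. This is illustrative only: `LAG()` is a window function, and the TimescaleDB docs state that window functions are not allowed in continuous aggregates, so a definition like this is refused.

```sql
-- Illustrative only: rejected, because LAG() is a window function
-- and continuous aggregates do not support window functions.
CREATE MATERIALIZED VIEW delta_soc_cagg
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS bucket,
    batt_uid,
    charge - LAG(charge) OVER (PARTITION BY batt_uid ORDER BY time) AS delta_charge
FROM charge_data
GROUP BY bucket, batt_uid;
```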

Then, just throwing things at the wall and seeing what sticks, I wondered if I could reference the previous row during insertion:

INSERT INTO charge_data VALUES (..., ([$chargevalue]-LAG(charge) OVER (ORDER BY time)), ...);
HINT:  There is a column named "charge" in table "mx_data", but it cannot be referenced from this part of the query.

I'm aware I could calculate the deltas

  • before insertion
  • after insertion by modifying each charge_data row with its delta
  • in the SQL query
  • in the querying program

But it seems much simpler and tidier to just have the DB calculate the values once at/around insertion, leading me to suspect I'm missing something. Is there any way to have charge[battery][n]-charge[battery][n-1] calculated and stored for every row in near-realtime in timescale?
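One standard Postgres way to get the "calculate once at insertion" behavior is a `BEFORE INSERT` trigger that looks up the previous row for the same battery. A hedged sketch (the added columns and function names are illustrative, and it assumes rows arrive in time order per battery; out-of-order inserts would leave stale deltas on later rows):

```sql
-- Illustrative sketch: store the delta at insert time via a trigger.
ALTER TABLE charge_data
  ADD COLUMN delta_charge smallint,
  ADD COLUMN delta_time   double precision;

CREATE OR REPLACE FUNCTION set_delta_soc() RETURNS trigger AS $$
DECLARE
  prev charge_data%ROWTYPE;
BEGIN
  -- Find the most recent earlier reading for this battery.
  SELECT * INTO prev
  FROM charge_data
  WHERE batt_uid = NEW.batt_uid AND time < NEW.time
  ORDER BY time DESC
  LIMIT 1;

  IF FOUND THEN
    NEW.delta_charge := NEW.charge - prev.charge;
    NEW.delta_time   := EXTRACT(EPOCH FROM NEW.time - prev.time);
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER charge_data_delta
  BEFORE INSERT ON charge_data
  FOR EACH ROW EXECUTE FUNCTION set_delta_soc();
```

An index on `(batt_uid, time DESC)` would keep the per-insert lookup cheap, but note that a per-row trigger still adds overhead to every insert on a high-rate hypertable.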

Additional context I'm not the author of the need and the suggestion of this issue came from @davidkohn88.

jonatas avatar Aug 26 '21 13:08 jonatas

I think there's something even worse going on here: it seems that unless there are two rows within a bucket, any delta is missed entirely? 🤯

Re-jiggering the above example to increment every hour, then taking 1/2/4/8/16-hour buckets:

CREATE TABLE data(ts timestamptz NOT NULL, val numeric);

SELECT create_hypertable('data', 'ts');

INSERT INTO data VALUES
('2022-01-01T00:00:00', 0),
('2022-01-01T01:00:00', 1),
('2022-01-01T02:00:00', 2),
('2022-01-01T03:00:00', 3),
('2022-01-01T04:00:00', 4),
('2022-01-01T05:00:00', 5),
('2022-01-01T06:00:00', 6),
('2022-01-01T07:00:00', 7),
('2022-01-01T08:00:00', 8),
('2022-01-01T09:00:00', 9),
('2022-01-01T10:00:00', 10),
('2022-01-01T11:00:00', 11),
('2022-01-01T12:00:00', 12);

SELECT
    ts,
    val
FROM data;

CREATE VIEW buckets AS
SELECT
    mins_per_bucket,
    bucket_start,
    delta(summary) AS delta
FROM (
    SELECT
        mins_per_bucket,
        time_bucket(make_interval(mins => mins_per_bucket), ts) AS bucket_start,
        counter_agg(ts, val) AS "summary"
    FROM data, unnest(ARRAY[60, 120, 240, 480, 960]) AS mins_per_bucket
    GROUP BY
        mins_per_bucket, time_bucket(make_interval(mins => mins_per_bucket), ts)
) AS t
ORDER BY mins_per_bucket, bucket_start;

SELECT
    mins_per_bucket,
    sum(delta) AS total_increase
FROM buckets GROUP BY mins_per_bucket;

SELECT
    mins_per_bucket,
    bucket_start,
    delta
FROM buckets;

Same symptom:

 mins_per_bucket | total_increase 
-----------------+----------------
              60 |              0
             120 |              6
             240 |              9
             480 |             11
             960 |             12
(5 rows)

 mins_per_bucket |      bucket_start      | delta 
-----------------+------------------------+-------
              60 | 2022-01-01 00:00:00+00 |     0
              60 | 2022-01-01 01:00:00+00 |     0
              60 | 2022-01-01 02:00:00+00 |     0
              60 | 2022-01-01 03:00:00+00 |     0
              60 | 2022-01-01 04:00:00+00 |     0
              60 | 2022-01-01 05:00:00+00 |     0
              60 | 2022-01-01 06:00:00+00 |     0
              60 | 2022-01-01 07:00:00+00 |     0
              60 | 2022-01-01 08:00:00+00 |     0
              60 | 2022-01-01 09:00:00+00 |     0
              60 | 2022-01-01 10:00:00+00 |     0
              60 | 2022-01-01 11:00:00+00 |     0
              60 | 2022-01-01 12:00:00+00 |     0
             120 | 2022-01-01 00:00:00+00 |     1
             120 | 2022-01-01 02:00:00+00 |     1
             120 | 2022-01-01 04:00:00+00 |     1
             120 | 2022-01-01 06:00:00+00 |     1
             120 | 2022-01-01 08:00:00+00 |     1
             120 | 2022-01-01 10:00:00+00 |     1
             120 | 2022-01-01 12:00:00+00 |     0
             240 | 2022-01-01 00:00:00+00 |     3
             240 | 2022-01-01 04:00:00+00 |     3
             240 | 2022-01-01 08:00:00+00 |     3
             240 | 2022-01-01 12:00:00+00 |     0
             480 | 2022-01-01 00:00:00+00 |     7
             480 | 2022-01-01 08:00:00+00 |     4
             960 | 2022-01-01 00:00:00+00 |    12
(27 rows)

davetapley avatar Nov 17 '22 23:11 davetapley

Note: I'm going to move this over to #626. Hi @davetapley, I think this issue and #626 are both related to the way the aggregates interpolate between different time buckets. Because of the way Postgres aggregates work, we had to add a special "interpolated" accessor to deal with these cases; see for instance: https://docs.timescale.com/api/latest/hyperfunctions/counter_aggs/interpolated_delta/

I wish the SQL were a little less annoying for this, but right now it's the best we've got. We're hoping to figure out ways to make it easier, but we're running up against some fundamental behavior of SQL syntax. (And in some ways the current behavior is correct: the delta when there's only one point in a bucket is 0. It's only when you look outside the bucket that you can see it's part of a larger trend, so I guess the "interpolated" name makes some sense.) Anyway, here's how I'd rewrite that query:


WITH buckets AS (
    SELECT
        mins_per_bucket,
        bucket_start,
        toolkit_experimental.interpolated_delta(
            summary,
            bucket_start,
            make_interval(mins => mins_per_bucket),
            lag(summary) OVER bucket_mins,
            lead(summary) OVER bucket_mins
        ) AS delta
    FROM (
        SELECT
            mins_per_bucket,
            time_bucket(make_interval(mins => mins_per_bucket), ts) AS bucket_start,
            counter_agg(ts, val) AS "summary"
        FROM data, unnest(ARRAY[60, 120, 240, 480, 960]) AS mins_per_bucket
        GROUP BY
            mins_per_bucket, time_bucket(make_interval(mins => mins_per_bucket), ts)
    ) AS t
    WINDOW bucket_mins AS (PARTITION BY mins_per_bucket ORDER BY bucket_start)
    ORDER BY mins_per_bucket, bucket_start
)

SELECT
    mins_per_bucket,
    sum(delta) AS total_increase
FROM buckets GROUP BY mins_per_bucket;

I wouldn't create a view with that function right now: it's still experimental, so it will be dropped at the next extension upgrade. But you can use a CTE (WITH clause) instead, as above, and you'll see that everything looks as it should now. Hope that's helpful!

We have a number of other interpolated accessors as well, and also rollup() (https://docs.timescale.com/api/latest/hyperfunctions/counter_aggs/rollup-counter/), which lets you calculate the correct result over larger periods without having to sum the per-bucket deltas yourself.
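A hedged sketch of the rollup() approach against the `data` table above: combine the per-hour `counter_agg` summaries into one summary per day, then take a single `delta`, instead of summing per-bucket deltas.

```sql
-- Illustrative sketch: roll hourly counter_agg summaries up to one
-- daily summary, then take a single delta over the combined summary.
SELECT
    time_bucket('1 day', bucket_start) AS day,
    delta(rollup(summary)) AS total_increase
FROM (
    SELECT
        time_bucket('1 hour', ts) AS bucket_start,
        counter_agg(ts, val) AS summary
    FROM data
    GROUP BY 1
) AS hourly
GROUP BY 1;
```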

davidkohn88 avatar Nov 18 '22 09:11 davidkohn88

(also technically, this issue was about a different kind of delta, but that's okay, we can keep the conversation here for now and maybe clarify with a new issue or something when we're ready).

davidkohn88 avatar Nov 18 '22 09:11 davidkohn88