timescaledb-toolkit icon indicating copy to clipboard operation
timescaledb-toolkit copied to clipboard

Time weighted averages that take into account a specified timebucket

Open kavilan-nair opened this issue 4 years ago • 2 comments

Is your feature request related to a problem? Please describe. I have a use case for querying out time weighted data with respect to a specified timebucket. If I have inserted the following data:

SET TIME ZONE 'UTC';
 CREATE TABLE foo (
    measure_id      BIGINT,
    ts              TIMESTAMPTZ ,
    val             DOUBLE PRECISION,
    PRIMARY KEY (measure_id, ts)
);
INSERT INTO foo VALUES
( 1, '2020-01-01 00:00:00+00', 1.0),
( 1, '2020-01-01 00:45:00+00', 2.0),
( 1, '2020-01-01 01:00:00+00',1.0),
( 1, '2020-01-01 01:30:00+00', 2.0),
( 1, '2020-01-01 02:00:00+00', 3.0);

I would like to make a query that returns the time weighted averages within each timebucket. I would expect to see a time weighted average of 1*3/4 + 2*1/4 = 0.75+0.5 = 1.25 for the first hour. The current implementation of the time_weight function does not seem to be able to handle this in its current form.

My attempt was to use the following query but it fails to return the correct weighted average within a timebucket

SELECT measure_id,
    time_bucket('60 min'::interval, ts) as bucket,
    average(
        time_weight('LOCF', ts, val)
    )
FROM foo
GROUP BY measure_id, time_bucket('60 min'::interval, ts)
ORDER BY measure_id, time_bucket('60 min'::interval, ts);

It returns the following which is incorrect

+--------------+------------------------+--------------------+
| measure_id   | bucket                 | average            |
|--------------+------------------------+--------------------|
| 1            | 2020-01-01 00:00:00+00 | 1.1818181818181819 |
| 1            | 2020-01-01 01:00:00+00 | 1.0                |
| 1            | 2020-01-01 02:00:00+00 | <null>             |
+--------------+------------------------+--------------------+

If you are on the timescale community slack, you can view the discussion around this here: https://timescaledb.slack.com/archives/C4GT3N90X/p1629203924378600

Describe the solution you'd like I would like the time_weight function to be extended or a new function added that uses a provided timebucket to calculate the weighted averages.

Describe alternatives you've considered I am currently trying to implement a vanilla query that allows me to achieve this functionality using LOCF interpolation

kavilan-nair avatar Aug 17 '21 15:08 kavilan-nair

Thanks for the request! I think the plan here is to have some sort of window function that provides lookback and bounding capability so that you can do this a bit more easily. We've thought a bit about it, but haven't yet implemented the window function to do this, we'll take a look and update here when we have a proposal. In the meantime, if others want this please do upvote with the thumbsup reaction so we know more for prioritization!

davidkohn88 avatar Sep 01 '21 16:09 davidkohn88

Wait, this isn't how time_weight works already?

The slack link seems to have expired - any chance of some more explanation of what time_weight is actually doing here? I can't figure out where 1.1818 is coming from. Also when I tested this I got 1, but that's not correct either.

SystemParadox avatar Dec 20 '21 10:12 SystemParadox