timescaledb-toolkit icon indicating copy to clipboard operation
timescaledb-toolkit copied to clipboard

Counter_agg encounters counter reset when 2 hours merge because of Daylight Saving time change

Open emizero opened this issue 1 year ago • 0 comments

Is your feature request related to a problem? Please describe. With daylight savings in some time zone, time will shift in the fall as we fall back 1 hour (25 hour day), and in the spring as we spring forward 1 hour (23 hour day). Given a 25 hour day, 2 hours must be merged. On the US East, 1-2am is good and then 2-3am becomes 1-2am. Note that other time zones may have a different date and different time of the early morning. Unfortunately, this shifting causes problems to counter_agg. Records from the merged 2 hours are no longer in the order in which they were received. This lead to counter resets throughout the 1-2am hour, and subsequently incorrect results for any functions that relies on the counter_agg results for any bucket that includes data in the 1-2am hour, and where applicable buckets around the latter.

Describe the solution you'd like Thus the request to have counter_agg maintain the order in which data was received even when 2 hours are merged; or any reasonable approach that does not lead to counter resets.

Describe alternatives you've considered

  • Counter_agg on UTC time values, and ultimately in the last step shift to desired timezone. Works for hour buckets or smaller. Not application to week, month or year buckets.
  • Filter out an hour's worth of data (as in one of the hours that would be merged). Unknown how this impacts the result but there is no reset.

Additional context Here is an example along with explored alternatives:

  • we use interpolated_delta and delta in combination with counter_agg.
  • PostgreSQL 15.4
  • TimescaleDB 1.17.0
create table test_table (
	device_id bigint not null,
	reading bigint not null,
	time_ts timestamp with time zone not null
);

create index test_table_time_ts_idx on test_table(time_ts desc);

--> single node hypertable
select create_hypertable('test_table', 'time_ts', chunk_time_interval => INTERVAL '1 day');

--> data

--delete from test_table; 
INSERT INTO test_table (device_id, reading, time_ts)	  
	SELECT 
                cast(random() * 4 as int) + 1, 
                generate_series(1,2678401),
                generate_series(
                    '2022-10-15 00:00:00',
	    	    '2022-11-15 00:00:00',
                    INTERVAL '1 second'
                ); 

[1] For hour buckets or smaller, we can run counter_agg on UTC data and then shift to the desired time zone later.

(1a) Interpolated_delta

-- Shift to desired time zone right away => bad result for Nov 6th at Midnight and 1am

WITH t as (
    SELECT
        device_id,
        time_bucket('1 hour'::interval, time_ts at time zone 'US/Eastern') AS ts,
        counter_agg(time_ts at time zone 'US/Eastern', reading) as counter_agg_results
    FROM test_table
    WHERE device_id = 1
    GROUP BY 1,2
)
SELECT
    device_id,
    ts,
    interpolated_delta(
            counter_agg_results, ts, '1 hour'::interval,
            lag(counter_agg_results) OVER device_id_by_time,
            lead(counter_agg_results) OVER device_id_by_time
        )
FROM t
WINDOW device_id_by_time AS ( PARTITION BY device_id order by ts);
 device_id |         ts          | interpolated_delta 
-----------+---------------------+--------------------
...
         1 | 2022-11-05 22:00:00 |               3600
         1 | 2022-11-05 23:00:00 |               3600
         1 | 2022-11-06 00:00:00 |               4050
         1 | 2022-11-06 01:00:00 |          484920016
         1 | 2022-11-06 02:00:00 |               3600
         1 | 2022-11-06 03:00:00 |               3600
         1 | 2022-11-06 04:00:00 |               3600
         1 | 2022-11-06 05:00:00 |               3600
...

-- Delayed shift to desired time zone => Yields 2 separate records for the combined hours on Nov 6 at 1am

WITH t as (
    SELECT
        device_id,
        time_bucket('1 hour'::interval, time_ts) AS ts,
        counter_agg(time_ts, reading) as counter_agg_results
    FROM test_table
    WHERE device_id = 1
    GROUP BY 1,2
)
SELECT
    device_id,
    ts at time zone 'US/Eastern',
    interpolated_delta(
            counter_agg_results, ts, '1 hour'::interval,
            lag(counter_agg_results) OVER device_id_by_time,
            lead(counter_agg_results) OVER device_id_by_time
        )
FROM t
WINDOW device_id_by_time AS ( PARTITION BY device_id order by ts);
 device_id |      timezone       | interpolated_delta 
-----------+---------------------+--------------------
...
         1 | 2022-11-05 23:00:00 |               3600
         1 | 2022-11-06 00:00:00 |               3600
         1 | 2022-11-06 01:00:00 |               3600
         1 | 2022-11-06 01:00:00 |               3600
         1 | 2022-11-06 02:00:00 |               3600
         1 | 2022-11-06 03:00:00 |               3600
         1 | 2022-11-06 04:00:00 |               3600
...

(1b) delta

-- Shift to desired time zone right away => Nov 6th at 1am has the wrong result

WITH t as (
    SELECT
        device_id,
        time_bucket('1 hour'::interval, time_ts at time zone 'US/Eastern') AS ts,
        counter_agg(time_ts at time zone 'US/Eastern', reading) as counter_agg_results
    FROM test_table
    WHERE device_id = 1
    GROUP BY 1,2
)
SELECT
    device_id,
    ts,
    delta(counter_agg_results)
FROM t;
 device_id |         ts          |   delta   
-----------+---------------------+-----------
...
         1 | 2022-11-05 22:00:00 |      3574
         1 | 2022-11-05 23:00:00 |      3576
         1 | 2022-11-06 00:00:00 |      3596
         1 | 2022-11-06 01:00:00 | 484916858
         1 | 2022-11-06 02:00:00 |      3588
         1 | 2022-11-06 03:00:00 |      3595
         1 | 2022-11-06 04:00:00 |      3594
         1 | 2022-11-06 05:00:00 |      3574
...

-- Delayed shift to desired time zone => Yields 2 separate records for the combined hours.

WITH t as (
    SELECT
        device_id,
        time_bucket('1 hour'::interval, time_ts) AS ts,
        counter_agg(time_ts, reading) as counter_agg_results
    FROM test_table
    WHERE device_id = 1
    GROUP BY 1,2
)
SELECT
    device_id,
    ts at time zone 'US/Eastern',
    delta(counter_agg_results)
FROM t;
 device_id |      timezone       | delta 
-----------+---------------------+-------
...
         1 | 2022-11-05 22:00:00 |  3574
         1 | 2022-11-05 23:00:00 |  3576
         1 | 2022-11-06 00:00:00 |  3596
         1 | 2022-11-06 01:00:00 |  3576
         1 | 2022-11-06 01:00:00 |  3592
         1 | 2022-11-06 02:00:00 |  3588
         1 | 2022-11-06 03:00:00 |  3595
         1 | 2022-11-06 04:00:00 |  3594
         1 | 2022-11-06 05:00:00 |  3574
...

[2] For 'week', 'month' or 'year', it would not make sense to bucket at UTC time zone and then shift to the desired time zone. The example in this case will stick to delta and a month bucket

-- Shift to desired time zone right away => Bad result for November

WITH t as (
    SELECT
        device_id,
        time_bucket('1 month'::interval, time_ts at time zone 'US/Eastern') AS ts,
        counter_agg(time_ts at time zone 'US/Eastern', reading) as counter_agg_results
    FROM test_table
    WHERE device_id = 1
    GROUP BY 1,2
)
SELECT
    device_id,
    ts,
    delta(counter_agg_results)
FROM t;
 device_id |         ts          |   delta   
-----------+---------------------+-----------
         1 | 2022-10-01 00:00:00 |   1483152
         1 | 2022-11-01 00:00:00 | 484182395
(2 rows)

-- Delayed shift to desired time zone => Who would want this?! Buckets remain aligned to UTC and not US Eastern!!!

WITH t as (
    SELECT
        device_id,
        time_bucket('1 month'::interval, time_ts) AS ts,
        counter_agg(time_ts, reading) as counter_agg_results
    FROM test_table
    WHERE device_id = 1
    GROUP BY 1,2
)
SELECT
    device_id,
    ts at time zone 'US/Eastern',
    delta(counter_agg_results)
FROM t;
 device_id |      timezone       |  delta  
-----------+---------------------+---------
         1 | 2022-09-30 20:00:00 | 1468776
         1 | 2022-10-31 20:00:00 | 1209589
(2 rows)

-- Shift to desired time zone right away,but eliminate data for one of the duplicate hours => What are repercussions for the omitted data?! And we expect degraded performance because of the added filter.

WITH t as (
    SELECT
        device_id,
        time_bucket('1 month'::interval, time_ts at time zone 'US/Eastern') AS ts,
        counter_agg(time_ts at time zone 'US/Eastern', reading) as counter_agg_results
    FROM test_table
    WHERE device_id = 1 and 
          time_ts at time zone 'US/Eastern' <> (time_ts + interval '1 hour') at time zone 'US/Eastern'
    GROUP BY 1,2
)
SELECT
    device_id,
    ts,
    delta(counter_agg_results)
FROM t;
 device_id |         ts          |  delta  
-----------+---------------------+---------
         1 | 2022-10-01 00:00:00 | 1483152
         1 | 2022-11-01 00:00:00 | 1195184
(2 rows)

emizero avatar Aug 31 '23 04:08 emizero