timescaledb-toolkit icon indicating copy to clipboard operation
timescaledb-toolkit copied to clipboard

interpolated integral/average with timezone offset fails for 1 month interval

Open phoenix147 opened this issue 2 years ago • 5 comments
trafficstars

Relevant system information:

  • OS: timescaledb-ha:pg14.5-ts2.8.0-latest
  • PostgreSQL version: 14.5
  • TimescaleDB Toolkit version: 1.11.1
  • Installation method: docker, kubernetes

Describe the bug

We're trying to run an interpolated_integral (or interpolated_average) query with an underlying subquery, which has a timezone offset:

SELECT bucket, toolkit_experimental.interpolated_integral(agg, bucket, '1month', LAG(agg) OVER (ORDER BY bucket), LEAD(agg) OVER (ORDER BY bucket), 'hours')
FROM (
    SELECT time_bucket('1month', time, 'CET') AS bucket, time_weight('LOCF', time, value) as agg
    FROM measurements WHERE variable = 1 
    AND time >= '2022-12-01 00:00:00' 
    AND time <= '2023-02-21 00:00:00' GROUP BY bucket)
temp ORDER BY bucket;

This works for an interval up to 1 day, but using 1 month, the following error occurs:

ERROR: assertion failed: interval_start <= self.first.ts &&\n interval_start + interval_len > self.last.ts CONTEXT: extension/src/time_weighted_average.rs:54:9 SQL state: XX000

The subquery results:

"2023-01-31 23:00:00+00"	"(version:1,first:(ts:""2023-02-01 07:03:17.904652+00"",val:0.07),last:(ts:""2023-02-21 10:55:07.632907+00"",val:103.23),weighted_sum:16610132665429.502,method:LOCF)"
"2022-12-31 23:00:00+00"	"(version:1,first:(ts:""2023-01-01 06:56:08.400243+00"",val:0.02),last:(ts:""2023-01-31 15:54:08.149332+00"",val:0),weighted_sum:10386759309142.107,method:LOCF)"
"2022-11-30 23:00:00+00"	"(version:1,first:(ts:""2022-12-01 06:47:15.756845+00"",val:0.1),last:(ts:""2022-12-31 15:19:45.953033+00"",val:0),weighted_sum:7417551236871.949,method:LOCF)"

Using this timestamps and a 1 month interval only adds 30 days, which results in the error above

Removing the timezone from the query, results in the following set:

"2022-12-01 00:00:00+00"	"(version:1,first:(ts:""2022-12-01 06:47:15.756845+00"",val:0.1),last:(ts:""2022-12-31 15:19:45.953033+00"",val:0),weighted_sum:7417551236871.949,method:LOCF)"
"2023-01-01 00:00:00+00"	"(version:1,first:(ts:""2023-01-01 06:56:08.400243+00"",val:0.02),last:(ts:""2023-01-31 15:54:08.149332+00"",val:0),weighted_sum:10386759309142.107,method:LOCF)"
"2023-02-01 00:00:00+00"	"(version:1,first:(ts:""2023-02-01 07:03:17.904652+00"",val:0.07),last:(ts:""2023-02-21 11:05:53.185773+00"",val:88.63),weighted_sum:16651660179395.787,method:LOCF)"

which works correctly.

Is there any way to work around this issue?

To Reproduce Steps to reproduce the behavior: Use a query above with time_bucket and timezone over several months (including months with 31 days)

Expected behavior The query should run even when using time_bucket with timezone data

Actual behavior The query fails with above error.

phoenix147 avatar Feb 21 '23 11:02 phoenix147

Oy. Time zones, and especially their interactions with months are truly the worst.

So, I did some digging, and I think what's going on here is that time_bucket('1 month', time, time zone) returns a timestamptz of the start of the month at that time zone, which is correct, but if I add one month to that time stamp, now not knowing I'm working in the specified time zone, the logic doesn't work exactly the same way, so you end up with some weirdness - so what I'd do to fix that is create a quick function:

CREATE OR REPLACE FUNCTION calc_month_interval(day timestamptz, tz text) RETURNS INTERVAL as $$
SELECT ($1 at time zone $2) + interval '1 month' - ($1 at time zone $2) $$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;

That does the conversion to the correct time zone and then call it in your SQL statement:

SELECT bucket, calc_month_interval(bucket, 'CET'), toolkit_experimental.interpolated_integral(agg, bucket, calc_month_interval(bucket,'CET'), LAG(agg) OVER (ORDER BY bucket), LEAD(agg) OVER (ORDER BY bucket), 'hours')
FROM (
    SELECT time_bucket('1month', time, 'CET') AS bucket, time_weight('LOCF', time, value) as agg
    FROM measurements WHERE variable = 1 
    AND time >= '2022-12-01 00:00:00' 
    AND time <= '2023-02-21 00:00:00' GROUP BY bucket)
temp ORDER BY bucket;

This is a bit unfortunate and we may be able to create a function that helps with this...but it's a bit of a weird edge cases with the interactions of time zones, buckets, and months and oof. Good catch. Let us know if this helps!

davidkohn88 avatar Feb 21 '23 17:02 davidkohn88

David's solution here is much more robust, so I'm just going to delete my last comment.

WireBaron avatar Feb 21 '23 17:02 WireBaron

This appears to be somewhat less robust and only work if your server time zone is set to 'UTC', I'm not 100% sure why yet, but we'll keep looking into this, it may be a bug around how we do time buckets with time zone offsets related in some way to: https://github.com/timescale/timescaledb/issues/5334

davidkohn88 avatar Feb 21 '23 18:02 davidkohn88

The problem appears to be that we are using timestamptz_pl_interval to compute the end time from the given start and interval parameters, and timestamptz_pl_interval always uses the session timezone to add the months and days of the interval to the timestamptz. AFAICT there isn't any direct alternative to timestamptz_pl_interval that allows specifying the timezone.

Since session configuration is transactional, another workaround is set the session timezone in a transaction, make your query, then rollback. edit: or use SET LOCAL within a transaction, then you don't even need to rollback it

syvb avatar Feb 22 '23 01:02 syvb

@davidkohn88

Thanks for the workaround, it's working good.

We also experimented with using the simple interval/average functions like this

SELECT time_bucket('1 month', time) as dt, toolkit_experimental.integral(time_weight('LOCF', time, value), 'hours') AS tw
FROM measurements
WHERE variable = 1
AND time >= '2023-01-01'
AND time <= '2023-02-21'
GROUP BY dt
ORDER BY dt

but the results, especially with fine grained buckets (1m, 15m) were not as good as with the interpolation variants

phoenix147 avatar Feb 22 '23 09:02 phoenix147