timescaledb
timescaledb copied to clipboard
[Bug]: time_bucket_gapfill using timezone returns incorrect data for daylight saving
What type of bug is this?
Incorrect result
What subsystems and features are affected?
Gapfill
What happened?
Hi everyone,
First of all thanks for the amazing work on TimescaleDb. I have an issue with the time_bucket_gapfill function using the timezone parameter when testing against the daylight saving dates in the Europe/Paris timezone.
The following query :
Parameters=[p0='PT1H' (DbType = Object), p1='Europe/Paris', p2='2024-02-29T23:00:00.0000000Z' (DbType = DateTime), p3='2024-03-31T22:00:00.0000000Z' (DbType = DateTime), p4='d21d06dd-fcf9-4c4b-9ed7-e64c3385af30', p5='2024-02-29T23:00:00.0000000Z' (DbType = DateTime), p6='2024-03-31T22:00:00.0000000Z' (DbType = DateTime)], CommandType='Text', CommandTimeout='30']
SELECT t."Bucket" AS "Timestamp", t."Value"
FROM (
SELECT time_bucket_gapfill(@p0, "Timestamp", @p1, @p2, @p3) AS "Bucket",
sum("Value") as "Value"
FROM "TimeSeriesPoints"
WHERE "TimeSeriesId" = @p4 AND "Timestamp" >= @p5 AND "Timestamp" < @p6
GROUP BY "Bucket"
ORDER BY "Bucket"
) AS t
returns the following results:
...
{
"Timestamp": "2024-03-31T00:00:00Z",
"Value": 10
},
{
"Timestamp": "2024-03-31T01:00:00Z",
"Value": 10
},
{
"Timestamp": "2024-03-31T01:00:00Z",
"Value": null
},
{
"Timestamp": "2024-03-31T02:00:00Z",
"Value": 10
},
{
"Timestamp": "2024-03-31T03:00:00Z",
"Value": 10
}
...
(I skipped the results before and after the 31st)
when the database contains exactly 10 for every 1h in UTC :
I would expect not to have this extra bucket, let me know if I'm doing something wrong otherwise. I also did not change the default timezone of the db, so it is in UTC. I already read the closed issues on the same subject and they all seem to mention that these bugs should be fully resolved in the version I'm using but that doesn't seem to be the case for me.
Thanks in advance for your help, let me know if you need more info.
EDIT: Did a few more tests, especially testing with setting the timezone to Europe/Paris first and here is the result if that can help:
Nicolas
TimescaleDB version affected
2.14.2
PostgreSQL version used
16.2
What operating system did you use?
Docker image pg16 (latest Ubunutu image)
What installation method did you use?
Docker
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
No response
How can we reproduce the bug?
- Pull docker image
- Create the hypertable
- Add data as in the screenshot
- Execute the given query
- See the results
hello @DiAifU , thank you for reaching out. It would help a lot if you could share reproduction steps that can be executed via psql, if you have been able to reproduce this via psql. If not, could you share your table definition? And more specifically, what Postgres type is your "Timestamp" field?
Hi @konskov ,
Here is what I believe to be the minimum repro in pure SQL statements.
CREATE TABLE "TimeSeries" (
"Id" uuid NOT NULL,
"Name" character varying(250) NOT NULL,
CONSTRAINT "PK_TimeSeries" PRIMARY KEY ("Id")
);
CREATE TABLE "TimeSeriesPoints" (
"Timestamp" timestamp with time zone NOT NULL,
"TimeSeriesId" uuid NOT NULL,
"Value" double precision NOT NULL,
CONSTRAINT "PK_TimeSeriesPoints" PRIMARY KEY ("TimeSeriesId", "Timestamp"),
CONSTRAINT "FK_TimeSeriesPoints_TimeSeries_TimeSeriesId" FOREIGN KEY ("TimeSeriesId") REFERENCES "TimeSeries" ("Id") ON DELETE CASCADE
);
SELECT create_hypertable('"TimeSeriesPoints"', by_range('Timestamp'));
INSERT INTO "TimeSeries" ("Id", "Name")
VALUES ('97622f56-fded-435f-a914-ebf0d0b98c82', 'New TS');
INSERT INTO "TimeSeriesPoints" ("TimeSeriesId", "Timestamp", "Value")
VALUES ('97622f56-fded-435f-a914-ebf0d0b98c82', '2024-03-30T23:00:00Z', 10);
INSERT INTO "TimeSeriesPoints" ("TimeSeriesId", "Timestamp", "Value")
VALUES ('97622f56-fded-435f-a914-ebf0d0b98c82', '2024-03-31T00:00:00Z', 10);
INSERT INTO "TimeSeriesPoints" ("TimeSeriesId", "Timestamp", "Value")
VALUES ('97622f56-fded-435f-a914-ebf0d0b98c82', '2024-03-31T01:00:00Z', 10);
INSERT INTO "TimeSeriesPoints" ("TimeSeriesId", "Timestamp", "Value")
VALUES ('97622f56-fded-435f-a914-ebf0d0b98c82', '2024-03-31T02:00:00Z', 10);
INSERT INTO "TimeSeriesPoints" ("TimeSeriesId", "Timestamp", "Value")
VALUES ('97622f56-fded-435f-a914-ebf0d0b98c82', '2024-03-31T03:00:00Z', 10);
SELECT time_bucket_gapfill('PT1H', "Timestamp", 'Europe/Paris') AS "Bucket",
sum("Value") as "Value"
FROM "TimeSeriesPoints"
WHERE "TimeSeriesId" = '97622f56-fded-435f-a914-ebf0d0b98c82' AND "Timestamp" >= '2024-03-30T23:00:00.0000000Z' AND "Timestamp" < '2024-03-31T04:00:00.0000000Z'
GROUP BY "Bucket"
ORDER BY "Bucket";
The output from that last SELECT statement contains this, which seems incorrect to me.
Thanks for your help.
Nicoas
Hi everybody, I noticed the same behaviour. But later I found the same bug when I used generate_series together with timezone conversion. Maybe it helps you find the bug.
WRONG DST:
OK DST:
Hi everyone,
I'm still having the issue. Do you think by chance anyone could look at it ? It would be a blocker for me if gapfilling function can't be reliable during time changes, so I'll have to find other solutions in these case, unless I'm missing something which would explain the behavior I'm facing.
Thank you for your help !
Nicolas
Is bug #6344 and PR #6507 relevant?
I also reported an issue, seems bad data happens after 2024-03-31 when DST happened in Europe.
#6844
@intermittentnrg I've seen that you've closed your issue but I don't understand how it got resolved for you. I'm still having it on my side whether I try on PG 16 or 14 (like you) and with and updated extension (2.14.2).
Any idea wht I could be doing wrong ?
Nicolas
My issue was resolved by upgrading to 2.14.
I originally ran 2.12 and upgraded one minor version at a time. I forgot to ALTER EXTENSION timescaledb UPDATE
after the 2nd upgrade to 2.14.
Perhaps it's a different issue.
It seems to be different yes.
After another test on docker image pg15.5-ts2.12.2-all, using PostgreSQL 15 and Timescale 2.12.2, things are working as expected and I don't get duplicated timestamp using the above repro example. I believe the provided fix in 2.14 may have broken something else then.
Anyone can help ?
Thanks in advance.
Nicolas
Ok I can also reproduce this.
I got a different issue where a row is doubled and no null value.
When attempting to reproduce it for dbfiddle I instead got the same error as you. https://dbfiddle.uk/nEUn9PF-
It does not reproduce in dbfiddle tho as it's using timescale 2.11.
Ah the reason for the doubled value is because I use interpolate()
which fills in the null value.
Here is dbfiddle with interpolate, which again doesn't occur in dbfiddle which uses 2.11. https://dbfiddle.uk/zd3LKk4f
dbfiddle now has timescale 2.14. Here's new links which reproduce correctly:
Without interpolate: https://dbfiddle.uk/V9LUICgU
2024-03-31 01:00:00+00
is returned twice, once with null
With interpolate: https://dbfiddle.uk/n3xdDowM
2024-03-31 01:00:00+00
is returned twice with same value.
Hmm there is definitely something off. I boiled down the example a bit further:
sven@dev[82029]=# WITH data (time) AS (VALUES
('2024-01-31 02:00:00+00'::timestamptz),
('2024-01-31 02:00:00+00'::timestamptz)
)
SELECT
time_bucket_gapfill('1h'::interval, time, 'Europe/Stockholm')
FROM data
WHERE
time BETWEEN '2024-03-31T0:00Z' AND '2024-03-31T02:00Z'
GROUP BY 1;
time_bucket_gapfill
------------------------
2024-03-31 00:00:00+00
2024-03-31 01:00:00+00
2024-03-31 01:00:00+00
2024-03-31 02:00:00+00
(4 rows)
generate_series does produce the expected result:
sven@dev[82029]=# SELECT generate_series('2024-03-31T0:00Z','2024-03-31T02:00Z','1h'::interval) at time zone 'Europe/Stockholm';
timezone
---------------------
2024-03-31 01:00:00
2024-03-31 03:00:00
2024-03-31 04:00:00
(3 rows)
Slightly shorter
WITH data (time) AS (VALUES('2024-01-31 02:00:00+00'::timestamptz))
SELECT
time_bucket_gapfill('1h', time, 'Europe/Stockholm', '2024-03-31T0:00Z', '2024-03-31T02:00Z')
FROM data
GROUP BY 1;
time_bucket_gapfill
------------------------
2024-01-31 02:00:00+00
2024-03-31 00:00:00+00
2024-03-31 01:00:00+00
2024-03-31 01:00:00+00
(4 rows)
or with zero rows:
SELECT
time_bucket_gapfill('1h', time, 'Europe/Stockholm', '2024-03-31T0:00Z', '2024-03-31T02:00Z')
FROM (SELECT NULL::timestamptz AS time LIMIT 0) s
GROUP BY 1;
@svenklemm Are docker images automatically updated after the merge ?
Need to wait for next release 2.15.1 I think.