timescaledb
[Bug]: Continuous aggregates with time_bucket hourly and timezone fail to update
What type of bug is this?
Incorrect result, Unexpected error
What subsystems and features are affected?
Continuous aggregate
What happened?
I have a number of continuous aggregates that use hourly buckets:
time_bucket(interval '1 hour', hypertable_name.time, 'Asia/Dubai') AS bucket
I noticed that I stopped seeing new results after recreating the hypertable. After hours of debugging, I pinned it down: hourly buckets don't refresh when used with a timezone.
TimescaleDB version affected
2.14.2 and 2.15.0
PostgreSQL version used
15.6
What operating system did you use?
Arch Linux and Ubuntu Linux
What installation method did you use?
Deb/Apt
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
May 09 11:40:20 ahmad-82nd postgres[245522]: 2024-05-09 11:40:20.361 +04 [245522] DEBUG: launching job 1000 "Refresh Continuous Aggregate Policy [1000]"
May 09 11:40:20 ahmad-82nd postgres[246102]: 2024-05-09 11:40:20.368 +04 [246102] DEBUG: extension state changed: unknown to created
May 09 11:40:20 ahmad-82nd postgres[246102]: 2024-05-09 11:40:20.368 +04 [246102] DEBUG: Executing policy_refresh_continuous_aggregate with parameters {"end_offset": null, "start_offset": "3 mons", "mat_hypertable_id": 3}
May 09 11:40:20 ahmad-82nd postgres[246102]: 2024-05-09 11:40:20.369 +04 [246102] DEBUG: rehashing catalog cache id 35 for pg_namespace; 9 tups, 4 buckets
May 09 11:40:20 ahmad-82nd postgres[246102]: 2024-05-09 11:40:20.370 +04 [246102] LOG: job 1000 threw an error
May 09 11:40:20 ahmad-82nd postgres[246102]: 2024-05-09 11:40:20.370 +04 [246102] ERROR: timestamp out of range
May 09 11:40:20 ahmad-82nd postgres[180293]: 2024-05-09 11:40:20.375 +04 [180293] LOG: background worker "Refresh Continuous Aggregate Policy [1000]" (PID 246102) exited with exit code 1
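For what it's worth, the same code path can presumably be exercised without waiting for the background job by refreshing the aggregate manually with an open-ended window (this call is not from the original report; datalake_hourly is the aggregate defined in the repro script below):
-- Hypothetical manual check: an open-ended (NULL) window end should hit the
-- same refresh-window computation as the policy job.
CALL refresh_continuous_aggregate('datalake_hourly', now() - INTERVAL '3 months', NULL);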
How can we reproduce the bug?
CREATE TABLE metrics (
sensor_id INTEGER NOT NULL,
value DOUBLE PRECISION NOT NULL,
timestamp TIMESTAMPTZ NOT NULL
);
SELECT create_hypertable('metrics', 'timestamp');
-- insert data into the hypertable
INSERT INTO metrics
SELECT
s.sensor_id,
random()*50 + 10,
timestamp
FROM
generate_series((now() - INTERVAL '6 months')::date, now() - INTERVAL '5 days', INTERVAL '1 hour') AS timestamp
CROSS JOIN (SELECT generate_series(1, 200) as sensor_id) as s;
-- hourly continuous aggregate with timezone
CREATE MATERIALIZED VIEW datalake_hourly
WITH (timescaledb.continuous, timescaledb.materialized_only = true) AS
SELECT
time_bucket(INTERVAL '1 hour', metrics.timestamp, 'Asia/Dubai') AS bucket,
sensor_id,
count(*) as count_items
FROM metrics
GROUP BY bucket, sensor_id;
SELECT add_continuous_aggregate_policy(
'datalake_hourly',
start_offset => INTERVAL '3 months',
end_offset => NULL,
schedule_interval => INTERVAL '15 seconds'
);
-- insert a few more rows
INSERT INTO metrics
SELECT
s.sensor_id,
random()*50 + 10,
timestamp
FROM
generate_series((now() - interval '5 day'), now(), INTERVAL '1 hour') AS timestamp
CROSS JOIN (SELECT generate_series(1, 200) as sensor_id) as s;
-- wait and check the continuous aggregate
SELECT
    j.job_id,
    j.application_name,
    j.initial_start,
    j.next_start,
    j.config,
    j.schedule_interval,
    j.max_runtime,
    j.max_retries,
    j.retry_period,
    j.scheduled,
    j.fixed_schedule,
    j.hypertable_name,
    ca.view_name,
    js.last_run_started_at,
    js.job_status,
    js.last_run_status,
    js.last_successful_finish,
    js.total_runs,
    js.total_failures,
    js.total_successes
FROM timescaledb_information.jobs AS j
LEFT JOIN timescaledb_information.job_stats AS js
    ON js.job_id = j.job_id
LEFT JOIN timescaledb_information.continuous_aggregates AS ca
    ON ca.materialization_hypertable_name = j.hypertable_name
ORDER BY j.job_id;
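As a quicker sanity check (not part of the original report), one can also compare the newest materialized bucket with the newest raw row; with the failing policy the aggregate stays roughly five days behind:
-- Hedged diagnostic: newest_bucket stays stuck while newest_raw_row advances.
SELECT
    (SELECT max(bucket) FROM datalake_hourly) AS newest_bucket,
    (SELECT max(timestamp) FROM metrics) AS newest_raw_row;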
Thanks for the script, it reproduces for me. The error is related to the NULL value of the end_offset, so probably as a workaround you can set it to something far in the future but not NULL.
@fabriziomello should the end point calculation follow the same logic as cagg_get_time_min? Currently the null end point is returned as max time and not +infinity even for variable-bucket caggs, so the calculations in ts_compute_inscribed_bucketed_refresh_window_variable fail.
Hi @akuzm, thank you for the reply. I believe end_offset only accepts an interval, so '-10 years' worked for me and refreshed the data correctly from today after recreating the job.
However, I believe we should add a warning, or improve how the NULL case is handled, as we have had a lot of issues with it in the past.
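For reference, here is a sketch of the workaround described above, assuming the policy from the repro script; the non-NULL negative end_offset makes the computed window end land far in the future:
-- Sketch of the workaround: drop the broken policy and recreate it with a
-- non-NULL end_offset.
SELECT remove_continuous_aggregate_policy('datalake_hourly');
SELECT add_continuous_aggregate_policy(
    'datalake_hourly',
    start_offset => INTERVAL '3 months',
    end_offset => INTERVAL '-10 years',
    schedule_interval => INTERVAL '15 seconds'
);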