timescaledb
timescaledb copied to clipboard
[Bug]: Continuous aggregates recalculation of historical data
What type of bug is this?
Other
What subsystems and features are affected?
Continuous aggregate
What happened?
I'm not a native English speaker, but as you read about continuous aggregates in the TimescaleDB documentation, you can see that the documentation contains conflicting information in various places.
My problem:
I have the following hypertable:
create table conditions
(
timestamp timestamp with time zone not null,
datapoint varchar not null,
country_id uuid not null,
value interval not null
);
create unique index conditions_timestamp_datapoint_country_id_uidx
on conditions (timestamp, datapoint, country_id);
select create_hypertable('conditions', 'timestamp');
Following materialized view:
create materialized view country_datapoints_hourly
with (timescaledb.continuous) as
select datapoint_type_id,
country_id,
time_bucket('1 hour', timestamp) as bucket,
extract(epoch from sum(value))::bigint as duration
from conditions
group by 1, 2, 3
with no data;
Following continuous aggregate policy:
select add_continuous_aggregate_policy('country_datapoints_hourly',
start_offset => interval '4 hours',
end_offset => interval '1 minute',
schedule_interval => interval '1 hour');
And when new historical data is inserted to the table conditions
and timestamp is behind the materialized threshold/watermark, and also behind offset time for the incoming continuous aggregate policy related to these documentations (provided below) I expect data to be reevaluated/recalculated for materialized view.
-
https://www.timescale.com/blog/continuous-aggregates-faster-queries-with-automatically-maintained-materialized-views/ Chapter: Invalidation Engine and Materialization Engine
-
https://github.com/timescale/timescaledb/blob/main/tsl/src/continuous_aggs/README.md Chapter: Bookkeeping and Internal State and The Invalidation Log and Threshold
While different parts of the documentation are stating opposite and reevaluation of historical data can not be done out of the box and it must be done manually by calling CALL refresh_continuous_aggregate
-
https://docs.timescale.com/use-timescale/latest/continuous-aggregates/troubleshooting/#updates-to-previously-materialized-regions-arent-shown-in-real-time-aggregates
-
https://docs.timescale.com/use-timescale/latest/continuous-aggregates/troubleshooting/#continuous-aggregate-doesnt-refresh-with-newly-inserted-historical-data
TimescaleDB version affected
2.10.2
PostgreSQL version used
14.1
What operating system did you use?
Ubuntu 22.04 x64
What installation method did you use?
Docker
What platform did you run on?
Google Cloud Platform (GCP)
Relevant log output and stack trace
No response
How can we reproduce the bug?
1
create table conditions
(
timestamp timestamp with time zone not null,
datapoint varchar not null,
country_id uuid not null,
value interval not null
);
create unique index conditions_timestamp_datapoint_country_id_uidx
on conditions (timestamp, datapoint, country_id);
select create_hypertable('conditions', 'timestamp');
2
create materialized view country_datapoints_hourly
with (timescaledb.continuous) as
select datapoint_type_id,
country_id,
time_bucket('1 hour', timestamp) as bucket,
extract(epoch from sum(value))::bigint as duration
from conditions
group by 1, 2, 3
with no data;
3
select add_continuous_aggregate_policy('country_datapoints_hourly',
start_offset => interval '4 hours',
end_offset => interval '1 minute',
schedule_interval => interval '1 hour');
4 Add recent data, and call manually CALL refresh_continuous_aggregate 5 Add data before threshold/watermark
hello @TRUSTMEIMJEDI , thank you for reaching out. The expected behavior of a continuous aggregate refresh is to process invalidations within its specified range. So in your example, if there is data inserted that’s older than 4 hours, the refresh will not materialize these.
@konskov Could TimescaleDB Team update these documents, because currently two different behaviors are described for continuous aggregates?
https://www.timescale.com/blog/continuous-aggregates-faster-queries-with-automatically-maintained-materialized-views/ Chapter: Invalidation Engine and Materialization Engine
https://github.com/timescale/timescaledb/blob/main/tsl/src/continuous_aggs/README.md Chapter: Bookkeeping and Internal State and The Invalidation Log and Threshold
VS
https://docs.timescale.com/use-timescale/latest/continuous-aggregates/troubleshooting/#updates-to-previously-materialized-regions-arent-shown-in-real-time-aggregates
https://docs.timescale.com/use-timescale/latest/continuous-aggregates/troubleshooting/#continuous-aggregate-doesnt-refresh-with-newly-inserted-historical-data