timescaledb icon indicating copy to clipboard operation
timescaledb copied to clipboard

[Bug]: Continuous aggregates recalculation of historical data

Open TRUSTMEIMJEDI opened this issue 1 year ago • 2 comments

What type of bug is this?

Other

What subsystems and features are affected?

Continuous aggregate

What happened?

I'm not a native English speaker, but as you read about continuous aggregates in the TimescaleDB documentation, you can see that the documentation contains conflicting information in various places.

My problem:

I have the following hypertable:

create table conditions
(
    timestamp   timestamp with time zone not null,
    datapoint   varchar                  not null,
    country_id    uuid                     not null,
    value       interval                 not null
);

create unique index conditions_timestamp_datapoint_country_id_uidx
    on conditions (timestamp, datapoint, country_id);

select create_hypertable('conditions', 'timestamp');

Following materialized view:

create materialized view country_datapoints_hourly
    with (timescaledb.continuous) as
select datapoint_type_id,
       country_id,
       time_bucket('1 hour', timestamp)       as bucket,
       extract(epoch from sum(value))::bigint as duration
from conditions
group by 1, 2, 3
with no data;

Following continuous aggregate policy:

select add_continuous_aggregate_policy('country_datapoints_hourly',
                                       start_offset => interval '4 hours',
                                       end_offset => interval '1 minute',
                                       schedule_interval => interval '1 hour');

And when new historical data is inserted to the table conditions and timestamp is behind the materialized threshold/watermark, and also behind offset time for the incoming continuous aggregate policy related to these documentations (provided below) I expect data to be reevaluated/recalculated for materialized view.

  1. https://www.timescale.com/blog/continuous-aggregates-faster-queries-with-automatically-maintained-materialized-views/ Chapter: Invalidation Engine and Materialization Engine

  2. https://github.com/timescale/timescaledb/blob/main/tsl/src/continuous_aggs/README.md Chapter: Bookkeeping and Internal State and The Invalidation Log and Threshold

While different parts of the documentation are stating opposite and reevaluation of historical data can not be done out of the box and it must be done manually by calling CALL refresh_continuous_aggregate

  1. https://docs.timescale.com/use-timescale/latest/continuous-aggregates/troubleshooting/#updates-to-previously-materialized-regions-arent-shown-in-real-time-aggregates

  2. https://docs.timescale.com/use-timescale/latest/continuous-aggregates/troubleshooting/#continuous-aggregate-doesnt-refresh-with-newly-inserted-historical-data

TimescaleDB version affected

2.10.2

PostgreSQL version used

14.1

What operating system did you use?

Ubuntu 22.04 x64

What installation method did you use?

Docker

What platform did you run on?

Google Cloud Platform (GCP)

Relevant log output and stack trace

No response

How can we reproduce the bug?

1

create table conditions
(
    timestamp   timestamp with time zone not null,
    datapoint   varchar                  not null,
    country_id    uuid                     not null,
    value       interval                 not null
);

create unique index conditions_timestamp_datapoint_country_id_uidx
    on conditions (timestamp, datapoint, country_id);

select create_hypertable('conditions', 'timestamp');

2

create materialized view country_datapoints_hourly
    with (timescaledb.continuous) as
select datapoint_type_id,
       country_id,
       time_bucket('1 hour', timestamp)       as bucket,
       extract(epoch from sum(value))::bigint as duration
from conditions
group by 1, 2, 3
with no data;

3

select add_continuous_aggregate_policy('country_datapoints_hourly',
                                       start_offset => interval '4 hours',
                                       end_offset => interval '1 minute',
                                       schedule_interval => interval '1 hour');

4 Add recent data, and call manually CALL refresh_continuous_aggregate 5 Add data before threshold/watermark

TRUSTMEIMJEDI avatar Jan 19 '24 17:01 TRUSTMEIMJEDI

hello @TRUSTMEIMJEDI , thank you for reaching out. The expected behavior of a continuous aggregate refresh is to process invalidations within its specified range. So in your example, if there is data inserted that’s older than 4 hours, the refresh will not materialize these.

konskov avatar Jan 22 '24 13:01 konskov

@konskov Could TimescaleDB Team update these documents, because currently two different behaviors are described for continuous aggregates?

https://www.timescale.com/blog/continuous-aggregates-faster-queries-with-automatically-maintained-materialized-views/ Chapter: Invalidation Engine and Materialization Engine

https://github.com/timescale/timescaledb/blob/main/tsl/src/continuous_aggs/README.md Chapter: Bookkeeping and Internal State and The Invalidation Log and Threshold

VS

https://docs.timescale.com/use-timescale/latest/continuous-aggregates/troubleshooting/#updates-to-previously-materialized-regions-arent-shown-in-real-time-aggregates

https://docs.timescale.com/use-timescale/latest/continuous-aggregates/troubleshooting/#continuous-aggregate-doesnt-refresh-with-newly-inserted-historical-data

TRUSTMEIMJEDI avatar Jan 22 '24 13:01 TRUSTMEIMJEDI