
[Bug]: Lost continuous aggregate data older than 2 days

Open hongquan opened this issue 2 years ago • 18 comments

What type of bug is this?

Data corruption

What subsystems and features are affected?

Continuous aggregate

What happened?

I have a website that collects sensor data, where I define 2 tables:

  • farm_tscondition
            Column            |           Type           | Collation | Nullable | Default 
------------------------------+--------------------------+-----------+----------+---------
 measured_at                  | timestamp with time zone |           | not null | 
 temperature                  | double precision         |           |          | 
 humidity                     | double precision         |           |          | 
  • farm_tsmeasure
                         Table "public.farm_tsmeasure"
      Column       |           Type           | Collation | Nullable | Default 
-------------------+--------------------------+-----------+----------+---------
 measured_at       | timestamp with time zone |           | not null | 
 bird_in           | integer                  |           |          | 
 bird_out          | integer                  |           |          | 

And hypertables:

SELECT create_hypertable('farm_tscondition', 'measured_at', chunk_time_interval => INTERVAL '1 hour');
SELECT create_hypertable('farm_tsmeasure', 'measured_at', chunk_time_interval => INTERVAL '1 hour');

I also create continuous aggregates to compute 10-minute averages:

CREATE MATERIALIZED VIEW farm_tscondition_10m
WITH (timescaledb.continuous) AS
SELECT node_id, crop_id, room_id,
       time_bucket(INTERVAL '10 minutes', measured_at) AS time_start,
       AVG(temperature) AS avg_temperature,
       AVG(humidity) AS avg_humidity
FROM farm_tscondition
WHERE node_id IS NOT NULL AND room_id IS NOT NULL
GROUP BY node_id, crop_id, room_id, time_start
WITH NO DATA;

SELECT add_continuous_aggregate_policy('farm_tscondition_10m',
    start_offset => INTERVAL '1 h',
    end_offset => INTERVAL '10 m',
    schedule_interval => INTERVAL '10 m');

CREATE MATERIALIZED VIEW farm_tsmeasure_10m
WITH (timescaledb.continuous) AS
SELECT node_id, crop_id, room_id,
       time_bucket(INTERVAL '10 minutes', measured_at) AS time_start,
       -- For bird, we are counting, so the sum is more meaningful than average.
       SUM(bird_in) AS sum_bird_in,
       SUM(bird_out) AS sum_bird_out
FROM farm_tsmeasure
WHERE node_id IS NOT NULL AND room_id IS NOT NULL
GROUP BY node_id, crop_id, room_id, time_start
WITH NO DATA;

SELECT add_continuous_aggregate_policy('farm_tsmeasure_10m',
    start_offset => INTERVAL '1 h',
    end_offset => INTERVAL '10 m',
    schedule_interval => INTERVAL '10 m');

There are 2 issues:

  • I deploy the same source code to run many websites, so all databases have the same structure, but one of them is very slow to restore, both with pg_restore and during one step of pg_upgrade (when upgrading Postgres 12 to 14).
  • Many of them lose continuous aggregate data older than 2 days.

Please tell me what info you need to debug further. I can also send the whole data if needed.

TimescaleDB version affected

2.6.1

PostgreSQL version used

12

What operating system did you use?

Ubuntu 20.04 x86

What installation method did you use?

Deb/Apt

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

No response

How can we reproduce the bug?

No response

hongquan avatar May 13 '22 07:05 hongquan

So you cannot query any data older than 2 days?

svenklemm avatar May 23 '22 14:05 svenklemm

@svenklemm Right, when I do SELECT * FROM farm_tscondition_10m, it returns only data from the last 2 days, not older.

hongquan avatar May 24 '22 09:05 hongquan

Could you please provide a complete self-contained script to reproduce the issue? Your cagg definition references columns that are not part of your table definitions.

svenklemm avatar Jul 27 '22 09:07 svenklemm

Could you please provide a complete self-contained script to reproduce the issue.

This is the whole Django application. After the application runs for a while, I see the issue. I don't know which part reproduces it. I have temporarily made the source code public, so you can check: https://gitlab.com/agriconnect/plantinghouse/-/blob/master/farm/models.py

Your cagg definition references columns not part of your table definitions.

Ah, yes. The tscondition and tsmeasure tables also have node_id, crop_id, room_id columns. They are not for storing IoT data; they are just foreign keys to other tables.

hongquan avatar Jul 27 '22 09:07 hongquan

The entire application might be a little too much for us to go through.

To better investigate your problem, let me describe how I understand the issue. You create an empty database and start writing to it; it works fine for two days, and then on the third day the old data starts to disappear from the view. It keeps disappearing over time, so that at every point in time only the two previous days are visible. Right?

To confirm this and narrow down the problem, please connect to the database with psql and run a query like SELECT min(time_start) FROM farm_tscondition_10m. If I understood you correctly, the result should never be more than two days in the past. In contrast, SELECT min(measured_at) FROM farm_tscondition should always return the correct oldest value.

On the other hand, if you are talking about historical data that you backfilled after creating the new database, the problem might be that the continuous aggregate is not refreshed for historical data, because it was created with the WITH NO DATA option. In that case, you have to manually run refresh_continuous_aggregate to generate the continuous aggregates for the historical data.
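For illustration, such a one-off backfill refresh could look like this (the time window below is purely an example, not taken from the report; adjust it to the range you backfilled):

```sql
-- Materialize historical data in the CAGG for a chosen window.
-- The bounds here are illustrative placeholders.
CALL refresh_continuous_aggregate('farm_tscondition_10m',
    '2022-01-01 00:00:00+07', '2022-05-01 00:00:00+07');
```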

akuzm avatar Aug 17 '22 13:08 akuzm

You create an empty database and start writing to it; it works fine for two days, and then on the third day the old data starts to disappear from the view. It keeps disappearing over time, so that at every point in time only the two previous days are visible. Right?

Right.

Here is the result of min(time_start):

# SELECT min(time_start) FROM farm_tscondition_10m;
          min           
------------------------
 2022-08-16 10:00:00+07
(1 row)

hongquan avatar Aug 17 '22 14:08 hongquan

Hello @hongquan,

Thank you very much for your response. I am trying to reproduce the problem, but I have not succeeded so far. Would it be possible for you to answer the following questions so I can align my test environment to yours as much as possible?

  • In the issue description, you mention two hypertables and two continuous aggregates. Are both hypertables / aggregates affected by the problem or only one?
  • You mention that both tables have foreign keys and more attributes as shown in the initial description. Could you provide us with the full definition of these tables along with their foreign keys (i.e., by executing \d+ farm_tscondition; and \d+ farm_tsmeasure;)?
  • I would like to understand whether the continuous aggregate is affected by the problem or the entire hypertable. So, could you run both queries that are provided by @akuzm and provide us with the results (i.e., select min(time_start) from farm_tscondition_10m and select min(measured_at) from farm_tscondition)?

jnidzwetzki avatar Sep 12 '22 10:09 jnidzwetzki

@jnidzwetzki

* In the issue description, you mention two hypertables and two continuous aggregates. Are both hypertables / aggregates affected by the problem or only one?

In our prod deployments, only farm_tscondition has data (we don't have IoT devices to feed data to farm_tsmeasure yet), so I can only observe that farm_tscondition_10m is affected.

* You mention that both tables have foreign keys and more attributes as shown in the initial description. Could you provide us with the full definition of these tables along with their foreign keys (i.e., by executing `\d+ farm_tscondition;` and `\d+ farm_tsmeasure;`)?
                                                        Table "public.farm_tscondition"
            Column            |           Type           | Collation | Nullable | Default | Storage | Compression | Stats target | Description 
------------------------------+--------------------------+-----------+----------+---------+---------+-------------+--------------+-------------
 measured_at                  | timestamp with time zone |           | not null |         | plain   |             |              | 
 temperature                  | double precision         |           |          |         | plain   |             |              | 
 humidity                     | double precision         |           |          |         | plain   |             |              | 
 brightness                   | double precision         |           |          |         | plain   |             |              | 
 carbonic                     | smallint                 |           |          |         | plain   |             |              | 
 moisture                     | double precision         |           |          |         | plain   |             |              | 
 dielectric                   | double precision         |           |          |         | plain   |             |              | 
 soil_temperature             | double precision         |           |          |         | plain   |             |              | 
 soil_electrical_conductivity | double precision         |           |          |         | plain   |             |              | 
 electrical_conductivity      | double precision         |           |          |         | plain   |             |              | 
 hydrogen_power               | double precision         |           |          |         | plain   |             |              | 
 solution_temperature         | double precision         |           |          |         | plain   |             |              | 
 solution_salinity            | double precision         |           |          |         | plain   |             |              | 
 crop_id                      | integer                  |           |          |         | plain   |             |              | 
 node_id                      | integer                  |           |          |         | plain   |             |              | 
 room_id                      | integer                  |           |          |         | plain   |             |              | 
 soil_hydrogen_power          | double precision         |           |          |         | plain   |             |              | 
 ammonia                      | double precision         |           |          |         | plain   |             |              | 
Indexes:
    "farm_tscondition_pkey" PRIMARY KEY, btree (measured_at)
    "farm_tscondition_crop_id_09a48628" btree (crop_id)
    "farm_tscondition_node_id_94d38da4" btree (node_id)
    "farm_tscondition_node_index" btree (room_id, node_id, measured_at DESC) WHERE NOT room_id IS NULL AND NOT node_id IS NULL
    "farm_tscondition_room_id_ce266840" btree (room_id)
Check constraints:
    "farm_tscondition_carbonic_check" CHECK (carbonic >= 0)
Foreign-key constraints:
    "farm_tscondition_crop_id_09a48628_fk_farm_crop_id" FOREIGN KEY (crop_id) REFERENCES farm_crop(id) DEFERRABLE INITIALLY DEFERRED
    "farm_tscondition_node_id_94d38da4_fk_farm_node_id" FOREIGN KEY (node_id) REFERENCES farm_node(id) DEFERRABLE INITIALLY DEFERRED
    "farm_tscondition_room_id_ce266840_fk_farm_room_id" FOREIGN KEY (room_id) REFERENCES farm_room(id) DEFERRABLE INITIALLY DEFERRED
Triggers:
    ts_cagg_invalidation_trigger AFTER INSERT OR DELETE OR UPDATE ON farm_tscondition FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.continuous_agg_invalidation_trigger('1')
    ts_insert_blocker BEFORE INSERT ON farm_tscondition FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Child tables: _timescaledb_internal._hyper_1_30711_chunk,
              _timescaledb_internal._hyper_1_30713_chunk,
              _timescaledb_internal._hyper_1_30715_chunk,
              _timescaledb_internal._hyper_1_30717_chunk,
              _timescaledb_internal._hyper_1_30719_chunk,
              _timescaledb_internal._hyper_1_30721_chunk,
              _timescaledb_internal._hyper_1_30723_chunk,
              _timescaledb_internal._hyper_1_30726_chunk,
              _timescaledb_internal._hyper_1_30728_chunk,
              _timescaledb_internal._hyper_1_30730_chunk,
              _timescaledb_internal._hyper_1_30732_chunk,
              _timescaledb_internal._hyper_1_30734_chunk,
              _timescaledb_internal._hyper_1_30736_chunk,
              _timescaledb_internal._hyper_1_30738_chunk,
              _timescaledb_internal._hyper_1_30740_chunk,
              _timescaledb_internal._hyper_1_30742_chunk,
              _timescaledb_internal._hyper_1_30744_chunk,
              _timescaledb_internal._hyper_1_30747_chunk,
              _timescaledb_internal._hyper_1_30749_chunk,
              _timescaledb_internal._hyper_1_30751_chunk,
              _timescaledb_internal._hyper_1_30753_chunk,
              _timescaledb_internal._hyper_1_30755_chunk,
              _timescaledb_internal._hyper_1_30757_chunk,
              _timescaledb_internal._hyper_1_30759_chunk,
              _timescaledb_internal._hyper_1_30761_chunk,
              _timescaledb_internal._hyper_1_30763_chunk,
              _timescaledb_internal._hyper_1_30765_chunk,
              _timescaledb_internal._hyper_1_30768_chunk
Access method: heap
                                                   Table "public.farm_tsmeasure"
      Column       |           Type           | Collation | Nullable | Default | Storage | Compression | Stats target | Description 
-------------------+--------------------------+-----------+----------+---------+---------+-------------+--------------+-------------
 measured_at       | timestamp with time zone |           | not null |         | plain   |             |              | 
 bird_in           | integer                  |           |          |         | plain   |             |              | 
 bird_out          | integer                  |           |          |         | plain   |             |              | 
 voltage           | double precision         |           |          |         | plain   |             |              | 
 electric_current  | double precision         |           |          |         | plain   |             |              | 
 power             | double precision         |           |          |         | plain   |             |              | 
 electrical_energy | double precision         |           |          |         | plain   |             |              | 
 ampli_1_left      | double precision         |           |          |         | plain   |             |              | 
 ampli_1_right     | double precision         |           |          |         | plain   |             |              | 
 ampli_2_left      | double precision         |           |          |         | plain   |             |              | 
 ampli_2_right     | double precision         |           |          |         | plain   |             |              | 
 ampli_3_left      | double precision         |           |          |         | plain   |             |              | 
 ampli_3_right     | double precision         |           |          |         | plain   |             |              | 
 ampli_4_left      | double precision         |           |          |         | plain   |             |              | 
 ampli_4_right     | double precision         |           |          |         | plain   |             |              | 
 crop_id           | integer                  |           |          |         | plain   |             |              | 
 node_id           | integer                  |           |          |         | plain   |             |              | 
 room_id           | integer                  |           |          |         | plain   |             |              | 
Indexes:
    "farm_tsmeasure_pkey" PRIMARY KEY, btree (measured_at)
    "farm_tsmeasure_crop_id_3dfd561d" btree (crop_id)
    "farm_tsmeasure_node_id_fe716720" btree (node_id)
    "farm_tsmeasure_node_index" btree (room_id, node_id, measured_at DESC) WHERE NOT room_id IS NULL AND NOT node_id IS NULL
    "farm_tsmeasure_room_id_675ed512" btree (room_id)
Check constraints:
    "farm_tsmeasure_bird_in_check" CHECK (bird_in >= 0)
    "farm_tsmeasure_bird_out_check" CHECK (bird_out >= 0)
Foreign-key constraints:
    "farm_tsmeasure_crop_id_3dfd561d_fk_farm_crop_id" FOREIGN KEY (crop_id) REFERENCES farm_crop(id) DEFERRABLE INITIALLY DEFERRED
    "farm_tsmeasure_node_id_fe716720_fk_farm_node_id" FOREIGN KEY (node_id) REFERENCES farm_node(id) DEFERRABLE INITIALLY DEFERRED
    "farm_tsmeasure_room_id_675ed512_fk_farm_room_id" FOREIGN KEY (room_id) REFERENCES farm_room(id) DEFERRABLE INITIALLY DEFERRED
Triggers:
    ts_cagg_invalidation_trigger AFTER INSERT OR DELETE OR UPDATE ON farm_tsmeasure FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.continuous_agg_invalidation_trigger('2')
    ts_insert_blocker BEFORE INSERT ON farm_tsmeasure FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Child tables: _timescaledb_internal._hyper_2_30712_chunk,
              _timescaledb_internal._hyper_2_30714_chunk,
              _timescaledb_internal._hyper_2_30716_chunk,
              _timescaledb_internal._hyper_2_30718_chunk,
              _timescaledb_internal._hyper_2_30720_chunk,
              _timescaledb_internal._hyper_2_30722_chunk,
              _timescaledb_internal._hyper_2_30724_chunk,
              _timescaledb_internal._hyper_2_30727_chunk,
              _timescaledb_internal._hyper_2_30729_chunk,
              _timescaledb_internal._hyper_2_30731_chunk,
              _timescaledb_internal._hyper_2_30733_chunk,
              _timescaledb_internal._hyper_2_30735_chunk,
              _timescaledb_internal._hyper_2_30737_chunk,
              _timescaledb_internal._hyper_2_30739_chunk,
              _timescaledb_internal._hyper_2_30741_chunk,
              _timescaledb_internal._hyper_2_30743_chunk,
              _timescaledb_internal._hyper_2_30745_chunk,
              _timescaledb_internal._hyper_2_30748_chunk,
              _timescaledb_internal._hyper_2_30750_chunk,
              _timescaledb_internal._hyper_2_30752_chunk,
              _timescaledb_internal._hyper_2_30754_chunk,
              _timescaledb_internal._hyper_2_30756_chunk,
              _timescaledb_internal._hyper_2_30758_chunk,
              _timescaledb_internal._hyper_2_30760_chunk,
              _timescaledb_internal._hyper_2_30762_chunk,
              _timescaledb_internal._hyper_2_30764_chunk,
              _timescaledb_internal._hyper_2_30766_chunk,
              _timescaledb_internal._hyper_2_30769_chunk
Access method: heap
* I would like to understand whether the continuous aggregate is affected by the problem or the entire hypertable. So, could you run both queries that are provided by @akuzm and provide us with the results (i.e., `select min(time_start) from farm_tscondition_10m` and  `select min(measured_at) from farm_tscondition`)?
# SELECT min(time_start) FROM farm_tscondition_10m;
          min           
------------------------
 2022-09-11 17:00:00+07
(1 row)

# SELECT min(measured_at) FROM farm_tscondition;
              min              
-------------------------------
 2022-09-11 17:00:04.932698+07
(1 row)

hongquan avatar Sep 12 '22 13:09 hongquan

Hello @hongquan,

Thank you very much for all this information. It looks like the data is not only removed from the continuous aggregate; the data is also removed from the underlying hypertable. In addition, it appears that only the data for the last 26 hours is available. Have you ever experienced that the time range in which the data is deleted varies? Is it a fixed time window of 26 hours / 2 days or does it change?

Could you also execute the following query to check that there is no data retention policy configured that deletes data automatically?

SELECT j.hypertable_name,
       j.job_id,
       config,
       schedule_interval,
       job_status,
       last_run_status,
       last_run_started_at,
       js.next_start,
       total_runs,
       total_successes,
       total_failures
  FROM timescaledb_information.jobs j
  JOIN timescaledb_information.job_stats js
    ON j.job_id = js.job_id
  WHERE j.proc_name = 'policy_retention';

You mention that only the table farm_tscondition is currently used by your application. Would it be possible to store some test data in the table farm_tsmeasure to check whether or not this table is also affected by the problem?

jnidzwetzki avatar Sep 13 '22 12:09 jnidzwetzki

@jnidzwetzki

the data is also removed from the underlying hypertable

This is expected, because I have a retention policy that removes data older than 1 day from the hypertables.
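For reference, a retention policy like the one described above would have been created with something along these lines (a sketch; the actual statements live in the project's migrations):

```sql
-- Illustrative sketch: drop raw data older than 24 hours,
-- matching the "drop_after": "24:00:00" shown in the job config below.
SELECT add_retention_policy('farm_tscondition', INTERVAL '24 hours');
SELECT add_retention_policy('farm_tsmeasure', INTERVAL '24 hours');
```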

Have you ever experienced that the time range in which the data is deleted varies?

Yes, it varies, but it is still about 2 days (> 24 hours).

Could you also execute the following query to check that there is no data retention policy configured that deletes data automatically?

This is the result:

 hypertable_name  | job_id |                     config                     | schedule_interval | job_status | last_run_status |      last_run_started_at      |          next_start           | total_runs | total_successes | total_failures 
------------------+--------+------------------------------------------------+-------------------+------------+-----------------+-------------------------------+-------------------------------+------------+-----------------+----------------
 farm_tscondition |   1002 | {"drop_after": "24:00:00", "hypertable_id": 1} | 1 day             | Scheduled  | Success         | 2022-09-13 17:09:59.857615+07 | 2022-09-14 17:10:00.477904+07 |        637 |             600 |             34
 farm_tsmeasure   |   1003 | {"drop_after": "24:00:00", "hypertable_id": 2} | 1 day             | Scheduled  | Success         | 2022-09-13 17:04:31.73244+07  | 2022-09-14 17:04:32.389057+07 |        634 |             600 |             34
(2 rows)

Would it be possible to store some test data in the table farm_tsmeasure to check whether or not this table is also affected by the problem?

Yes, I will try.

hongquan avatar Sep 13 '22 15:09 hongquan

Hello @hongquan,

Thank you for getting back to us. The retention policy explains why the data in the hypertable is removed. For some reason, the continuous aggregate data is also deleted, even though the update policy of the CAGGs seems to be configured correctly and no update should be performed for the deleted data.

It seems that something is refreshing the CAGG for the time interval of the deleted data, which causes the removal of the CAGG data. In the output of the "SELECT min(...)" queries you can see that the time ranges of the hypertable and the CAGG match.

Please verify that your refresh policy is configured correctly and no manual updates of the CAGG are performed for the deleted time ranges (e.g., by calling refresh_continuous_aggregate(...)). You can check the policies by executing the following query:

SELECT j.hypertable_name,
       j.job_id, 
       j.proc_name,
       config,
       schedule_interval,
       job_status,
       last_run_status,
       last_run_started_at,
       js.next_start,
       total_runs,
       total_successes,
       total_failures
  FROM timescaledb_information.jobs j
  JOIN timescaledb_information.job_stats js
    ON j.job_id = js.job_id;

If the problem still persists, please provide us with a self-contained example with all of your configured policies so that we can reproduce the problem. You are currently using TimescaleDB 2.6.1; does this problem also occur with the current TimescaleDB version 2.8.0?

jnidzwetzki avatar Sep 14 '22 07:09 jnidzwetzki

Hi @jnidzwetzki

You can check the policies by executing the following query

This is the result:

      hypertable_name       | job_id |              proc_name              |                                   config                                    | schedule_interval | job_status | last_run_status |      last_run_started_at      |          next_start           | total_runs | total_successes | total_failures 
----------------------------+--------+-------------------------------------+-----------------------------------------------------------------------------+-------------------+------------+-----------------+-------------------------------+-------------------------------+------------+-----------------+----------------
 _materialized_hypertable_4 |   1001 | policy_refresh_continuous_aggregate | {"end_offset": "00:20:00", "start_offset": "1 day", "mat_hypertable_id": 4} | 00:20:00          | Scheduled  | Success         | 2022-09-14 17:38:34.380488+07 | 2022-09-14 17:58:35.304035+07 |      34904 |           32866 |           2038
 farm_tscondition           |   1002 | policy_retention                    | {"drop_after": "24:00:00", "hypertable_id": 1}                              | 1 day             | Scheduled  | Success         | 2022-09-14 17:10:00.479657+07 | 2022-09-15 17:10:01.104891+07 |        638 |             601 |             34
 farm_tsmeasure             |   1003 | policy_retention                    | {"drop_after": "24:00:00", "hypertable_id": 2}                              | 1 day             | Scheduled  | Success         | 2022-09-14 17:04:32.392344+07 | 2022-09-15 17:04:33.054147+07 |        635 |             601 |             34
                            |      1 | policy_telemetry                    |                                                                             | 24:00:00          | Scheduled  | Success         | 2022-09-14 00:27:02.605195+07 | 2022-09-15 00:27:05.896366+07 |        694 |             594 |            100

please provide us with a self-contained example with all of your configured policies so that we can reproduce the problem

I don't know how to provide such a self-contained example. However, I have made the source code public, so you can see my configured policies in https://gitlab.com/agriconnect/plantinghouse/-/blob/master/farm/migrations/0002_tscondition_tsmeasure_views.sql

You are currently using TimescaleDB 2.6.1, does this problem also occur with the current TimescaleDB version 2.8.0

I haven't upgraded to v2.8 yet, but v2.7 also suffers from this issue. I will upgrade to v2.8 and see.

hongquan avatar Sep 14 '22 10:09 hongquan

Hello @hongquan,

Thanks for executing the query. According to its result, there is only one continuous aggregate policy configured at the moment. The CAGG is refreshed every 20 minutes for the last day ({"end_offset": "00:20:00", "start_offset": "1 day"}). So, the refresh window of the CAGG overlaps with the data retention policy that deletes the data after 24 hours.

In this case, the data of the CAGG is also affected by the data retention (see the documentation for more details about overlapping intervals).

Please decrease the refresh interval of the CAGG or change the data retention policy to keep data for at least two days. For example:

Decrease the refresh interval of the CAGG

SELECT remove_continuous_aggregate_policy('farm_tscondition_10m');

SELECT add_continuous_aggregate_policy('farm_tscondition_10m',
    start_offset => INTERVAL '1 hour',
    end_offset => INTERVAL '10 minutes',
    schedule_interval => INTERVAL '10 minutes');

Increase the retention policy

SELECT remove_retention_policy('farm_tscondition');

SELECT add_retention_policy('farm_tscondition', INTERVAL '2 days');

jnidzwetzki avatar Sep 14 '22 11:09 jnidzwetzki

It is weird that the CAGG is refreshed every 20 min because I configured the refresh interval to be 10 min, as written in https://gitlab.com/agriconnect/plantinghouse/-/blob/master/farm/migrations/0002_tscondition_tsmeasure_views.sql.

Now I choose to increase the retention policy to 2 days. Let's see...

hongquan avatar Sep 14 '22 13:09 hongquan

@jnidzwetzki The issue remains after I upgraded to TimescaleDB v2.8 and increased retention policy to 2 days:

# SELECT min(time_start) FROM farm_tscondition_10m;
          min           
------------------------
 2022-09-16 20:00:00+07
(1 row)

# SELECT min(measured_at) FROM farm_tscondition;
              min              
-------------------------------
 2022-09-16 20:00:01.082967+07
(1 row)
      hypertable_name       | job_id |              proc_name              |                                   config                                    | schedule_interval | job_status | last_run_status |      last_run_started_at      |          next_start           | total_runs | total_successes | total_failures 
----------------------------+--------+-------------------------------------+-----------------------------------------------------------------------------+-------------------+------------+-----------------+-------------------------------+-------------------------------+------------+-----------------+----------------
 _materialized_hypertable_4 |   1001 | policy_refresh_continuous_aggregate | {"end_offset": "00:20:00", "start_offset": "1 day", "mat_hypertable_id": 4} | 00:20:00          | Scheduled  | Success         | 2022-09-19 10:14:59.604224+07 | 2022-09-19 10:35:00.529568+07 |      35239 |           33201 |           2038
 farm_tscondition           |   1004 | policy_retention                    | {"drop_after": "2 days", "hypertable_id": 1}                                | 1 day             | Scheduled  | Success         | 2022-09-18 20:43:33.153523+07 | 2022-09-19 20:43:33.771559+07 |          5 |               5 |              0
 farm_tsmeasure             |   1003 | policy_retention                    | {"drop_after": "24:00:00", "hypertable_id": 2}                              | 1 day             | Scheduled  | Success         | 2022-09-18 17:04:35.499136+07 | 2022-09-19 17:04:36.150056+07 |        639 |             605 |             34
                            |      1 | policy_telemetry                    |                                                                             | 24:00:00          | Scheduled  | Success         | 2022-09-19 00:27:21.284439+07 | 2022-09-20 00:27:24.536283+07 |        699 |             599 |            100
(4 rows)

hongquan avatar Sep 19 '22 03:09 hongquan

Hello @hongquan,

Thank you very much for the update. I am currently trying to reproduce the problem in my local environment. I have created a minimal example based on the information you provided and am now waiting 24 hours to analyze what data is being deleted from the hypertable and the continuous aggregate.

To make sure my environment is very similar to yours, could you send us the output of the following two commands?

\d+ farm_tscondition_10m;
\d+ farm_tsmeasure_10m;

In the output of the jobs query, you can see that the retention policy is set to 2 days for only one of the hypertables. For the second hypertable, it is still set to 24 hours. Please check whether this is intentional and, if necessary, change the retention policy for the second hypertable as well.
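If the 24-hour setting on farm_tsmeasure is not intentional, it could be aligned with the other table like this (a sketch following the same pattern as earlier in this thread):

```sql
-- Replace the 24-hour retention on farm_tsmeasure with 2 days,
-- matching the policy already applied to farm_tscondition.
SELECT remove_retention_policy('farm_tsmeasure');
SELECT add_retention_policy('farm_tsmeasure', INTERVAL '2 days');
```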

jnidzwetzki avatar Sep 19 '22 14:09 jnidzwetzki

@jnidzwetzki

\d+ farm_tscondition_10m;

                                          View "public.farm_tscondition_10m"
              Column              |           Type           | Collation | Nullable | Default | Storage | Description 
----------------------------------+--------------------------+-----------+----------+---------+---------+-------------
 node_id                          | integer                  |           |          |         | plain   | 
 crop_id                          | integer                  |           |          |         | plain   | 
 room_id                          | integer                  |           |          |         | plain   | 
 time_start                       | timestamp with time zone |           |          |         | plain   | 
 avg_temperature                  | double precision         |           |          |         | plain   | 
 avg_humidity                     | double precision         |           |          |         | plain   | 
 avg_brightness                   | double precision         |           |          |         | plain   | 
 avg_carbonic                     | numeric                  |           |          |         | main    | 
 avg_ammonia                      | double precision         |           |          |         | plain   | 
 avg_moisture                     | double precision         |           |          |         | plain   | 
 avg_dielectric                   | double precision         |           |          |         | plain   | 
 avg_soil_temperature             | double precision         |           |          |         | plain   | 
 avg_soil_electrical_conductivity | double precision         |           |          |         | plain   | 
 avg_electrical_conductivity      | double precision         |           |          |         | plain   | 
 avg_hydrogen_power               | double precision         |           |          |         | plain   | 
 avg_solution_temperature         | double precision         |           |          |         | plain   | 
 avg_solution_salinity            | double precision         |           |          |         | plain   | 
View definition:
 SELECT _materialized_hypertable_6.node_id,
    _materialized_hypertable_6.crop_id,
    _materialized_hypertable_6.room_id,
    _materialized_hypertable_6.time_start,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_6.agg_5_5, NULL::double precision) AS avg_temperature,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_6.agg_6_6, NULL::double precision) AS avg_humidity,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_6.agg_7_7, NULL::double precision) AS avg_brightness,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(smallint)'::text, NULL::name, NULL::name, '{{pg_catalog,int2}}'::name[], _materialized_hypertable_6.agg_8_8, NULL::numeric) AS avg_carbonic,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_6.agg_9_9, NULL::double precision) AS avg_ammonia,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_6.agg_10_10, NULL::double precision) AS avg_moisture,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_6.agg_11_11, NULL::double precision) AS avg_dielectric,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_6.agg_12_12, NULL::double precision) AS avg_soil_temperature,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_6.agg_13_13, NULL::double precision) AS avg_soil_electrical_conductivity,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_6.agg_14_14, NULL::double precision) AS avg_electrical_conductivity,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_6.agg_15_15, NULL::double precision) AS avg_hydrogen_power,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_6.agg_16_16, NULL::double precision) AS avg_solution_temperature,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_6.agg_17_17, NULL::double precision) AS avg_solution_salinity
   FROM _timescaledb_internal._materialized_hypertable_6
  WHERE _materialized_hypertable_6.time_start < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(6)), '-infinity'::timestamp with time zone)
  GROUP BY _materialized_hypertable_6.node_id, _materialized_hypertable_6.crop_id, _materialized_hypertable_6.room_id, _materialized_hypertable_6.time_start
UNION ALL
 SELECT farm_tscondition.node_id,
    farm_tscondition.crop_id,
    farm_tscondition.room_id,
    time_bucket('00:10:00'::interval, farm_tscondition.measured_at) AS time_start,
    avg(farm_tscondition.temperature) AS avg_temperature,
    avg(farm_tscondition.humidity) AS avg_humidity,
    avg(farm_tscondition.brightness) AS avg_brightness,
    avg(farm_tscondition.carbonic) AS avg_carbonic,
    avg(farm_tscondition.ammonia) AS avg_ammonia,
    avg(farm_tscondition.moisture) AS avg_moisture,
    avg(farm_tscondition.dielectric) AS avg_dielectric,
    avg(farm_tscondition.soil_temperature) AS avg_soil_temperature,
    avg(farm_tscondition.soil_electrical_conductivity) AS avg_soil_electrical_conductivity,
    avg(farm_tscondition.electrical_conductivity) AS avg_electrical_conductivity,
    avg(farm_tscondition.hydrogen_power) AS avg_hydrogen_power,
    avg(farm_tscondition.solution_temperature) AS avg_solution_temperature,
    avg(farm_tscondition.solution_salinity) AS avg_solution_salinity
   FROM farm_tscondition
  WHERE farm_tscondition.node_id IS NOT NULL AND farm_tscondition.room_id IS NOT NULL AND farm_tscondition.measured_at >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(6)), '-infinity'::timestamp with time zone)
  GROUP BY farm_tscondition.node_id, farm_tscondition.crop_id, farm_tscondition.room_id, (time_bucket('00:10:00'::interval, farm_tscondition.measured_at));
\d+ farm_tsmeasure_10m;

                                     View "public.farm_tsmeasure_10m"
        Column         |           Type           | Collation | Nullable | Default | Storage | Description 
-----------------------+--------------------------+-----------+----------+---------+---------+-------------
 node_id               | integer                  |           |          |         | plain   | 
 crop_id               | integer                  |           |          |         | plain   | 
 room_id               | integer                  |           |          |         | plain   | 
 time_start            | timestamp with time zone |           |          |         | plain   | 
 sum_bird_in           | bigint                   |           |          |         | plain   | 
 sum_bird_out          | bigint                   |           |          |         | plain   | 
 avg_voltage           | double precision         |           |          |         | plain   | 
 avg_electric_current  | double precision         |           |          |         | plain   | 
 avg_power             | double precision         |           |          |         | plain   | 
 avg_electrical_energy | double precision         |           |          |         | plain   | 
 avg_ampli_1_left      | double precision         |           |          |         | plain   | 
 avg_ampli_1_right     | double precision         |           |          |         | plain   | 
 avg_ampli_2_left      | double precision         |           |          |         | plain   | 
 avg_ampli_2_right     | double precision         |           |          |         | plain   | 
 avg_ampli_3_left      | double precision         |           |          |         | plain   | 
 avg_ampli_3_right     | double precision         |           |          |         | plain   | 
 avg_ampli_4_left      | double precision         |           |          |         | plain   | 
 avg_ampli_4_right     | double precision         |           |          |         | plain   | 
View definition:
 SELECT _materialized_hypertable_4.node_id,
    _materialized_hypertable_4.crop_id,
    _materialized_hypertable_4.room_id,
    _materialized_hypertable_4.time_start,
    _timescaledb_internal.finalize_agg('pg_catalog.sum(integer)'::text, NULL::name, NULL::name, '{{pg_catalog,int4}}'::name[], _materialized_hypertable_4.agg_5_5, NULL::bigint) AS sum_bird_in,
    _timescaledb_internal.finalize_agg('pg_catalog.sum(integer)'::text, NULL::name, NULL::name, '{{pg_catalog,int4}}'::name[], _materialized_hypertable_4.agg_6_6, NULL::bigint) AS sum_bird_out,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_4.agg_7_7, NULL::double precision) AS avg_voltage,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_4.agg_8_8, NULL::double precision) AS avg_electric_current,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_4.agg_9_9, NULL::double precision) AS avg_power,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_4.agg_10_10, NULL::double precision) AS avg_electrical_energy,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_4.agg_11_11, NULL::double precision) AS avg_ampli_1_left,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_4.agg_12_12, NULL::double precision) AS avg_ampli_1_right,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_4.agg_13_13, NULL::double precision) AS avg_ampli_2_left,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_4.agg_14_14, NULL::double precision) AS avg_ampli_2_right,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_4.agg_15_15, NULL::double precision) AS avg_ampli_3_left,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_4.agg_16_16, NULL::double precision) AS avg_ampli_3_right,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_4.agg_17_17, NULL::double precision) AS avg_ampli_4_left,
    _timescaledb_internal.finalize_agg('pg_catalog.avg(double precision)'::text, NULL::name, NULL::name, '{{pg_catalog,float8}}'::name[], _materialized_hypertable_4.agg_18_18, NULL::double precision) AS avg_ampli_4_right
   FROM _timescaledb_internal._materialized_hypertable_4
  WHERE _materialized_hypertable_4.time_start < COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(4)), '-infinity'::timestamp with time zone)
  GROUP BY _materialized_hypertable_4.node_id, _materialized_hypertable_4.crop_id, _materialized_hypertable_4.room_id, _materialized_hypertable_4.time_start
UNION ALL
 SELECT farm_tsmeasure.node_id,
    farm_tsmeasure.crop_id,
    farm_tsmeasure.room_id,
    time_bucket('00:10:00'::interval, farm_tsmeasure.measured_at) AS time_start,
    sum(farm_tsmeasure.bird_in) AS sum_bird_in,
    sum(farm_tsmeasure.bird_out) AS sum_bird_out,
    avg(farm_tsmeasure.voltage) AS avg_voltage,
    avg(farm_tsmeasure.electric_current) AS avg_electric_current,
    avg(farm_tsmeasure.power) AS avg_power,
    avg(farm_tsmeasure.electrical_energy) AS avg_electrical_energy,
    avg(farm_tsmeasure.ampli_1_left) AS avg_ampli_1_left,
    avg(farm_tsmeasure.ampli_1_right) AS avg_ampli_1_right,
    avg(farm_tsmeasure.ampli_2_left) AS avg_ampli_2_left,
    avg(farm_tsmeasure.ampli_2_right) AS avg_ampli_2_right,
    avg(farm_tsmeasure.ampli_3_left) AS avg_ampli_3_left,
    avg(farm_tsmeasure.ampli_3_right) AS avg_ampli_3_right,
    avg(farm_tsmeasure.ampli_4_left) AS avg_ampli_4_left,
    avg(farm_tsmeasure.ampli_4_right) AS avg_ampli_4_right
   FROM farm_tsmeasure
  WHERE farm_tsmeasure.node_id IS NOT NULL AND farm_tsmeasure.room_id IS NOT NULL AND farm_tsmeasure.measured_at >= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(4)), '-infinity'::timestamp with time zone)
  GROUP BY farm_tsmeasure.node_id, farm_tsmeasure.crop_id, farm_tsmeasure.room_id, (time_bucket('00:10:00'::interval, farm_tsmeasure.measured_at));

if necessary, change the retention policy for the second hypertable as well.

OK, I've just lengthened the retention policy for farm_tsmeasure to 2 days.
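
For reference, a retention policy can be lengthened by dropping and re-adding it. A minimal sketch using the standard TimescaleDB policy functions (the 2-day interval matches the value discussed above):

```sql
-- Drop the existing 24-hour retention policy on the raw hypertable
SELECT remove_retention_policy('farm_tsmeasure');

-- Re-add it with the longer 2-day window
SELECT add_retention_policy('farm_tsmeasure', drop_after => INTERVAL '2 days');
```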

hongquan avatar Sep 20 '22 03:09 hongquan

Hello @hongquan,

Thank you for executing the queries and providing us with the output. I was able to identify the root cause of the problem.

You have two continuous aggregates (farm_tscondition_10m and farm_tsmeasure_10m). You mentioned that only farm_tscondition_10m is used at the moment and that it is not clear if farm_tsmeasure_10m is also affected by the problem. In the most recent query output, you can see that the materialized data of farm_tscondition_10m is stored in the table _materialized_hypertable_6; the materialized data of the farm_tsmeasure_10m continuous aggregate is stored in _materialized_hypertable_4.

According to your job definition, there is only one continuous aggregate refresh policy defined at the moment. This policy refreshes the data of _materialized_hypertable_4 (see {"end_offset": "00:20:00", "start_offset": "1 day", "mat_hypertable_id": 4} in the job output).

      hypertable_name       | job_id |              proc_name              |                                   config                                    | schedule_interval | job_status | last_run_status |      last_run_started_at      |          next_start           | total_runs | total_successes | total_failures 
----------------------------+--------+-------------------------------------+-----------------------------------------------------------------------------+-------------------+------------+-----------------+-------------------------------+-------------------------------+------------+-----------------+----------------
 _materialized_hypertable_4 |   1001 | policy_refresh_continuous_aggregate | {"end_offset": "00:20:00", "start_offset": "1 day", "mat_hypertable_id": 4} | 00:20:00          | Scheduled  | Success         | 2022-09-19 10:14:59.604224+07 | 2022-09-19 10:35:00.529568+07 |      35239 |           33201 |           2038
 farm_tscondition           |   1004 | policy_retention                    | {"drop_after": "2 days", "hypertable_id": 1}                                | 1 day             | Scheduled  | Success         | 2022-09-18 20:43:33.153523+07 | 2022-09-19 20:43:33.771559+07 |          5 |               5 |              0
 farm_tsmeasure             |   1003 | policy_retention                    | {"drop_after": "24:00:00", "hypertable_id": 2}                              | 1 day             | Scheduled  | Success         | 2022-09-18 17:04:35.499136+07 | 2022-09-19 17:04:36.150056+07 |        639 |             605 |             34
                            |      1 | policy_telemetry                    |                                                                             | 24:00:00          | Scheduled  | Success         | 2022-09-19 00:27:21.284439+07 | 2022-09-20 00:27:24.536283+07 |        699 |             599 |            100
(4 rows)

So, only the data for farm_tsmeasure_10m is materialized to disk; the data for farm_tscondition_10m is never materialized. If you query the continuous aggregate, the data is calculated using real-time aggregation. Once the data is removed from the underlying hypertable by the retention policy, the real-time aggregation can no longer access the data, and the data is not available in the continuous aggregate.
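
The cutover point between materialized and real-time data is the watermark visible in the view definitions above. If needed, it can be inspected directly; this is a sketch using the internal functions that appear in the view definition, so it may change between TimescaleDB versions:

```sql
-- Show the watermark for the farm_tscondition_10m materialization
-- (materialized hypertable id 6, as seen in the view definition above).
-- Buckets before this timestamp are served from the materialized
-- hypertable; buckets at or after it are computed on the fly from the
-- raw hypertable by real-time aggregation.
SELECT _timescaledb_internal.to_timestamp(
         _timescaledb_internal.cagg_watermark(6)
       ) AS watermark;
```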

You mentioned in your initial post that a refresh policy should be configured for farm_tscondition_10m, but the policy does not seem to be present on this system. Please re-execute the following command:

SELECT add_continuous_aggregate_policy('farm_tscondition_10m',
    start_offset => INTERVAL '1 h',
    end_offset => INTERVAL '10 m',
    schedule_interval => INTERVAL '10 m');

to create a continuous aggregate refresh policy and to keep the data of the continuous aggregate even if the retention policy removes the data from the underlying hypertable.
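
Afterwards, the new refresh job should be visible in the jobs view. A sketch of a verification query (the exact job id will differ):

```sql
-- Confirm that a refresh policy now exists for each continuous aggregate
SELECT j.job_id, j.proc_name, j.config
  FROM timescaledb_information.jobs j
 WHERE j.proc_name = 'policy_refresh_continuous_aggregate';
```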

jnidzwetzki avatar Sep 20 '22 09:09 jnidzwetzki

After adding a CAgg refresh policy for farm_tscondition_10m, the CAgg data is now kept as expected. Thank you, @jnidzwetzki. This issue is resolved.

hongquan avatar Oct 02 '22 08:10 hongquan

The problem has come back, even though I have the CAgg refresh policies in place:

SELECT j.hypertable_name,
        j.job_id, 
        j.proc_name,
        config,
        schedule_interval,
        job_status,
        last_run_status,
        last_run_started_at,
        js.next_start,
        total_runs,
        total_successes,
        total_failures
   FROM timescaledb_information.jobs j
   JOIN timescaledb_information.job_stats js
     ON j.job_id = js.job_id;


      hypertable_name       | job_id |              proc_name              |                                     config                                     | schedule_interval | job_status | last_run_status |      last_run_started_at      |          next_start           | total_runs | total_successes | total_failures 
----------------------------+--------+-------------------------------------+--------------------------------------------------------------------------------+-------------------+------------+-----------------+-------------------------------+-------------------------------+------------+-----------------+----------------
 _materialized_hypertable_5 |   1003 | policy_refresh_continuous_aggregate | {"end_offset": "00:10:00", "start_offset": "01:00:00", "mat_hypertable_id": 5} | 00:10:00          | Scheduled  | Success         | 2023-04-04 15:10:06.579911+07 | 2023-04-04 15:20:07.305988+07 |      14092 |           14092 |              0
 _materialized_hypertable_6 |   1004 | policy_refresh_continuous_aggregate | {"end_offset": "00:10:00", "start_offset": "01:00:00", "mat_hypertable_id": 6} | 00:10:00          | Scheduled  | Success         | 2023-04-04 15:09:24.597891+07 | 2023-04-04 15:19:24.685044+07 |        172 |             172 |              0
 farm_tscondition           |   1000 | policy_retention                    | {"drop_after": "24:00:00", "hypertable_id": 1}                                 | 1 day             | Scheduled  | Success         | 2023-04-03 20:46:09.614576+07 | 2023-04-04 20:46:10.895492+07 |        225 |             216 |              9
 farm_tsmeasure             |   1001 | policy_retention                    | {"drop_after": "24:00:00", "hypertable_id": 2}                                 | 1 day             | Scheduled  | Success         | 2023-04-03 20:52:20.479195+07 | 2023-04-04 20:52:21.01931+07  |        225 |             216 |              9
                            |      1 | policy_telemetry                    |                                                                                | 24:00:00          | Scheduled  | Success         | 2023-04-04 11:54:35.120112+07 | 2023-04-05 11:54:37.701018+07 |        236 |             225 |             11
                            |      2 | policy_job_error_retention          | {"drop_after": "1 month"}                                                      | 1 mon             | Scheduled  | Success         | 2023-04-01 07:00:00.003715+07 | 2023-05-01 07:00:00+07        |          5 |               5 |              0
(6 rows)

My software versions are now:

$ dpkg -l | rg timescaledb
ii  timescaledb-2-loader-postgresql-14    2.10.1~ubuntu22.04                      amd64        The loader for TimescaleDB to load individual versions.
ii  timescaledb-2-postgresql-14           2.10.1~ubuntu22.04                      amd64        An open-source time-series database based on PostgreSQL, as an extension.
ii  timescaledb-toolkit-postgresql-14     1:1.15.0~ubuntu22.04                    amd64        Library of analytical hyperfunctions, time-series pipelining, and other SQL utilities, compatible with TimescaleDB and PostgreSQL
ii  timescaledb-tools                     0.14.3~ubuntu22.04                      amd64        A suite of tools that can be used with TimescaleDB.

Could you take a look, @jnidzwetzki ?

hongquan avatar Apr 04 '23 08:04 hongquan

Solved it by deleting and recreating the CAgg policy.

It seems that the problem happens when I recreate the materialized views during the DB migration process.

hongquan avatar Apr 15 '23 05:04 hongquan
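
For anyone hitting the same issue: dropping and recreating a continuous aggregate (as a migration doing DROP MATERIALIZED VIEW followed by CREATE MATERIALIZED VIEW typically does) also drops the refresh policy attached to it, so the migration needs to re-add the policy explicitly. A sketch of the required final step, using the policy parameters from this thread:

```sql
-- After a migration recreates the continuous aggregate, the refresh
-- policy must be re-created as well; otherwise the CAgg silently falls
-- back to real-time aggregation only, and loses data once the retention
-- policy drops the underlying raw chunks.
SELECT add_continuous_aggregate_policy('farm_tscondition_10m',
    start_offset      => INTERVAL '1 h',
    end_offset        => INTERVAL '10 m',
    schedule_interval => INTERVAL '10 m',
    if_not_exists     => true);
```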