timescaledb
[Bug]: After table compression database size increased a lot
What type of bug is this?
Other
What subsystems and features are affected?
Compression, Other
What happened?
In production, we hit an issue where, after compressing a hypertable, the table size was reduced but the overall database size increased a lot.
Table and compression definition
SELECT create_hypertable('tag', 'date', if_not_exists => TRUE);
DO language plpgsql
$$
DECLARE var_sql text;
BEGIN
IF NOT EXISTS(SELECT * FROM chunk_compression_stats('tag') WHERE compression_status = 'Compressed') THEN
var_sql := 'ALTER TABLE tag SET (timescaledb.compress, timescaledb.compress_orderby = ''date DESC'', timescaledb.compress_segmentby = ''id'');';
execute var_sql;
end if;
END;
$$;
SELECT add_retention_policy('tag', INTERVAL '30 days', if_not_exists => TRUE);
SELECT add_compression_policy('tag', INTERVAL '8 days', if_not_exists => TRUE);
\d+ tag;
Table "tag"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
-----------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
tenantid | uuid | | not null | | plain | | |
resourceid | uuid | | not null | | plain | | |
value | numeric | | not null | | main | | |
context | uuid | | not null | | plain | | |
topic | text | | not null | | extended | | |
tag | text | | not null | | extended | | |
id | uuid | | not null | | plain | | |
date | timestamp without time zone | | not null | | plain | | |
created_at | timestamp without time zone | | not null | | plain | | |
visibility_list | uuid[] | | | | extended | | |
Indexes:
"tag_pkey" PRIMARY KEY, btree (id, date)
"tag_date_idx" btree (date DESC)
"index_tag_date" btree (resourceid, date DESC)
Triggers:
ts_insert_blocker BEFORE INSERT ON timeseries.tag FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Access method: heap
SELECT h.table_name, c.interval_length
FROM _timescaledb_catalog.dimension c
JOIN _timescaledb_catalog.hypertable h
ON h.id = c.hypertable_id;
table_name | interval_length
-----------------------------------------+-----------------
tag | 604800000000
Table size before compression: 3.23 GB
Table size after compression: 847 MB
Total database size before compression: 93.4 GB
Total database size after compression: 101 GB
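For reference, these figures can be checked with something like the following (a minimal sketch, assuming the hypertable is named tag):
-- Hypertable size (including its compressed chunks) and total database size
SELECT pg_size_pretty(hypertable_size('tag'));
SELECT pg_size_pretty(pg_database_size(current_database()));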
There are no other table compressions happening at the same time.
There is another ticket, #5582, where space was not reclaimed after compression, but in that case the database size did not increase.
Please suggest, thanks.
TimescaleDB version affected
2.10.1
PostgreSQL version used
14.6
What operating system did you use?
CentOS Linux release 7.9.2009
What installation method did you use?
RPM
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
No response
How can we reproduce the bug?
N/A
Hi @yinan8128,
Thank you for filing the issue!
As explained in #5582, this can happen if compression errored out for some reason before the space was reclaimed.
Running VACUUM on the hypertable will not work in version 2.10.1, because it is not propagated to the individual chunks. This is fixed in version 2.11.0; upgrading to that version and running VACUUM on the hypertable should fix your issue.
Alternatively, you can continue using version 2.10.1, but then you would have to run VACUUM on each individual chunk manually.
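For illustration, a minimal sketch of both options (assuming the hypertable is named tag; the chunk name is just an example, your chunk names will differ):
-- Option 1: on 2.11.0 or later, VACUUM on the hypertable is propagated to its chunks
VACUUM (VERBOSE) tag;
-- Option 2: on 2.10.1, run VACUUM against each chunk individually, for example
VACUUM (VERBOSE) _timescaledb_internal._hyper_19_10_chunk;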
Can you try either of these and let us know if that helps improve your situation?
Hi
Thank you for your reply
vacuum script
select 'vacuum verbose'||' '||chunk_schema||'.'||chunk_name||';' from timescaledb_information.chunks where hypertable_name='tag' order by hypertable_name,chunk_name;
?column?
-----------------------------------------------------------
vacuum verbose _timescaledb_internal._hyper_19_10_chunk;
vacuum verbose _timescaledb_internal._hyper_19_246_chunk;
vacuum verbose _timescaledb_internal._hyper_19_258_chunk;
vacuum verbose _timescaledb_internal._hyper_19_43_chunk;
vacuum verbose _timescaledb_internal._hyper_19_47_chunk;
Vacuum log from Ansible:
INFO: vacuuming "_timescaledb_internal._hyper_19_10_chunk"
INFO: table "_hyper_19_10_chunk": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2242365
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: vacuuming "pg_toast.pg_toast_18292"
INFO: table "pg_toast_18292": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2242365
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: vacuuming "_timescaledb_internal._hyper_19_246_chunk"
INFO: launched 2 parallel vacuum workers for index cleanup (planned: 2)
INFO: table "_hyper_19_246_chunk": found 0 removable, 227775 nonremovable row versions in 9176 out of 53563 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2242365
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.17 s, system: 0.14 s, elapsed: 6.33 s.
INFO: vacuuming "pg_toast.pg_toast_69237"
INFO: table "pg_toast_69237": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2242373
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: vacuuming "_timescaledb_internal._hyper_19_258_chunk"
INFO: launched 2 parallel vacuum workers for index cleanup (planned: 2)
INFO: table "_hyper_19_258_chunk": found 0 removable, 251045 nonremovable row versions in 10068 out of 73831 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2242373
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.08 s, system: 0.08 s, elapsed: 0.18 s.
INFO: vacuuming "pg_toast.pg_toast_75680"
INFO: table "pg_toast_75680": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2242373
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: vacuuming "_timescaledb_internal._hyper_19_43_chunk"
INFO: table "_hyper_19_43_chunk": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2242373
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: vacuuming "pg_toast.pg_toast_21713"
INFO: table "pg_toast_21713": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2242373
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: vacuuming "_timescaledb_internal._hyper_19_47_chunk"
INFO: scanned index "47_38_events_monitoring_netitag_pkey" to remove 62 row versions
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.03 s
INFO: scanned index "_hyper_19_47_chunk_events_monitoring_netitag_date_idx" to remove 62 row versions
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO: scanned index "_hyper_19_47_chunk_index_events_monitoring_netitag_date" to remove 62 row versions
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO: table "_hyper_19_47_chunk": removed 62 dead item identifiers in 4 pages
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO: index "47_38_events_monitoring_netitag_pkey" now contains 0 row versions in 2 pages
DETAIL: 62 index row versions were removed.
0 index pages were newly deleted.
0 index pages are currently deleted, of which 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "_hyper_19_47_chunk_events_monitoring_netitag_date_idx" now contains 0 row versions in 2 pages
DETAIL: 62 index row versions were removed.
0 index pages were newly deleted.
0 index pages are currently deleted, of which 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "_hyper_19_47_chunk_index_events_monitoring_netitag_date" now contains 0 row versions in 2 pages
DETAIL: 62 index row versions were removed.
0 index pages were newly deleted.
0 index pages are currently deleted, of which 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: table "_hyper_19_47_chunk": found 62 removable, 0 nonremovable row versions in 4 out of 4 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2242373
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.03 s.
INFO: table "_hyper_19_47_chunk": truncated 4 to 0 pages
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.05 s
INFO: vacuuming "pg_toast.pg_toast_27725"
INFO: table "pg_toast_27725": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2242375
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
We compared the database size afterwards and nothing was released. Should we run VACUUM FULL, or is a plain VACUUM enough, please?
Hi @yinan8128,
You can try VACUUM FULL, but please note that it requires an ACCESS EXCLUSIVE lock on the table and therefore cannot run in parallel with any other READ/WRITE operations on that table.
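For example, a minimal sketch against one of the chunks from your earlier output (any chunk name from your setup works the same way):
-- Rewrites the chunk into new files and returns the freed space to the OS;
-- holds an ACCESS EXCLUSIVE lock for the duration of the rewrite
VACUUM (FULL, VERBOSE) _timescaledb_internal._hyper_19_246_chunk;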
Hi
Thank you for your reply
- We checked timescaledb_information.job_errors and there is no error logged for 2023-06-30, yet a compression problem still happened on 2023-06-30. Is it possible that timescaledb_information.job_errors does not record every compression job error?
- What could cause a compression failure, and how can we check for it, please? Thanks
Hi @yinan8128,
Any error in the jobs will be recorded in the job_errors table. So if you don't see anything there, it is highly unlikely that the compression job failed.
But as your database size increased after compression, the most likely explanation is that the old uncompressed rows have not been vacuumed yet. It could be that vacuum has not freed up those rows because of some long-running transaction. You can follow this - https://emmer.dev/blog/finding-long-running-queries-in-postgresql/ - to check whether you have any of those. If you do have such a transaction running, cancel it and then retry the vacuum.
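A minimal sketch for spotting such transactions (a plain pg_stat_activity query, nothing TimescaleDB-specific):
-- List sessions with open transactions, oldest first; long-lived ones
-- can prevent vacuum from removing dead rows
SELECT pid, state, now() - xact_start AS xact_age, left(query, 60) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_age DESC;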
As for why this has happened, it could be that you have hit a bug. But without an error log or an entry in job_errors, this might be harder to investigate.
Hi @yinan8128 It would be of much help if we knew what errors occur when the compression policy is triggered in the background. When a compression policy fails, we do not reclaim the space occupied as part of compression, so it might be that the compression policy failed multiple times and we ended up occupying more space. However, this is just a guess; we need the exact errors from the compression policy job.
Any chance you could re-run your setup on TimescaleDB 2.11 and see if the issue is resolved?
Hi
Thank you for the update
We have already disabled compression in that production environment; let us check another production environment that has the same problem and get back to you.
Hi
In one of our environments, the database size increases after compression. The retention policy frees some storage, but the overall storage usage trend is still increasing and will sooner or later hit the limit.
However, there are no records in timescaledb_information.job_errors:
select * from timescaledb_information.job_errors;
 job_id | proc_schema | proc_name | pid | start_time | finish_time | sqlerrcode | err_message
--------+-------------+-----------+-----+------------+-------------+------------+-------------
(0 rows)
Any suggestion for the next step, please? Thanks
Hello @yinan8128,
Some more questions:
- Is it only the compression policy that fails to run? Or do you have other background jobs/policies which are not running? I’m asking this because it is possible for the background job scheduler to exit and no jobs would run because of that, until it is restarted (although it seems from the graph that the retention policy is running normally).
- Have you tried manually compressing your chunks to see if you get any errors then (see the sketch below)? Does compression complete normally in that case, and is the space consumption reduced?
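For illustration, a minimal sketch of manual compression, assuming the hypertable is named tag and the 8-day threshold from your compression policy:
-- Compress every chunk older than 8 days; any error will surface directly
-- in the session instead of only in the background job
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('tag', older_than => INTERVAL '8 days') AS c;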
Hello
There are retention jobs and compression jobs, and there are no records in timescaledb_information.job_errors, which means no job failure, correct?
From the attached image you can see that the retention policy worked (the database size was reduced 4 times), while compression increased the database size 7 times.
@yinan8128 Note that you need to upgrade to 2.11 to allow a VACUUM on the hypertable to release disk space. Your earlier vacuum script won't work because the informational view you used doesn't show the "internal" compressed chunks. You need to query the _timescaledb_catalog.chunk table to get all chunks.
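A minimal sketch of generating per-chunk VACUUM statements from the catalog (assuming the hypertable is named tag; the UNION also picks up the internal compressed hypertable so its chunks are included):
-- Generate VACUUM statements for every non-dropped chunk, including the
-- internal compressed chunks that timescaledb_information.chunks hides
SELECT format('VACUUM (VERBOSE) %I.%I;', c.schema_name, c.table_name)
FROM _timescaledb_catalog.chunk c
WHERE NOT c.dropped
  AND c.hypertable_id IN (
        SELECT id FROM _timescaledb_catalog.hypertable WHERE table_name = 'tag'
        UNION
        SELECT compressed_hypertable_id FROM _timescaledb_catalog.hypertable
        WHERE table_name = 'tag' AND compressed_hypertable_id IS NOT NULL);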
Hi
We found that manual compression can also make the chunk and table size increase. We added reproduction steps to a new ticket, https://github.com/timescale/timescaledb/issues/5961. Could you please help check?
Hi @yinan8128 ,
Replied to questions asked as part of issue #5961. Please let us know if you have any further questions.
Dear Author,
This issue has been automatically marked as stale due to lack of activity. With only the issue description that is currently provided, we do not have enough information to take action. If you have or find the answers we would need, please reach out. Otherwise, this issue will be closed in 30 days.
Thank you!
What is the relationship between compression ratio, query efficiency, and the data distribution? For example, for 10000 records:
- 1000 unique values, each value 10 records
- 100 unique values, each value 100 records
- 10 unique values, each value 1000 records
Which of these cases gives a good compression ratio and query efficiency, please? Will case 3 get the best compression ratio but the worst query efficiency, since a query needs to decompress the compressed chunk first? Thanks
Hi,
I believe this was answered in your other issue: https://github.com/timescale/timescaledb/issues/5961#issuecomment-1833194509
Thanks.