timescaledb
[Bug]: After table compression database size increased a lot
What type of bug is this?
Other
What subsystems and features are affected?
Compression, Other
What happened?
In production, we hit an issue where, after compressing a hypertable, the table size was reduced but the overall database size increased a lot.
Table and compression definition
SELECT create_hypertable('tag', 'date', if_not_exists => TRUE);
DO language plpgsql
$$
DECLARE var_sql text;
BEGIN
IF NOT EXISTS(SELECT * FROM chunk_compression_stats('tag') WHERE compression_status = 'Compressed') THEN
var_sql := 'ALTER TABLE tag SET (timescaledb.compress, timescaledb.compress_orderby = ''date DESC'', timescaledb.compress_segmentby = ''id'');';
execute var_sql;
end if;
END;
$$;
SELECT add_retention_policy('tag', INTERVAL '30 days', if_not_exists => TRUE);
SELECT add_compression_policy('tag', INTERVAL '8 days', if_not_exists => TRUE);
\d+ tag;
Table "tag"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
-----------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
tenantid | uuid | | not null | | plain | | |
resourceid | uuid | | not null | | plain | | |
value | numeric | | not null | | main | | |
context | uuid | | not null | | plain | | |
topic | text | | not null | | extended | | |
tag | text | | not null | | extended | | |
id | uuid | | not null | | plain | | |
date | timestamp without time zone | | not null | | plain | | |
created_at | timestamp without time zone | | not null | | plain | | |
visibility_list | uuid[] | | | | extended | | |
Indexes:
"tag_pkey" PRIMARY KEY, btree (id, date)
"tag_date_idx" btree (date DESC)
"index_tag_date" btree (resourceid, date DESC)
Triggers:
ts_insert_blocker BEFORE INSERT ON timeseries.tag FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker()
Access method: heap
SELECT h.table_name, c.interval_length
FROM _timescaledb_catalog.dimension c
JOIN _timescaledb_catalog.hypertable h
ON h.id = c.hypertable_id;
table_name | interval_length
-----------------------------------------+-----------------
tag | 604800000000
Table size before compression: 3.23 GB
Table size after compression: 847 MB
Total database size before compression: 93.4 GB
Total database size after compression: 101 GB
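For reference, these figures can be checked with something like the following (a minimal sketch, assuming the hypertable is named tag):
-- Hypertable size (including its compressed chunks) and total database size
SELECT pg_size_pretty(hypertable_size('tag'));
SELECT pg_size_pretty(pg_database_size(current_database()));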
There are no other table compressions happening at the same time.
There is another ticket, #5582, where space was not reclaimed after compression, but in that case the database size did not increase.
Please suggest, thanks.
TimescaleDB version affected
2.10.1
PostgreSQL version used
14.6
What operating system did you use?
CentOS Linux release 7.9.2009
What installation method did you use?
RPM
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
No response
How can we reproduce the bug?
N/A
Hi @yinan8128,
Thank you for filing the issue!
As explained in #5582, this can happen if compression errored out for some reason before the space was reclaimed.
Running VACUUM on the hypertable will not work in version 2.10.1, because it is not propagated to the individual chunks. This is fixed in version 2.11.0; upgrading to that version and running VACUUM on the hypertable should fix your issue.
Alternatively, you can continue using version 2.10.1, but then you would have to run VACUUM on each individual chunk manually.
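For illustration, a minimal sketch of both options (assuming the hypertable is named tag; the chunk name is just an example, your chunk names will differ):
-- Option 1: on 2.11.0 or later, VACUUM on the hypertable is propagated to its chunks
VACUUM (VERBOSE) tag;
-- Option 2: on 2.10.1, run VACUUM against each chunk individually, for example
VACUUM (VERBOSE) _timescaledb_internal._hyper_19_10_chunk;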
Can you try either of these and let us know if that helps improve your situation?
Hi
Thank you for your reply
vacuum script
select 'vacuum verbose'||' '||chunk_schema||'.'||chunk_name||';' from timescaledb_information.chunks where hypertable_name='tag' order by hypertable_name,chunk_name;
?column?
-----------------------------------------------------------
vacuum verbose _timescaledb_internal._hyper_19_10_chunk;
vacuum verbose _timescaledb_internal._hyper_19_246_chunk;
vacuum verbose _timescaledb_internal._hyper_19_258_chunk;
vacuum verbose _timescaledb_internal._hyper_19_43_chunk;
vacuum verbose _timescaledb_internal._hyper_19_47_chunk;
Vacuum log from Ansible:
INFO: vacuuming "_timescaledb_internal._hyper_19_10_chunk"
INFO: table "_hyper_19_10_chunk": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2242365
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: vacuuming "pg_toast.pg_toast_18292"
INFO: table "pg_toast_18292": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2242365
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: vacuuming "_timescaledb_internal._hyper_19_246_chunk"
INFO: launched 2 parallel vacuum workers for index cleanup (planned: 2)
INFO: table "_hyper_19_246_chunk": found 0 removable, 227775 nonremovable row versions in 9176 out of 53563 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2242365
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.17 s, system: 0.14 s, elapsed: 6.33 s.
INFO: vacuuming "pg_toast.pg_toast_69237"
INFO: table "pg_toast_69237": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2242373
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: vacuuming "_timescaledb_internal._hyper_19_258_chunk"
INFO: launched 2 parallel vacuum workers for index cleanup (planned: 2)
INFO: table "_hyper_19_258_chunk": found 0 removable, 251045 nonremovable row versions in 10068 out of 73831 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2242373
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.08 s, system: 0.08 s, elapsed: 0.18 s.
INFO: vacuuming "pg_toast.pg_toast_75680"
INFO: table "pg_toast_75680": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2242373
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: vacuuming "_timescaledb_internal._hyper_19_43_chunk"
INFO: table "_hyper_19_43_chunk": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2242373
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: vacuuming "pg_toast.pg_toast_21713"
INFO: table "pg_toast_21713": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2242373
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: vacuuming "_timescaledb_internal._hyper_19_47_chunk"
INFO: scanned index "47_38_events_monitoring_netitag_pkey" to remove 62 row versions
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.03 s
INFO: scanned index "_hyper_19_47_chunk_events_monitoring_netitag_date_idx" to remove 62 row versions
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO: scanned index "_hyper_19_47_chunk_index_events_monitoring_netitag_date" to remove 62 row versions
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO: table "_hyper_19_47_chunk": removed 62 dead item identifiers in 4 pages
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO: index "47_38_events_monitoring_netitag_pkey" now contains 0 row versions in 2 pages
DETAIL: 62 index row versions were removed.
0 index pages were newly deleted.
0 index pages are currently deleted, of which 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "_hyper_19_47_chunk_events_monitoring_netitag_date_idx" now contains 0 row versions in 2 pages
DETAIL: 62 index row versions were removed.
0 index pages were newly deleted.
0 index pages are currently deleted, of which 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "_hyper_19_47_chunk_index_events_monitoring_netitag_date" now contains 0 row versions in 2 pages
DETAIL: 62 index row versions were removed.
0 index pages were newly deleted.
0 index pages are currently deleted, of which 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: table "_hyper_19_47_chunk": found 62 removable, 0 nonremovable row versions in 4 out of 4 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2242373
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.03 s.
INFO: table "_hyper_19_47_chunk": truncated 4 to 0 pages
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.05 s
INFO: vacuuming "pg_toast.pg_toast_27725"
INFO: table "pg_toast_27725": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2242375
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
We compared the database size afterwards and nothing was released. Should we run VACUUM FULL, or is a plain VACUUM enough, please?
Hi @yinan8128,
You can try VACUUM FULL, but please note that it requires an ACCESS EXCLUSIVE lock on the table and therefore cannot run in parallel with any other READ/WRITE operations on that table.
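For example, a minimal sketch against one of the chunks from your earlier output (any chunk name from your setup works the same way):
-- Rewrites the chunk into new files and returns the freed space to the OS;
-- holds an ACCESS EXCLUSIVE lock for the duration of the rewrite
VACUUM (FULL, VERBOSE) _timescaledb_internal._hyper_19_246_chunk;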
Hi
Thank you for your reply
- We checked timescaledb_information.job_errors and there is no error logged for 2023-06-30, yet a compression problem still happened on 2023-06-30. Is it possible that timescaledb_information.job_errors does not record every compression job error?
- What could cause a compression failure, and how can we check for it, please? Thanks
Hi @yinan8128,
Any error in the jobs will be recorded in the job_errors table. So if you don't see anything there, it is highly unlikely that the compression job failed.
But as your database size increased after compression, the most likely explanation is that the old uncompressed rows have not been vacuumed yet. It could be that vacuum has not freed up those rows because of some long-running transaction. You can follow this - https://emmer.dev/blog/finding-long-running-queries-in-postgresql/ - to check whether you have any of those. If you do have such a transaction running, cancel it and then retry the vacuum.
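A minimal sketch for spotting such transactions (a plain pg_stat_activity query, nothing TimescaleDB-specific):
-- List sessions with open transactions, oldest first; long-lived ones
-- can prevent vacuum from removing dead rows
SELECT pid, state, now() - xact_start AS xact_age, left(query, 60) AS query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_age DESC;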
As for why this has happened, it could be that you have hit a bug. But without an error log or an entry in job_errors, this might be harder to investigate.
Hi @yinan8128 It would be of much help if we knew what errors occur when the compression policy is triggered in the background. When a compression policy fails, we do not reclaim the space occupied as part of compression, so it might be that the compression policy failed multiple times and we ended up occupying more space. However, this is just a guess; we need the exact errors from the compression policy job.
Any chance you could re-run your setup on TimescaleDB 2.11 and see if the issue is resolved?
Hi
Thank you for the update
We have already disabled compression in that production environment; let us check another production environment that has the same problem and get back to you.
Hi
In one of our environments, the database size increases after compression. The retention policy frees some storage, but the overall storage usage trend is still increasing and will sooner or later hit the limit.
However, there are no records in timescaledb_information.job_errors:
select * from timescaledb_information.job_errors;
 job_id | proc_schema | proc_name | pid | start_time | finish_time | sqlerrcode | err_message
--------+-------------+-----------+-----+------------+-------------+------------+-------------
(0 rows)
Any suggestion for the next step, please? Thanks
Hello @yinan8128,
Some more questions:
- Is it only the compression policy that fails to run? Or do you have other background jobs/policies which are not running? I’m asking this because it is possible for the background job scheduler to exit and no jobs would run because of that, until it is restarted (although it seems from the graph that the retention policy is running normally).
- Have you tried manually compressing your chunks to see if you get any errors then (see the sketch below)? Does compression complete normally in that case, and is the space consumption reduced?
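For illustration, a minimal sketch of manual compression, assuming the hypertable is named tag and the 8-day threshold from your compression policy:
-- Compress every chunk older than 8 days; any error will surface directly
-- in the session instead of only in the background job
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('tag', older_than => INTERVAL '8 days') AS c;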
Hello
There are retention jobs and compression jobs, and there are no records in timescaledb_information.job_errors, which means no job failure, correct?
From the attached image you can see that the retention policy worked (the database size was reduced 4 times), while compression increased the database size 7 times.
@yinan8128 Note that you need to upgrade to 2.11 to allow a VACUUM on the hypertable to release disk space. Your earlier vacuum script won't work because the informational view you used doesn't show the "internal" compressed chunks. You need to query the _timescaledb_catalog.chunk table to get all chunks.
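A minimal sketch of generating per-chunk VACUUM statements from the catalog (assuming the hypertable is named tag; the UNION also picks up the internal compressed hypertable so its chunks are included):
-- Generate VACUUM statements for every non-dropped chunk, including the
-- internal compressed chunks that timescaledb_information.chunks hides
SELECT format('VACUUM (VERBOSE) %I.%I;', c.schema_name, c.table_name)
FROM _timescaledb_catalog.chunk c
WHERE NOT c.dropped
  AND c.hypertable_id IN (
        SELECT id FROM _timescaledb_catalog.hypertable WHERE table_name = 'tag'
        UNION
        SELECT compressed_hypertable_id FROM _timescaledb_catalog.hypertable
        WHERE table_name = 'tag' AND compressed_hypertable_id IS NOT NULL);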
Hi
We found that manual compression can also make the chunk and table size increase. We added reproduction steps to a new ticket, https://github.com/timescale/timescaledb/issues/5961. Could you please help check?
Hi @yinan8128 ,
Replied to questions asked as part of issue #5961. Please let us know if you have any further questions.
Dear Author,
This issue has been automatically marked as stale due to lack of activity. With only the issue description that is currently provided, we do not have enough information to take action. If you have or find the answers we would need, please reach out. Otherwise, this issue will be closed in 30 days.
Thank you!
What is the relationship between compression ratio, query efficiency, and the data distribution? For example, for 10000 records:
- 1000 unique values, each value 10 records
- 100 unique values, each value 100 records
- 10 unique values, each value 1000 records
Which of these cases gives a good compression ratio and query efficiency, please? Will case 3 get the best compression ratio but the worst query efficiency, since a query needs to decompress the compressed chunk first? Thanks
Hi,
I believe this was answered in your other issue: https://github.com/timescale/timescaledb/issues/5961#issuecomment-1833194509
Thanks.