timescaledb icon indicating copy to clipboard operation
timescaledb copied to clipboard

hypertable_compression_stats reports incorrect values

Open phemmer opened this issue 3 years ago • 3 comments

Relevant system information:

  • OS: Debian 10/Buster
  • PostgreSQL version (output of postgres --version): postgres (PostgreSQL) 13.4 (Debian 13.4-1.pgdg100+1)
  • TimescaleDB version (output of \dx in psql): 2.4.1
  • Installation method: apt

Describe the bug The before_compression_*_bytes values reported by hypertable_compression_stats are incorrectly undersized.

To Reproduce Steps to reproduce the behavior:

  1. create a compressed hypertable
  2. insert a bunch of data
  3. query size with hypertable_detailed_size
  4. compress chunks
  5. query size with hypertable_compression_stats

Expected behavior The sizes from hypertable_detailed_size() and hypertable_compression_stats.before_compression_* should be approximately equal.

Actual behavior The values from hypertable_compression_stats are significantly smaller.

Screenshots

# select * from hypertable_compression_stats('telegraf.haproxy_table');
-[ RECORD 1 ]------------------+---------
total_chunks                   | 12
number_compressed_chunks       | 8
before_compression_table_bytes | 42246144
before_compression_index_bytes | 28024832
before_compression_toast_bytes | 0
before_compression_total_bytes | 70270976
after_compression_table_bytes  | 16941056
after_compression_index_bytes  | 1359872
after_compression_toast_bytes  | 65536
after_compression_total_bytes  | 18366464
node_name                      | ded4086
-[ RECORD 2 ]------------------+---------
total_chunks                   | 12
number_compressed_chunks       | 8
before_compression_table_bytes | 42196992
before_compression_index_bytes | 28041216
before_compression_toast_bytes | 0
before_compression_total_bytes | 70238208
after_compression_table_bytes  | 16941056
after_compression_index_bytes  | 1359872
after_compression_toast_bytes  | 65536
after_compression_total_bytes  | 18366464
node_name                      | ded4077

# select decompress_chunk(chunk_schema||'.'||chunk_name) from timescaledb_information.chunks where hypertable_name='haproxy_table' and is_compressed=true;
-[ RECORD 1 ]----+--------------------------------------------------
decompress_chunk | _timescaledb_internal._dist_hyper_2914_3190_chunk
-[ RECORD 2 ]----+--------------------------------------------------
decompress_chunk | _timescaledb_internal._dist_hyper_2914_3191_chunk
-[ RECORD 3 ]----+--------------------------------------------------
decompress_chunk | _timescaledb_internal._dist_hyper_2914_3246_chunk
-[ RECORD 4 ]----+--------------------------------------------------
decompress_chunk | _timescaledb_internal._dist_hyper_2914_3247_chunk
-[ RECORD 5 ]----+--------------------------------------------------
decompress_chunk | _timescaledb_internal._dist_hyper_2914_3307_chunk
-[ RECORD 6 ]----+--------------------------------------------------
decompress_chunk | _timescaledb_internal._dist_hyper_2914_3309_chunk
-[ RECORD 7 ]----+--------------------------------------------------
decompress_chunk | _timescaledb_internal._dist_hyper_2914_3371_chunk
-[ RECORD 8 ]----+--------------------------------------------------
decompress_chunk | _timescaledb_internal._dist_hyper_2914_3376_chunk

# select * from hypertable_detailed_size('telegraf.haproxy_table');
-[ RECORD 1 ]---------
table_bytes | 59916288
index_bytes | 31997952
toast_bytes | 0
total_bytes | 91914240
node_name   | ded4077
-[ RECORD 2 ]---------
table_bytes | 59891712
index_bytes | 32030720
toast_bytes | 0
total_bytes | 91922432
node_name   | ded4086
-[ RECORD 3 ]---------
table_bytes | 0
index_bytes | 16384
toast_bytes | 0
total_bytes | 16384
node_name   | 

# select compress_chunk(chunk_schema||'.'||chunk_name) from timescaledb_information.chunks where hypertable_name='haproxy_table' and range_end < now() - interval '1h';
-[ RECORD 1 ]--+--------------------------------------------------
compress_chunk | _timescaledb_internal._dist_hyper_2914_3190_chunk
-[ RECORD 2 ]--+--------------------------------------------------
compress_chunk | _timescaledb_internal._dist_hyper_2914_3191_chunk
-[ RECORD 3 ]--+--------------------------------------------------
compress_chunk | _timescaledb_internal._dist_hyper_2914_3246_chunk
-[ RECORD 4 ]--+--------------------------------------------------
compress_chunk | _timescaledb_internal._dist_hyper_2914_3247_chunk
-[ RECORD 5 ]--+--------------------------------------------------
compress_chunk | _timescaledb_internal._dist_hyper_2914_3307_chunk
-[ RECORD 6 ]--+--------------------------------------------------
compress_chunk | _timescaledb_internal._dist_hyper_2914_3309_chunk
-[ RECORD 7 ]--+--------------------------------------------------
compress_chunk | _timescaledb_internal._dist_hyper_2914_3371_chunk
-[ RECORD 8 ]--+--------------------------------------------------
compress_chunk | _timescaledb_internal._dist_hyper_2914_3376_chunk

# select * from hypertable_compression_stats('telegraf.haproxy_table');
-[ RECORD 1 ]------------------+---------
total_chunks                   | 12
number_compressed_chunks       | 8
before_compression_table_bytes | 42106880
before_compression_index_bytes | 19947520
before_compression_toast_bytes | 0
before_compression_total_bytes | 62054400
after_compression_table_bytes  | 16941056
after_compression_index_bytes  | 1359872
after_compression_toast_bytes  | 65536
after_compression_total_bytes  | 18366464
node_name                      | ded4086
-[ RECORD 2 ]------------------+---------
total_chunks                   | 12
number_compressed_chunks       | 8
before_compression_table_bytes | 42106880
before_compression_index_bytes | 19947520
before_compression_toast_bytes | 0
before_compression_total_bytes | 62054400
after_compression_table_bytes  | 16941056
after_compression_index_bytes  | 1359872
after_compression_toast_bytes  | 65536
after_compression_total_bytes  | 18366464
node_name                      | ded4077

^ Note that the values aren't even consistent. The value of before_compression_*_bytes before/after decompression & compression differ significantly. And neither of them is anywhere near the values of hypertable_detailed_size while decompressed.

Additional context I suppose it's also possible that hypertable_detailed_size is wrong. Bug given the inconsistency of hypertable_compression_stats, I feel it's more likely the one at fault.

I have another table where I performed this test on, and before decompression before_compression_total_bytes was reported at 5.7gb. But after decompressing everything, it was 13gb. This is a major discrepancy.

phemmer avatar Sep 14 '21 12:09 phemmer

@phemmer whenever a compress_chunk operation is executed, we record the number of bytes before/after compress_chunk is executed. The hyeprtable_compression_stats function reads from this static table and helps you determine the compression ratio.

The hyeprtable_detailed_size computes the values dynamically and reports the actual storage at the current point in time - a) this includes any associated storage with the original chunk (that was compressed, we continue to maintain this metadata table) and b) any new inserts that were directed into the compressed chunks

gayyappan avatar Sep 16 '21 16:09 gayyappan

I'm not doing any inserts into compressed chunks.

phemmer avatar Sep 16 '21 16:09 phemmer

Hi @phemmer, I tried to reproduce on TimescaleDB 2.9.0 with PG14 and could not see major difference in before_compression* numbers. Please see below the details. Let me know if iam missing anything.

tsdb=# 
tsdb=# CREATE TABLE IF NOT EXISTS metrics (
tsdb(#   "time" timestamp WITH time zone NOT NULL,
tsdb(#   device_id int NOT NULL,
tsdb(#   reading float NOT NULL
tsdb(# );
CREATE TABLE
tsdb=# 
tsdb=# SELECT create_hypertable('metrics', 'time', chunk_time_interval => INTERVAL '10 day');
   create_hypertable   
-----------------------
 (75,public,metrics,t)
(1 row)

tsdb=# 
tsdb=# INSERT INTO
tsdb-#   metrics
tsdb-# SELECT 
tsdb-#   time, device_id, random()*100 as cpu_usage 
tsdb-# FROM 
tsdb-#   generate_series(
tsdb(#     now() - INTERVAL '20 days',
tsdb(#     now(),
tsdb(#     INTERVAL '1 minute'
tsdb(#   ) as time, 
tsdb-#   generate_series(1,10) device_id
tsdb-# ;
INSERT 0 288010
tsdb=# 
tsdb=# ALTER TABLE metrics SET (timescaledb.compress, timescaledb.compress_segmentby = 'device_id');
ALTER TABLE
tsdb=# 
tsdb=# 
tsdb=# \x
Expanded display is on.
tsdb=# select * from hypertable_detailed_size('metrics');
-[ RECORD 1 ]---------
table_bytes | 15147008
index_bytes | 3538944
toast_bytes | 0
total_bytes | 18685952
node_name   | 

tsdb=# select compress_chunk(c) from show_chunks('metrics') c;
-[ RECORD 1 ]--+--------------------------------------------
compress_chunk | _timescaledb_internal._hyper_75_12598_chunk
-[ RECORD 2 ]--+--------------------------------------------
compress_chunk | _timescaledb_internal._hyper_75_12599_chunk
-[ RECORD 3 ]--+--------------------------------------------
compress_chunk | _timescaledb_internal._hyper_75_12600_chunk

tsdb=# select * from hypertable_compression_stats('metrics');
-[ RECORD 1 ]------------------+---------
total_chunks                   | 3
number_compressed_chunks       | 3
before_compression_table_bytes | 15147008
before_compression_index_bytes | 3530752
before_compression_toast_bytes | 0
before_compression_total_bytes | 18677760
after_compression_table_bytes  | 122880
after_compression_index_bytes  | 49152
after_compression_toast_bytes  | 2523136
after_compression_total_bytes  | 2695168
node_name                      | 

tsdb=# select decompress_chunk(c) from show_chunks('metrics') c;
-[ RECORD 1 ]----+--------------------------------------------
decompress_chunk | _timescaledb_internal._hyper_75_12598_chunk
-[ RECORD 2 ]----+--------------------------------------------
decompress_chunk | _timescaledb_internal._hyper_75_12599_chunk
-[ RECORD 3 ]----+--------------------------------------------
decompress_chunk | _timescaledb_internal._hyper_75_12600_chunk

tsdb=# select * from hypertable_detailed_size('metrics');
-[ RECORD 1 ]---------
table_bytes | 15122432
index_bytes | 2785280
toast_bytes | 0
total_bytes | 17907712
node_name   | 

tsdb=# select compress_chunk(c) from show_chunks('metrics') c;
-[ RECORD 1 ]--+--------------------------------------------
compress_chunk | _timescaledb_internal._hyper_75_12598_chunk
-[ RECORD 2 ]--+--------------------------------------------
compress_chunk | _timescaledb_internal._hyper_75_12599_chunk
-[ RECORD 3 ]--+--------------------------------------------
compress_chunk | _timescaledb_internal._hyper_75_12600_chunk

tsdb=# select * from hypertable_compression_stats('metrics');
-[ RECORD 1 ]------------------+---------
total_chunks                   | 3
number_compressed_chunks       | 3
before_compression_table_bytes | 15122432
before_compression_index_bytes | 2777088
before_compression_toast_bytes | 0
before_compression_total_bytes | 17899520
after_compression_table_bytes  | 122880
after_compression_index_bytes  | 49152
after_compression_toast_bytes  | 2523136
after_compression_total_bytes  | 2695168
node_name                      | 

sb230132 avatar Sep 19 '22 05:09 sb230132

Hi @phemmer, A gentle reminder to let me know if the issue still exists. Else i would like to close the issue.

sb230132 avatar Sep 26 '22 14:09 sb230132