
[Bug]: Fetching the latest row by time on a hypertable that is fully compressed (compression configured with segmentby) results in decompressing all chunks

hardikm10 opened this issue

What type of bug is this?

Performance issue

What subsystems and features are affected?

Compression

What happened?

If a hypertable is fully compressed (i.e. all chunks are compressed), a query to fetch the latest row, such as SELECT * FROM sensor_data ORDER BY time DESC LIMIT 1, takes very long to execute. The hypertable does have an index on (time DESC), and compression is configured with both segmentby and orderby.

Looking at the plan, it seems to be decompressing all the chunks, which is unexpected.

Interestingly, when I decompressed one chunk (picked at random), the query only decompressed the most recent chunk, which is expected, and it also used the ChunkAppend optimisation. See the second plan in the reproduction steps below.

Looking further, if compression is configured with only compress_orderby (for example, timescaledb.compress_orderby = 'time DESC', without a segmentby), the query does not show this problem.
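
For reference, the orderby-only configuration referred to here would look something like this (a sketch using the same sensor_data table as in the reproduction steps below):

```sql
-- Compression configured with orderby only, no segmentby column
ALTER TABLE sensor_data SET (
  timescaledb.compress,
  timescaledb.compress_orderby = 'time DESC'
);
```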

TimescaleDB version affected

2.6.0

PostgreSQL version used

12.10

What operating system did you use?

MST

What installation method did you use?

Other

What platform did you run on?

Managed Service for TimescaleDB (MST/Aiven)

Relevant log output and stack trace

tsdb=> \i sensor_data_fetch_latest_row_by_time.sql
explain analyze select * from sensor_data order by time desc limit 1;
                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5752.72..5752.83 rows=1 width=28) (actual time=56.630..90.022 rows=1 loops=1)
   ->  Gather Merge  (cost=5752.72..60032.72 rows=472000 width=28) (actual time=56.629..90.020 rows=1 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Sort  (cost=4752.71..5932.71 rows=472000 width=28) (actual time=51.721..51.725 rows=1 loops=2)
               Sort Key: _hyper_20_154_chunk."time" DESC
               Sort Method: top-N heapsort  Memory: 25kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 25kB
               ->  Parallel Append  (cost=0.08..2392.71 rows=472000 width=28) (actual time=0.073..31.133 rows=172850 loops=2)
                     ->  Custom Scan (DecompressChunk) on _hyper_20_154_chunk  (cost=0.08..9.18 rows=118000 width=28) (actual time=0.117..12.184 rows=100800 loops=1)
                           ->  Parallel Seq Scan on compress_hyper_21_159_chunk  (cost=0.00..9.18 rows=118 width=120) (actual time=0.009..0.096 rows=200 loops=1)
                     ->  Custom Scan (DecompressChunk) on _hyper_20_153_chunk  (cost=0.08..9.18 rows=118000 width=28) (actual time=0.038..6.218 rows=50400 loops=2)
                           ->  Parallel Seq Scan on compress_hyper_21_158_chunk  (cost=0.00..9.18 rows=118 width=120) (actual time=0.006..0.041 rows=100 loops=2)
                     ->  Custom Scan (DecompressChunk) on _hyper_20_152_chunk  (cost=0.08..9.18 rows=118000 width=28) (actual time=0.064..12.610 rows=100800 loops=1)
                           ->  Parallel Seq Scan on compress_hyper_21_161_chunk  (cost=0.00..9.18 rows=118 width=120) (actual time=0.004..0.066 rows=200 loops=1)
                     ->  Custom Scan (DecompressChunk) on _hyper_20_155_chunk  (cost=0.04..2.59 rows=59000 width=28) (actual time=0.018..2.566 rows=17100 loops=1)
                           ->  Parallel Seq Scan on compress_hyper_21_160_chunk  (cost=0.00..2.59 rows=59 width=120) (actual time=0.004..0.021 rows=100 loops=1)
                     ->  Custom Scan (DecompressChunk) on _hyper_20_151_chunk  (cost=0.04..2.59 rows=59000 width=28) (actual time=0.027..3.729 rows=26200 loops=1)
                           ->  Parallel Seq Scan on compress_hyper_21_156_chunk  (cost=0.00..2.59 rows=59 width=120) (actual time=0.006..0.033 rows=100 loops=1)
 Planning Time: 1.076 ms
 Execution Time: 90.089 ms
(21 rows)


How can we reproduce the bug?

Creating the hypertable and filling it with some random data:


drop table sensor_data;

create table sensor_data (
  time timestamptz not null,
  sensor_id integer not null,
  cpu double precision null,
  temperature double precision null
);

select create_hypertable('sensor_data', 'time');

INSERT INTO sensor_data
SELECT
  time + (INTERVAL '1 minute' * random()) AS time,
  sensor_id,
  random() AS cpu,
  random() * 100 AS temperature
FROM
  generate_series(now() - INTERVAL '1 month', now() - INTERVAL '1 week', INTERVAL '10 minute') AS g1(time),
  generate_series(1, 100, 1) AS g2(sensor_id)
ORDER BY
  time;

\d+ sensor_data


SELECT c.chunk_name, c.range_start, c.range_end, c.is_compressed, pg_size_pretty(d.total_bytes)
FROM public.chunks_detailed_size('public.sensor_data') d,
     timescaledb_information.chunks c
WHERE d.chunk_name = c.chunk_name
ORDER BY c.range_start;

Configuring compression and adding a compression policy such that all chunks are compressed:

alter table sensor_data set (timescaledb.compress, timescaledb.compress_segmentby='sensor_id' ,timescaledb.compress_orderby = 'time DESC');

select add_compression_policy('sensor_data','1 minute'::INTERVAL);
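
Before re-running the query, it can be worth confirming that every chunk really is compressed; a quick sanity check (not part of the original report) against the chunks view used above:

```sql
-- Every chunk of sensor_data should report is_compressed = true
SELECT chunk_name, is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data'
ORDER BY range_start;
```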

Running the query that fetches the latest row:

explain analyze select * from sensor_data order by time desc limit 1;
                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5752.72..5752.83 rows=1 width=28) (actual time=56.630..90.022 rows=1 loops=1)
   ->  Gather Merge  (cost=5752.72..60032.72 rows=472000 width=28) (actual time=56.629..90.020 rows=1 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Sort  (cost=4752.71..5932.71 rows=472000 width=28) (actual time=51.721..51.725 rows=1 loops=2)
               Sort Key: _hyper_20_154_chunk."time" DESC
               Sort Method: top-N heapsort  Memory: 25kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 25kB
               ->  Parallel Append  (cost=0.08..2392.71 rows=472000 width=28) (actual time=0.073..31.133 rows=172850 loops=2)
                     ->  Custom Scan (DecompressChunk) on _hyper_20_154_chunk  (cost=0.08..9.18 rows=118000 width=28) (actual time=0.117..12.184 rows=100800 loops=1)
                           ->  Parallel Seq Scan on compress_hyper_21_159_chunk  (cost=0.00..9.18 rows=118 width=120) (actual time=0.009..0.096 rows=200 loops=1)
                     ->  Custom Scan (DecompressChunk) on _hyper_20_153_chunk  (cost=0.08..9.18 rows=118000 width=28) (actual time=0.038..6.218 rows=50400 loops=2)
                           ->  Parallel Seq Scan on compress_hyper_21_158_chunk  (cost=0.00..9.18 rows=118 width=120) (actual time=0.006..0.041 rows=100 loops=2)
                     ->  Custom Scan (DecompressChunk) on _hyper_20_152_chunk  (cost=0.08..9.18 rows=118000 width=28) (actual time=0.064..12.610 rows=100800 loops=1)
                           ->  Parallel Seq Scan on compress_hyper_21_161_chunk  (cost=0.00..9.18 rows=118 width=120) (actual time=0.004..0.066 rows=200 loops=1)
                     ->  Custom Scan (DecompressChunk) on _hyper_20_155_chunk  (cost=0.04..2.59 rows=59000 width=28) (actual time=0.018..2.566 rows=17100 loops=1)
                           ->  Parallel Seq Scan on compress_hyper_21_160_chunk  (cost=0.00..2.59 rows=59 width=120) (actual time=0.004..0.021 rows=100 loops=1)
                     ->  Custom Scan (DecompressChunk) on _hyper_20_151_chunk  (cost=0.04..2.59 rows=59000 width=28) (actual time=0.027..3.729 rows=26200 loops=1)
                           ->  Parallel Seq Scan on compress_hyper_21_156_chunk  (cost=0.00..2.59 rows=59 width=120) (actual time=0.006..0.033 rows=100 loops=1)
 Planning Time: 1.076 ms
 Execution Time: 90.089 ms
(21 rows)

Looking further, decompressing a randomly picked chunk results in a good plan:

tsdb=> select decompress_chunk('_timescaledb_internal._hyper_20_152_chunk');
             decompress_chunk
-------------------------------------------
 _timescaledb_internal._hyper_20_152_chunk
(1 row)

tsdb=> \i sensor_data_fetch_latest_row_by_time.sql
explain analyze select * from sensor_data order by time desc limit 1;
                                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.03..0.03 rows=1 width=28) (actual time=8.848..8.854 rows=1 loops=1)
   ->  Custom Scan (ChunkAppend) on sensor_data  (cost=0.03..3.00 rows=100000 width=28) (actual time=8.847..8.852 rows=1 loops=1)
         Order: sensor_data."time" DESC
         ->  Sort  (cost=0.00..0.00 rows=0 width=0) (actual time=8.846..8.847 rows=1 loops=1)
               Sort Key: _hyper_20_155_chunk."time" DESC
               Sort Method: top-N heapsort  Memory: 25kB
               ->  Custom Scan (DecompressChunk) on _hyper_20_155_chunk  (cost=0.03..3.00 rows=100000 width=28) (actual time=0.033..4.847 rows=17100 loops=1)
                     ->  Seq Scan on compress_hyper_21_160_chunk  (cost=0.00..3.00 rows=100 width=120) (actual time=0.010..0.055 rows=100 loops=1)
         ->  Sort  (cost=0.00..0.00 rows=0 width=0) (never executed)
               Sort Key: _hyper_20_154_chunk."time" DESC
               ->  Custom Scan (DecompressChunk) on _hyper_20_154_chunk  (cost=0.05..10.00 rows=200000 width=28) (never executed)
                     ->  Seq Scan on compress_hyper_21_159_chunk  (cost=0.00..10.00 rows=200 width=120) (never executed)
         ->  Sort  (cost=0.00..0.00 rows=0 width=0) (never executed)
               Sort Key: _hyper_20_153_chunk."time" DESC
               ->  Custom Scan (DecompressChunk) on _hyper_20_153_chunk  (cost=0.05..10.00 rows=200000 width=28) (never executed)
                     ->  Seq Scan on compress_hyper_21_158_chunk  (cost=0.00..10.00 rows=200 width=120) (never executed)
         ->  Index Scan using _hyper_20_152_chunk_sensor_data_time_idx on _hyper_20_152_chunk  (cost=0.29..2560.19 rows=100800 width=28) (never executed)
         ->  Sort  (cost=0.00..0.00 rows=0 width=0) (never executed)
               Sort Key: _hyper_20_151_chunk."time" DESC
               ->  Custom Scan (DecompressChunk) on _hyper_20_151_chunk  (cost=0.03..3.00 rows=100000 width=28) (never executed)
                     ->  Seq Scan on compress_hyper_21_156_chunk  (cost=0.00..3.00 rows=100 width=120) (never executed)
 Planning Time: 1.146 ms
 Execution Time: 8.934 ms
(23 rows)
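
(To return to the fully compressed state after this experiment, the decompressed chunk can be compressed again; compress_chunk is the standard counterpart to decompress_chunk:)

```sql
SELECT compress_chunk('_timescaledb_internal._hyper_20_152_chunk');
```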

hardikm10 · Apr 08 '22

Hello!

I noticed the same performance issue when doing my own benchmarks.

I think the problem is that the query planner has no statistics about individual chunks. Sure, it knows the chunk boundaries, but not whether there are actually rows in a chunk. So the naive plan is to scan them all.

Since time-series data is mostly append-only, a significantly faster plan for 99% of cases would be to scan only the most recent chunk first and, only if it contains no data, expand the search range exponentially to older chunks. However, I do not know whether such an incremental process is possible with the PostgreSQL query planner.
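
At the application level, that incremental idea can be approximated today by constraining the time range by hand; a minimal sketch (the window widths are arbitrary and would need tuning for the actual data):

```sql
-- Try a narrow recent window first, so chunks outside the range can be excluded.
SELECT * FROM sensor_data
WHERE time > now() - INTERVAL '1 day'
ORDER BY time DESC
LIMIT 1;
-- If no row comes back, retry with '1 week', then '1 month', and so on,
-- falling back to the unconstrained query only as a last resort.
```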

A better approach would be to collect per-chunk statistics. At least basic stats such as row count, min time, and max time should be easy to maintain when compressing chunks. I filed an enhancement issue for this feature some time ago: https://github.com/timescale/timescaledb/issues/3292. Alternatively, statistics about individual chunks could perhaps be collected with PostgreSQL's general stats collector (the ANALYZE command).

maxhertrampf · Apr 08 '22

@hardikm10 the reason for this behaviour is that your configured orderby does not match the query orderby: orderby always implicitly includes any segmentby columns. Here is a workaround query that does not require decompressing everything:

WITH sensors AS (
  SELECT generate_series(1,100) AS sensor_id
), last_reading AS (
  SELECT * FROM sensors s 
  INNER JOIN LATERAL(
    SELECT * FROM sensor_data WHERE sensor_id = s.sensor_id ORDER BY time DESC LIMIT 1
  ) sd ON true
)
SELECT * FROM last_reading ORDER BY time DESC LIMIT 1;

svenklemm · Sep 08 '22

There are two ways to optimize this. The first is a special optimization for the LIMIT 1 case, since we can tell immediately from metadata where to look, but it won't work for any LIMIT > 1. The second optimization, which works for any limit, is to do a MergeAppend over every segmentby value.
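
For illustration, the second idea corresponds roughly to what the earlier LATERAL workaround does by hand, generalized to any limit (a sketch; the LIMIT 10 and the sensor id range 1..100 are arbitrary and reuse the assumptions of that workaround):

```sql
-- Hand-written analogue of "MergeAppend over every segmentby value":
-- fetch the top-n rows per sensor_id, then merge and re-limit globally.
SELECT sd.*
FROM generate_series(1, 100) AS s(sensor_id)
CROSS JOIN LATERAL (
  SELECT *
  FROM sensor_data
  WHERE sensor_data.sensor_id = s.sensor_id
  ORDER BY time DESC
  LIMIT 10
) sd
ORDER BY sd.time DESC
LIMIT 10;
```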

svenklemm · Sep 20 '22

Thanks @svenklemm for checking it. Would the first option help for queries such as SELECT last(temperature, time) FROM sensor_data GROUP BY sensor_id?
In many cases, users will need more than one row, for instance when checking the last 10 events or sensor readings.

jfjoly · Sep 28 '22

Hmm, in general you want to avoid last() together with GROUP BY, as we currently do not optimize that; you would be much better served by using LATERAL instead. This issue is about ORDER BY time DESC LIMIT 1, which is very different from last() queries, because last() is an aggregate function, so the optimizations needed would be different.
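
For concreteness, the LATERAL form of the last-reading-per-sensor query might look like this (a sketch that again assumes the set of sensor ids 1..100 from the earlier workaround is known):

```sql
-- LATERAL alternative to: SELECT sensor_id, last(temperature, time) FROM sensor_data GROUP BY sensor_id
SELECT s.sensor_id, sd.temperature, sd.time
FROM generate_series(1, 100) AS s(sensor_id)
CROSS JOIN LATERAL (
  SELECT temperature, time
  FROM sensor_data
  WHERE sensor_data.sensor_id = s.sensor_id
  ORDER BY time DESC
  LIMIT 1
) sd;
```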

svenklemm · Sep 29 '22

There is some tiny overlap between last() and ORDER BY ... LIMIT, because for the non-GROUP BY case we actually rewrite the query into ORDER BY time LIMIT, but that optimization does not apply if you have a GROUP BY clause. I think the complaint about last() is really about it not being performant together with a GROUP BY clause.
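
In other words, for the non-GROUP BY case the two forms below are treated as roughly equivalent (illustrative only):

```sql
SELECT last(temperature, time) FROM sensor_data;
-- is rewritten along the lines of:
SELECT temperature FROM sensor_data ORDER BY time DESC LIMIT 1;
```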

svenklemm · Sep 29 '22

I have the same problem with PG 14.0 and TimescaleDB 2.8.1 on Windows. Decompressing one of the chunks and a subsequent VACUUM ANALYZE did not help. Including the segmentby column in the ORDER BY also did not help. The LATERAL sample above might work, but I did not have all possible values of the segmentby column beforehand, and the plan for selecting the distinct values over the entire table was also huge. My goal was just to figure out the max and min values of a partitioning column, which I expected to benefit from significant partition pruning. It was a pretty frustrating experience :(
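
One possible way to bound a min/max lookup without knowing the segmentby values is to restrict the scan using chunk metadata; a sketch for the max case (it assumes the newest chunk actually contains rows, otherwise the window has to be widened):

```sql
-- Restrict the max(time) lookup to the newest chunk's time range so that
-- only that chunk needs to be decompressed.
SELECT max(time)
FROM sensor_data
WHERE time >= (
  SELECT max(range_start)
  FROM timescaledb_information.chunks
  WHERE hypertable_name = 'sensor_data'
);
```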

fingoldo · Oct 18 '22