timescaledb icon indicating copy to clipboard operation
timescaledb copied to clipboard

SkipScan for compressed chunks

Open svenklemm opened this issue 3 years ago • 7 comments

The initial SkipScan implementation only works for uncompressed chunks but for compressed chunks a similar optimization could be implemented.

  • Coordinate with Partially mutable compressed chunks #853 to verify if nothing breaks
### Tasks
- [ ] Design Doc: https://timescale.slab.com/posts/skip-scan-enhancement-p4g1blzo
- [ ] Introduce CompressedSkipScan Node (https://github.com/timescale/timescaledb/pull/5637)
- [ ] Algorithm to reduce batch decompression to single tuple decompression in CompressedSkipScan

The simplified design memo: https://docs.google.com/document/d/1JFF8UGqpdBEhdnrsJ9ngKREmDzVft5SchzjOoao2NoA/edit#heading=h.zez1r9vckjjp

svenklemm avatar Mar 29 '21 17:03 svenklemm

It would be great to get this feature implemented.

wrobell avatar Oct 06 '21 16:10 wrobell

Is this actually related to #2897?

andnig avatar Dec 13 '21 13:12 andnig

This is not related to #2897.

svenklemm avatar Dec 13 '21 20:12 svenklemm

Sharing the content of a Stack Overflow report here as @davidkohn88 suggested it may be related to this report.


I have got a hypertable (around 300 millions rows) with the following schema

CREATE TABLE IF NOT EXISTS candlesticks(
 open_time TIMESTAMP NOT NULL,
 close_time TIMESTAMP NOT NULL,
 open   DOUBLE PRECISION,
 high   DOUBLE PRECISION,
 low    DOUBLE PRECISION,
 close   DOUBLE PRECISION,
 volume      DOUBLE PRECISION,
 quote_volume      DOUBLE PRECISION,
 symbol VARCHAR (20) NOT NULL,
 exchange VARCHAR (256),
 PRIMARY KEY (symbol, open_time, exchange)
);

After compressing it with this query

ALTER TABLE candlesticks SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'symbol, exchange'
);

The following query takes several minutes (it seems timescale decompress every chunk) whereas before it was only 1/2 seconds :

SELECT DISTINCT ON (symbol) * FROM candlesticks 
ORDER BY symbol, open_time DESC;

If i add a time conditions like open_time >= now() - INTERVAL '5 minutes' it's better


greenweeds avatar Oct 07 '22 10:10 greenweeds

Current scope of SkipScan :

Hypertable Query#1: explain select distinct sensor_id from sensor_data;

                                                                              QUERY PLAN                                                            
                   
----------------------------------------------------------------------------------------------------------------------------------------------------
-------------------
 Unique  (cost=6.23..11859.26 rows=200 width=4)
   ->  Merge Append  (cost=6.23..10901.76 rows=383000 width=4)
         Sort Key: _hyper_10_25_chunk.sensor_id
         ->  Custom Scan (SkipScan) on _hyper_10_25_chunk  (cost=0.29..66.01 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_25_chunk_sensor_data_sensor_id_time_idx on _hyper_10_25_chunk  (cost=0.29..1694.95 rows=42311 wid
th=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_26_chunk  (cost=0.29..65.44 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_26_chunk_sensor_data_sensor_id_time_idx on _hyper_10_26_chunk  (cost=0.29..1170.59 rows=29487 wid
th=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_27_chunk  (cost=0.29..65.43 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_27_chunk_sensor_data_sensor_id_time_idx on _hyper_10_27_chunk  (cost=0.29..1016.93 rows=25643 wid
th=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_28_chunk  (cost=0.29..65.42 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_28_chunk_sensor_data_sensor_id_time_idx on _hyper_10_28_chunk  (cost=0.29..1217.82 rows=30769 wid
th=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_29_chunk  (cost=0.42..93.49 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_29_chunk_sensor_data_sensor_id_time_idx on _hyper_10_29_chunk  (cost=0.42..28243.61 rows=665279 w
idth=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_30_chunk  (cost=0.42..92.99 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_30_chunk_sensor_data_sensor_id_time_idx on _hyper_10_30_chunk  (cost=0.42..17120.39 rows=403198 w
idth=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_31_chunk  (cost=0.42..92.98 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_31_chunk_sensor_data_sensor_id_time_idx on _hyper_10_31_chunk  (cost=0.42..19651.68 rows=463684 w
idth=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_32_chunk  (cost=0.42..92.98 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_32_chunk_sensor_data_sensor_id_time_idx on _hyper_10_32_chunk  (cost=0.42..20522.07 rows=483843 w
idth=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_33_chunk  (cost=0.42..93.49 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_33_chunk_sensor_data_sensor_id_time_idx on _hyper_10_33_chunk  (cost=0.42..28243.61 rows=665279 w
idth=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_34_chunk  (cost=0.42..92.98 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_34_chunk_sensor_data_sensor_id_time_idx on _hyper_10_34_chunk  (cost=0.42..19667.59 rows=463678 w
idth=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_35_chunk  (cost=0.42..92.98 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_35_chunk_sensor_data_sensor_id_time_idx on _hyper_10_35_chunk  (cost=0.42..20522.07 rows=483843 w
idth=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_36_chunk  (cost=0.42..92.99 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_36_chunk_sensor_data_sensor_id_time_idx on _hyper_10_36_chunk  (cost=0.42..17108.42 rows=403200 w
idth=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_37_chunk  (cost=0.42..93.46 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_37_chunk_sensor_data_sensor_id_time_idx on _hyper_10_37_chunk  (cost=0.42..26349.98 rows=623037 w
idth=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_38_chunk  (cost=0.42..92.97 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_38_chunk_sensor_data_sensor_id_time_idx on _hyper_10_38_chunk  (cost=0.42..18397.98 rows=434237 w
idth=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_39_chunk  (cost=0.42..92.98 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_39_chunk_sensor_data_sensor_id_time_idx on _hyper_10_39_chunk  (cost=0.42..19205.12 rows=453113 w
idth=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (DecompressChunk) on _hyper_10_40_chunk  (cost=0.10..39.85 rows=380000 width=4)
               ->  Index Scan using compress_hyper_13_41_chunk__compressed_hypertable_13_sensor_id_ on compress_hyper_13_41_chunk  (cost=0.27..39.85
 rows=380 width=8)

Uncompressed chunk Query#2: explain select distinct sensor_id from _timescaledb_internal._hyper_10_25_chunk;

                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=0.29..10.97 rows=33 width=4)
   ->  Custom Scan (SkipScan) on _hyper_10_25_chunk  (cost=0.29..10.89 rows=33 width=4)
         ->  Index Only Scan using _hyper_10_25_chunk_sensor_data_sensor_id_time_idx on _hyper_10_25_chunk  (cost=0.29..1694.95 rows=42311 width=4)
               Index Cond: (sensor_id > NULL::integer)

Compressed chunk Query#3: explain select distinct sensor_id from _timescaledb_internal._hyper_10_40_chunk;

                                            QUERY PLAN                                             
---------------------------------------------------------------------------------------------------
 HashAggregate  (cost=960.80..961.00 rows=20 width=4)
   Group Key: _hyper_10_40_chunk.sensor_id
   ->  Custom Scan (DecompressChunk) on _hyper_10_40_chunk  (cost=0.03..10.80 rows=380000 width=4)
         ->  Seq Scan on compress_hyper_13_41_chunk  (cost=0.00..10.80 rows=380 width=8)

Compressed Internal Chunk Query#4: explain select distinct sensor_id from _timescaledb_internal.compress_hyper_13_41_chunk

                                                                              QUERY PLAN                                                            
                  
----------------------------------------------------------------------------------------------------------------------------------------------------
------------------
 Unique  (cost=0.27..7.60 rows=20 width=4)
   ->  Custom Scan (SkipScan) on compress_hyper_13_41_chunk  (cost=0.27..7.55 rows=20 width=4)
         ->  Index Only Scan using compress_hyper_13_41_chunk__compressed_hypertable_13_sensor_id_ on compress_hyper_13_41_chunk  (cost=0.27..39.85 
rows=380 width=4)
               Index Cond: (sensor_id > NULL::integer)

shhnwz avatar Mar 13 '23 07:03 shhnwz

Implementation Details:

Planner Changes:

  • [x] Transform CustomScan(DecompressChunk)->IndexScan/IndexOnlyScan INTO CustomScan(DecompressChunk)->CustomScan(SkipScan)->IndexScan/IndexOnlyScan.

  • [x] Modify get_distinct_var() to map var for compressed chunk

shhnwz avatar May 03 '23 07:05 shhnwz

What is the status on this issue? This is something that would be great for us!

patstrom avatar Mar 05 '24 12:03 patstrom