timescaledb
timescaledb copied to clipboard
SkipScan for compressed chunks
The initial SkipScan implementation only works for uncompressed chunks but for compressed chunks a similar optimization could be implemented.
- Coordinate with Partially mutable compressed chunks #853 to verify if nothing breaks
### Tasks
- [ ] Design Doc: https://timescale.slab.com/posts/skip-scan-enhancement-p4g1blzo
- [ ] Introduce CompressedSkipScan Node (https://github.com/timescale/timescaledb/pull/5637)
- [ ] Algorithm to reduce batch decompression to single tuple decompression in CompressedSkipScan
The simplified design memo: https://docs.google.com/document/d/1JFF8UGqpdBEhdnrsJ9ngKREmDzVft5SchzjOoao2NoA/edit#heading=h.zez1r9vckjjp
It would be great to get this feature implemented.
Is this actually related to #2897?
This is not related to #2897.
Sharing the content of a Stack Overflow report here as @davidkohn88 suggested it may be related to this report.
I have got a hypertable (around 300 millions rows) with the following schema
CREATE TABLE IF NOT EXISTS candlesticks(
open_time TIMESTAMP NOT NULL,
close_time TIMESTAMP NOT NULL,
open DOUBLE PRECISION,
high DOUBLE PRECISION,
low DOUBLE PRECISION,
close DOUBLE PRECISION,
volume DOUBLE PRECISION,
quote_volume DOUBLE PRECISION,
symbol VARCHAR (20) NOT NULL,
exchange VARCHAR (256),
PRIMARY KEY (symbol, open_time, exchange)
);
After compressing it with this query
ALTER TABLE candlesticks SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'symbol, exchange'
);
The following query takes several minutes (it seems timescale decompress every chunk) whereas before it was only 1/2 seconds :
SELECT DISTINCT ON (symbol) * FROM candlesticks
ORDER BY symbol, open_time DESC;
If i add a time conditions like open_time >= now() - INTERVAL '5 minutes' it's better
Current scope of SkipScan :
Hypertable Query#1: explain select distinct sensor_id from sensor_data;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
-------------------
Unique (cost=6.23..11859.26 rows=200 width=4)
-> Merge Append (cost=6.23..10901.76 rows=383000 width=4)
Sort Key: _hyper_10_25_chunk.sensor_id
-> Custom Scan (SkipScan) on _hyper_10_25_chunk (cost=0.29..66.01 rows=200 width=4)
-> Index Only Scan using _hyper_10_25_chunk_sensor_data_sensor_id_time_idx on _hyper_10_25_chunk (cost=0.29..1694.95 rows=42311 wid
th=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_26_chunk (cost=0.29..65.44 rows=200 width=4)
-> Index Only Scan using _hyper_10_26_chunk_sensor_data_sensor_id_time_idx on _hyper_10_26_chunk (cost=0.29..1170.59 rows=29487 wid
th=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_27_chunk (cost=0.29..65.43 rows=200 width=4)
-> Index Only Scan using _hyper_10_27_chunk_sensor_data_sensor_id_time_idx on _hyper_10_27_chunk (cost=0.29..1016.93 rows=25643 wid
th=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_28_chunk (cost=0.29..65.42 rows=200 width=4)
-> Index Only Scan using _hyper_10_28_chunk_sensor_data_sensor_id_time_idx on _hyper_10_28_chunk (cost=0.29..1217.82 rows=30769 wid
th=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_29_chunk (cost=0.42..93.49 rows=200 width=4)
-> Index Only Scan using _hyper_10_29_chunk_sensor_data_sensor_id_time_idx on _hyper_10_29_chunk (cost=0.42..28243.61 rows=665279 w
idth=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_30_chunk (cost=0.42..92.99 rows=200 width=4)
-> Index Only Scan using _hyper_10_30_chunk_sensor_data_sensor_id_time_idx on _hyper_10_30_chunk (cost=0.42..17120.39 rows=403198 w
idth=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_31_chunk (cost=0.42..92.98 rows=200 width=4)
-> Index Only Scan using _hyper_10_31_chunk_sensor_data_sensor_id_time_idx on _hyper_10_31_chunk (cost=0.42..19651.68 rows=463684 w
idth=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_32_chunk (cost=0.42..92.98 rows=200 width=4)
-> Index Only Scan using _hyper_10_32_chunk_sensor_data_sensor_id_time_idx on _hyper_10_32_chunk (cost=0.42..20522.07 rows=483843 w
idth=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_33_chunk (cost=0.42..93.49 rows=200 width=4)
-> Index Only Scan using _hyper_10_33_chunk_sensor_data_sensor_id_time_idx on _hyper_10_33_chunk (cost=0.42..28243.61 rows=665279 w
idth=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_34_chunk (cost=0.42..92.98 rows=200 width=4)
-> Index Only Scan using _hyper_10_34_chunk_sensor_data_sensor_id_time_idx on _hyper_10_34_chunk (cost=0.42..19667.59 rows=463678 w
idth=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_35_chunk (cost=0.42..92.98 rows=200 width=4)
-> Index Only Scan using _hyper_10_35_chunk_sensor_data_sensor_id_time_idx on _hyper_10_35_chunk (cost=0.42..20522.07 rows=483843 w
idth=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_36_chunk (cost=0.42..92.99 rows=200 width=4)
-> Index Only Scan using _hyper_10_36_chunk_sensor_data_sensor_id_time_idx on _hyper_10_36_chunk (cost=0.42..17108.42 rows=403200 w
idth=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_37_chunk (cost=0.42..93.46 rows=200 width=4)
-> Index Only Scan using _hyper_10_37_chunk_sensor_data_sensor_id_time_idx on _hyper_10_37_chunk (cost=0.42..26349.98 rows=623037 w
idth=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_38_chunk (cost=0.42..92.97 rows=200 width=4)
-> Index Only Scan using _hyper_10_38_chunk_sensor_data_sensor_id_time_idx on _hyper_10_38_chunk (cost=0.42..18397.98 rows=434237 w
idth=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_39_chunk (cost=0.42..92.98 rows=200 width=4)
-> Index Only Scan using _hyper_10_39_chunk_sensor_data_sensor_id_time_idx on _hyper_10_39_chunk (cost=0.42..19205.12 rows=453113 w
idth=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (DecompressChunk) on _hyper_10_40_chunk (cost=0.10..39.85 rows=380000 width=4)
-> Index Scan using compress_hyper_13_41_chunk__compressed_hypertable_13_sensor_id_ on compress_hyper_13_41_chunk (cost=0.27..39.85
rows=380 width=8)
Uncompressed chunk Query#2: explain select distinct sensor_id from _timescaledb_internal._hyper_10_25_chunk;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=0.29..10.97 rows=33 width=4)
-> Custom Scan (SkipScan) on _hyper_10_25_chunk (cost=0.29..10.89 rows=33 width=4)
-> Index Only Scan using _hyper_10_25_chunk_sensor_data_sensor_id_time_idx on _hyper_10_25_chunk (cost=0.29..1694.95 rows=42311 width=4)
Index Cond: (sensor_id > NULL::integer)
Compressed chunk Query#3: explain select distinct sensor_id from _timescaledb_internal._hyper_10_40_chunk;
QUERY PLAN
---------------------------------------------------------------------------------------------------
HashAggregate (cost=960.80..961.00 rows=20 width=4)
Group Key: _hyper_10_40_chunk.sensor_id
-> Custom Scan (DecompressChunk) on _hyper_10_40_chunk (cost=0.03..10.80 rows=380000 width=4)
-> Seq Scan on compress_hyper_13_41_chunk (cost=0.00..10.80 rows=380 width=8)
Compressed Internal Chunk Query#4: explain select distinct sensor_id from _timescaledb_internal.compress_hyper_13_41_chunk
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
------------------
Unique (cost=0.27..7.60 rows=20 width=4)
-> Custom Scan (SkipScan) on compress_hyper_13_41_chunk (cost=0.27..7.55 rows=20 width=4)
-> Index Only Scan using compress_hyper_13_41_chunk__compressed_hypertable_13_sensor_id_ on compress_hyper_13_41_chunk (cost=0.27..39.85
rows=380 width=4)
Index Cond: (sensor_id > NULL::integer)
Implementation Details:
Planner Changes:
-
[x] Transform CustomScan(DecompressChunk)->IndexScan/IndexOnlyScan INTO CustomScan(DecompressChunk)->CustomScan(SkipScan)->IndexScan/IndexOnlyScan.
-
[x] Modify get_distinct_var() to map var for compressed chunk
What is the status on this issue? This is something that would be great for us!