timescaledb icon indicating copy to clipboard operation
timescaledb copied to clipboard

Allow secondary indexes on compressed chunks

Open stoxx opened this issue 4 years ago • 10 comments

Currently, compressed chunks don't have any of the "regular" indexes defined on the hypertable, but they have compession group keys indexed with btree indexes. If i need to find data based on other columns, out of luck.

My suggestion is to allow secondary indexes on compressed chunks, that would operate not unlike bitmap scan indexes on regular tables.

If we skip btree index at first.. A BRIN index is a range index; it stores only min/max of key for each block range. If data is well-localized, lookups are quite fast. Looking at access plans, this is not dissimilar to how time dimension is currently filtered when scanning a compressed table, there are min and max values on each "row"group that are compared before decompression.

A similar range index, as well as GIN indexes for discrete data, R-tree cube indexes for multidimensional data, and quite possibly some other types, could work very well for compressed tables. One can think a compressed segment a bit similarly as a block in a regular table - you need to know whether to read it at all, you don't need to know if you'll find a hit there or on which row.

My specific use case is position data, where each device periodically sends time and multiple dimensions of coordinates. The coordinates are the measure i wish to compress - sensor IDs and timestamp being my clustering keys. If i had range indexes on all coordinates, a multi-index bitmap scan (or a single rtree box scan) could find results in a certain region and timespan. As it is now, i will most likely have to create an aggregate table that holds lists of time*spatial bouding boxes in an indexable structure.

stoxx avatar Sep 18 '20 17:09 stoxx

Why does compressed chunks not have any of the "regular" indexes defined on the hypertable currently? Can we restore the index of the chunk after decompression?

tclh123 avatar Oct 21 '20 17:10 tclh123

What's the timeline on this feature? It's highly requested.

kareblak avatar Jun 04 '21 12:06 kareblak

Related #2291

gayyappan avatar Nov 19 '21 18:11 gayyappan

Bump

benneharli avatar Nov 05 '22 00:11 benneharli

Another suggestion/question (instead of opening new related issue): Is there any technical reason why covering indexes are ignored too?

As almost everyone using timescaledb, we also have quite some load on our DB. Sometimes we create covering indexes (or use included fields to fully cover a query) on store it on a different storage backend. This lowers the impact on the actual tables and on the main storage. Since such indexes do not require any decompression, I think they are still useful and I would be nice if they could be used.

KevVerF avatar Mar 13 '23 13:03 KevVerF

A common way of speeding up queries that perform selection on secondary columns on databases with a columnar architecture is through the use of skipping indexes.

In timescale this could involve keeping minmax or a bloom filter on each set of compressed column entry in the table, and push down / transform quals to short circuit the selection before decompressing the column.

nevillegrech avatar Mar 18 '23 21:03 nevillegrech

Are there any plans for this? It makes compression substantially more difficult to adopt otherwise.

qeternity avatar Nov 05 '23 14:11 qeternity

We are going to release a couple of things in 2.15 that might help with these use cases:

  • For columns that have btree indexes, sparse minmax indexes on compressed data will be created. This is the same kind of index we currently create for compressed orderby columns, it can help with range or point queries. https://github.com/timescale/timescaledb/pull/6705
  • Vectorized equality filters for text columns. Combined with late materialization of compressed columns after vectorized filters, and dictionary encoding of text columns, this effectively functions as a sort of sparse index for low-cardinality text columns. We saw 10x speedups on some data sets for queries that filter on text_column = 'some text' on compressed tables. https://github.com/timescale/timescaledb/pull/6189

akuzm avatar Mar 12 '24 11:03 akuzm

It would be nice to have it somewhere in documentation how do indexes work in timescaledb in general and with compression enabled and what are differences with postgresql regular table. What one has to look for or take into account...

LukasJerabek avatar Mar 21 '24 18:03 LukasJerabek

@svenklemm Is it true that on hypertables that have compression enabled, a unique index cannot be created?

If the answer is yes, I think a note on this should be added to the docs at https://docs.timescale.com/use-timescale/latest/hypertables/hypertables-and-unique-indexes/

aisven avatar Jun 12 '24 22:06 aisven