timescaledb
timescaledb copied to clipboard
Allow secondary indexes on compressed chunks
Currently, compressed chunks don't have any of the "regular" indexes defined on the hypertable, but they have compession group keys indexed with btree indexes. If i need to find data based on other columns, out of luck.
My suggestion is to allow secondary indexes on compressed chunks, that would operate not unlike bitmap scan indexes on regular tables.
If we skip btree index at first.. A BRIN index is a range index; it stores only min/max of key for each block range. If data is well-localized, lookups are quite fast. Looking at access plans, this is not dissimilar to how time dimension is currently filtered when scanning a compressed table, there are min and max values on each "row"group that are compared before decompression.
A similar range index, as well as GIN indexes for discrete data, R-tree cube indexes for multidimensional data, and quite possibly some other types, could work very well for compressed tables. One can think a compressed segment a bit similarly as a block in a regular table - you need to know whether to read it at all, you don't need to know if you'll find a hit there or on which row.
My specific use case is position data, where each device periodically sends time and multiple dimensions of coordinates. The coordinates are the measure i wish to compress - sensor IDs and timestamp being my clustering keys. If i had range indexes on all coordinates, a multi-index bitmap scan (or a single rtree box scan) could find results in a certain region and timespan. As it is now, i will most likely have to create an aggregate table that holds lists of time*spatial bouding boxes in an indexable structure.
Why does compressed chunks not have any of the "regular" indexes defined on the hypertable currently? Can we restore the index of the chunk after decompression?
What's the timeline on this feature? It's highly requested.
Related #2291
Bump
Another suggestion/question (instead of opening new related issue): Is there any technical reason why covering indexes are ignored too?
As almost everyone using timescaledb, we also have quite some load on our DB. Sometimes we create covering indexes (or use included fields to fully cover a query) on store it on a different storage backend. This lowers the impact on the actual tables and on the main storage. Since such indexes do not require any decompression, I think they are still useful and I would be nice if they could be used.
A common way of speeding up queries that perform selection on secondary columns on databases with a columnar architecture is through the use of skipping indexes.
In timescale this could involve keeping minmax or a bloom filter on each set of compressed column entry in the table, and push down / transform quals to short circuit the selection before decompressing the column.
Are there any plans for this? It makes compression substantially more difficult to adopt otherwise.
We are going to release a couple of things in 2.15 that might help with these use cases:
- For columns that have btree indexes, sparse minmax indexes on compressed data will be created. This is the same kind of index we currently create for compressed orderby columns, it can help with range or point queries. https://github.com/timescale/timescaledb/pull/6705
- Vectorized equality filters for text columns. Combined with late materialization of compressed columns after vectorized filters, and dictionary encoding of text columns, this effectively functions as a sort of sparse index for low-cardinality text columns. We saw 10x speedups on some data sets for queries that filter on
text_column = 'some text'
on compressed tables. https://github.com/timescale/timescaledb/pull/6189
It would be nice to have it somewhere in documentation how do indexes work in timescaledb in general and with compression enabled and what are differences with postgresql regular table. What one has to look for or take into account...
@svenklemm Is it true that on hypertables that have compression enabled, a unique index cannot be created?
If the answer is yes, I think a note on this should be added to the docs at https://docs.timescale.com/use-timescale/latest/hypertables/hypertables-and-unique-indexes/