timescaledb
timescaledb copied to clipboard
Change materialization hypertable indexes
Nowadays for the Continuous Aggregate materialization hypertable we add additional indexes. Those additional indexes are one for each group-by column plus time bucket expression and it can lead to unsusable indexes.
This PR change the current logic to add just one additional composite index including all group-by columns plus the time bucket expression.
Codecov Report
Merging #4354 (94ca9c6) into main (047d6b1) will increase coverage by
0.04%
. The diff coverage is94.05%
.
:exclamation: Current head 94ca9c6 differs from pull request most recent head c10f033. Consider uploading reports for the commit c10f033 to get more accurate results
@@ Coverage Diff @@
## main #4354 +/- ##
==========================================
+ Coverage 90.79% 90.84% +0.04%
==========================================
Files 216 217 +1
Lines 39953 40264 +311
==========================================
+ Hits 36275 36576 +301
- Misses 3678 3688 +10
Impacted Files | Coverage Δ | |
---|---|---|
src/compat/compat.h | 94.73% <ø> (ø) |
|
src/loader/bgw_launcher.c | 91.82% <ø> (ø) |
|
src/planner/add_hashagg.c | 54.28% <ø> (+1.42%) |
:arrow_up: |
src/planner/planner.h | 100.00% <ø> (ø) |
|
src/ts_catalog/catalog.h | 100.00% <ø> (ø) |
|
src/utils.h | 82.60% <ø> (ø) |
|
tsl/src/init.c | 95.45% <ø> (ø) |
|
src/telemetry/telemetry.c | 81.25% <16.66%> (-1.46%) |
:arrow_down: |
src/cross_module_fn.c | 69.28% <85.71%> (+0.31%) |
:arrow_up: |
tsl/src/reorder.c | 85.29% <90.32%> (+0.07%) |
:arrow_up: |
... and 23 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 0a68209...c10f033. Read the comment docs.
I don't quite understand the reasoning behind this change. Are we saying that those indices are useless now due to the final form? In that case we could remove them altogether.
Do we have a use-case in mind where the composite index is useful?
The problem is the current implementation create one index for each aggregated column plus time bucket, for example the following continuous aggregate:
CREATE MATERIALIZED VIEW continuous_aggregates
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 day', column1),
column2,
column3,
column4,
SUM(column5)
FROM
raw_hypertable
GROUP BY
time_bucket('1 day', column1),
column2,
column3,
column4;
It will create 4 indexes in the materialization hypertable:
- indx1 (time_bucket DESC)
- indx2 (column2, time_bucket DESC)
- indx3 (column3, time_bucket DESC)
- indx4 (column4, time_bucket DESC)
And depending of the queries over the continuous aggregate some indexes will become useless (not used), and turning into a single composite index we're trying to avoid this problem as much we can.
With this PR we'll end up with just two indexes:
- indx1 (time_bucket DESC)
- indx1 (column2, column3, column4, time_bucket DESC)
It was discussed some time ago during one of our meetings /cc @svenklemm @gayyappan
I don't quite understand the motivation behind this change.
- What is this supposed to provide, more space savings by not using too many indices? Performance shouldn't increase with this change.
- Isn't this codepath supposed to be unreachable because users are not supposed to use the old format?
*skip_adding = finalized;
would lead to never creating those indices.
Let's not merge this before we are sure there are no performance regressions, since we're presenting performance numbers for
2.7.0
and we wouldn't want to risk invalidating them this quickly.
Well... if gratuitously adding indexes to caggs is the reason for the improved performance, I don't think this necessarily is a good thing. Indexes take space and also affect performance when updating or inserting. For the same reason as why indexes are not added to tables in general, I think that we should avoid creating them unless there is a good reason. I do not think that the fact that they can affect our published benchmarks is a reason to hold back on merging.
It would, however, be good if we could redo the benchmarks and compare with and without this commit, just to understand where we stand.
Let's not merge this before we are sure there are no performance regressions, since we're presenting performance numbers for 2.7.0 and we wouldn't want to risk invalidating them this quickly.
I wouldn't be horribly worried about this, the index we're replacing these indexes with is as likely to be better than the others as not, it could cause performance differences in some cases, but I don't think it's horribly likely to be a problem. Especially when we combine it with #4430 , where users can create their own indexes on continuous aggregates, I think any performance issues would be pretty transitory. Aside from that this will not impact the benchmark numbers as far as I can tell because the same indexes will be created in the benchmark case as there is only one group by column in addition to the time bucket, this only impacts continuous aggregates that have multiple group by columns.
Closing because the benchmarks I've made in the past didn't prove the performance improvements. We can revisit it in the future.