timescaledb
timescaledb copied to clipboard
Add support for correlated constraints
Allow users to specify a specific column to be used as a correlated constraint using the add_dimension() API. We store such correlated constraints in the dimensions related timescaledb catalog tables. The "dimension" catalog table has been amended by adding a "type" column. This column now explicitly stores the type: open, closed or correlated as appropriate.
We create dimension_slice and chunk_constraint entries for chunks which have correlated constraints on them. The dimension slice entry will have -inf/+inf as start/end range initially for a given correlated constraint and the chunk_constraint entry will refer back to this slice entry.
This start/end range will be refreshed later. One of the entry points is during compression for now.
We can thus store the min/max values for such correlated contraints in these catalog tables at the per-chunk level. Note that correlated constraints do not participate in partitioning of the data. Such a correlated constraint will be used for chunk pruning if the WHERE clause of a SQL query specifies ranges on such a column.
Disable-check: force-changelog-file
Disable-check: commit-count
Codecov Report
Attention: Patch coverage is 82.59109%
with 43 lines
in your changes are missing coverage. Please review.
Project coverage is 81.00%. Comparing base (
59f50f2
) to head (9c879ec
). Report is 161 commits behind head on main.
:exclamation: Current head 9c879ec differs from pull request most recent head 03553c9. Consider uploading reports for the commit 03553c9 to get more accurate results
Additional details and impacted files
@@ Coverage Diff @@
## main #6727 +/- ##
==========================================
+ Coverage 80.06% 81.00% +0.93%
==========================================
Files 190 198 +8
Lines 37181 37499 +318
Branches 9450 9788 +338
==========================================
+ Hits 29770 30376 +606
- Misses 2997 3208 +211
+ Partials 4414 3915 -499
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Have you considered not storing these as dimensions. This seems to be quite an invasive change and it means that all existing dimension code needs to be checked whether it handles correlated constraints correctly.
I have several issues with this PR. I don't think this can work by relying on adaptive chunking code which has an unproven track record and was written before compression even existed.
@antekresic
We are not relying on the adaptive chunking
code at all. We have fixed the issues in the function that it used for calculating min/max for a column using indexes, nothing else.
Relying on indexes to update the constraint ranges will not work, doing heap scans on uncompressed chunk regions will also not work. Updating ranges needs to include the compressed data. Creating those ranges before compression also makes very little sense.
The ranges are calculated and added when we call compress_chunk and AFTER a strong lock has been taken on the chunk being compressed. How can the ranges change in that case?
Updating ranges by hooking into the function that sets the chunk as partial is going to hurt DML on compressed chunks which is already a big pain point.
We do NOT update the range, we just mark them INVALID once as part of marking the chunk partial. The range will be re-calculated next only when the partial chunk gets compressed again.
Needs a lot more testing, especially with updates on the column that has the correlated constraint set, I don't see such a test at all.
As mentioned above, the moment you do DML on a compressed chunk, you mark it partial, that's when we invalidate the range for correlated constraints. This is tested in the PR.
Finally, a tsbench run showing the performance improvement this promises is a must since this is an optimization.
I did manual runs with data shared by AlexK sometime ago and mentioned on the slack thread that we see benefits of chunk exclusion as expected for queries involving WHERE clauses on these correlated columns.
closing this in favor of https://github.com/timescale/timescaledb/pull/7020