timescaledb
[Enhancement]: Compression job should process chunks in order of range_start
What type of enhancement is this?
User experience
What subsystems and features will be improved?
Compression
What does the enhancement do?
The compression job should process chunks in order of their range_start so that the experimental rollup functionality is more effective. Without a defined order, chunks can be processed in a sequence that prevents full rollups: the job may start rolling up a chunk "later" in the timeline and then go back in the timeline, but by then the partially rolled-up chunk is too large to roll up into the one further back.
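The effect described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not TimescaleDB's actual rollup code: chunks are plain `(range_start, range_end)` tuples, a chunk may be merged into an adjacent compressed chunk only if the combined span stays within a hypothetical `max_span`, and `compress_with_rollup` is an illustrative name.

```python
def compress_with_rollup(chunks, max_span):
    """Toy model of rollup: process chunks in the given order.

    chunks   -- list of (range_start, range_end) in processing order
    max_span -- hypothetical upper bound on a rolled-up chunk's time span
    """
    compressed = []  # rolled-up chunks as mutable [start, end] pairs
    for start, end in chunks:
        for seg in compressed:
            # Only a directly adjacent compressed chunk is a rollup candidate.
            touches = seg[1] == start or seg[0] == end
            new_start, new_end = min(seg[0], start), max(seg[1], end)
            if touches and new_end - new_start <= max_span:
                seg[0], seg[1] = new_start, new_end
                break
        else:
            # No neighbour could absorb this chunk, so it stays separate.
            compressed.append([start, end])
    return sorted(tuple(seg) for seg in compressed)


chunks = [(0, 1), (1, 2), (2, 3), (3, 4)]

# Processing in range_start order yields two full rollups.
in_order = compress_with_rollup(sorted(chunks), max_span=2)

# Starting "later" in the timeline and then going back leaves two
# chunks that can no longer be rolled up.
skipping = compress_with_rollup([(1, 2), (2, 3), (0, 1), (3, 4)], max_span=2)

print(in_order)
print(skipping)
```

In this model the ordered run ends with two rolled-up chunks, while the out-of-order run ends with three, because the chunk covering `(1, 3)` is already at the span limit when its earlier and later neighbours are processed.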
Implementation challenges
No response
@RobAtticus the current `show_chunks` logic uses the `hypertable_id` and `table_id` numbering values to sort the returned chunks. Typically, for append-only data insertions, that order should be in sync with the time ranges.
We could return the chunks in dimension slice order, though.
Is `show_chunks` used as part of the compression policy job? Basically, what I've found is that the compression job sometimes skips around in the set of chunks to be compressed, which leads to inefficient rollups. That is what this issue is about, although I also think `show_chunks` should enforce dimension slice order rather than rely on the IDs (given backfills, untiering a chunk, etc.).
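To make the backfill case concrete, here is a minimal sketch of how ID order and time order can diverge. The `Chunk` class and its fields are hypothetical stand-ins for illustration, not TimescaleDB catalog structures.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: int      # assigned at creation time (hypothetical field)
    range_start: int   # start of the chunk's time range (hypothetical field)


# Chunks 1-3 were created by append-only inserts; chunk 4 was created
# later by a backfill into an older time range.
chunks = [Chunk(1, 100), Chunk(2, 200), Chunk(3, 300), Chunk(4, 0)]

# Sorting by ID places the backfilled chunk last, out of time order.
by_id = [c.range_start for c in sorted(chunks, key=lambda c: c.chunk_id)]

# Sorting by range_start restores the timeline order.
by_range = [c.range_start for c in sorted(chunks, key=lambda c: c.range_start)]

print(by_id)
print(by_range)
```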
@RobAtticus yeah, `show_chunks` is used in the compression policy logic.
yeah, maybe `dimension_slice` based sorting is the way to go. We will need documentation changes also if we go this route.