databend
chore: tweak transient table data retention settings
I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/
Summary
Tweak transient table data retention settings
This PR introduces a new setting, `transient_data_retention_time_in_minutes`, to customize the retention period for transient tables. This setting defines how long historical data is retained, with a default value of 60 minutes (i.e. 1 hour).
Additionally, when purging data from transient tables, the retention period specified by `transient_data_retention_time_in_minutes` will now be used.
Setting `transient_data_retention_time_in_minutes` to 0 will "restore" the behavior transient tables had before this PR.
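To make the effect of the retention window concrete, here is a minimal sketch of how a purge cutoff could be derived from a retention period in minutes. It is not the code in this PR: the `Snapshot` type, its fields, and `purgeable_snapshots` are hypothetical names, and the assumption that the latest snapshot is always kept is mine.

```rust
/// Hypothetical snapshot record; not Databend's actual type.
#[derive(Debug, Clone, PartialEq)]
struct Snapshot {
    id: u32,
    /// Commit time as seconds since the Unix epoch.
    committed_at_secs: u64,
}

/// Returns the ids of historical snapshots that fall outside the retention
/// window. The latest snapshot is assumed to be always kept.
fn purgeable_snapshots(snapshots: &[Snapshot], now_secs: u64, retention_minutes: u64) -> Vec<u32> {
    // Anything committed at or before the cutoff is outside the window.
    let cutoff = now_secs.saturating_sub(retention_minutes.saturating_mul(60));
    let latest = snapshots
        .iter()
        .max_by_key(|s| s.committed_at_secs)
        .map(|s| s.id);
    snapshots
        .iter()
        .filter(|s| Some(s.id) != latest && s.committed_at_secs <= cutoff)
        .map(|s| s.id)
        .collect()
}

fn main() {
    let now = 10_000;
    let snapshots = vec![
        Snapshot { id: 1, committed_at_secs: now - 7_200 }, // 2 hours old
        Snapshot { id: 2, committed_at_secs: now - 1_800 }, // 30 minutes old
        Snapshot { id: 3, committed_at_secs: now - 60 },    // latest
    ];
    // With the 60-minute default, only the 2-hour-old snapshot is purgeable.
    println!("{:?}", purgeable_snapshots(&snapshots, now, 60));
    // With 0, every historical snapshot is purgeable (the pre-PR behavior).
    println!("{:?}", purgeable_snapshots(&snapshots, now, 0));
}
```

Under these assumptions, a retention of 0 collapses the window to "now", which is why that value reproduces the old behavior.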
- Fixes #[Link the issue here]
Tests
- [ ] Unit Test
- [ ] Logic Test
- [ ] Benchmark Test
- [x] No Test
Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Breaking Change (fix or feature that could cause existing functionality not to work as expected)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
@dantengsky Hi~ I think this feature should be useful when doing real-time scenarios to avoid the growth of snapshot files. What is the progress so far, please?
Thanks for asking!
This PR aims to use a more conservative (longer) retention period when purging history for transient tables, instead of the current value of "0". Once merged, this should mean that transient tables will keep more historical data by default than they do now.
Currently, the smallest unit for the retention period is a day, which is a bit too large for transient tables.
Right now, the way transient tables are purged risks corrupting the target table in scenarios with concurrent modifications (including append-only writes). Basically, a purge might remove data written by pending transactions that could still commit successfully later.
Although this PR can mitigate the issue for now, it doesn't completely solve it. We need to further refine it (by checking the table's least visible timestamp at commit time) to fully fix the problem.
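The commit-time check mentioned above could look roughly like the following sketch. It is only an illustration of the idea, not the planned implementation: `TableState`, `CommitError`, and both methods are hypothetical, and it assumes a purge records its cutoff as the table's least visible timestamp.

```rust
/// Hypothetical table state; field names are illustrative only.
struct TableState {
    /// Data written before this instant may already have been purged.
    least_visible_secs: u64,
}

#[derive(Debug, PartialEq)]
enum CommitError {
    /// The transaction began before the least visible timestamp,
    /// so files it wrote may have been removed by a purge.
    StaleTransaction { started_at_secs: u64, least_visible_secs: u64 },
}

impl TableState {
    /// A purge advances the least visible timestamp; it never moves backwards.
    fn advance_least_visible(&mut self, now_secs: u64, retention_minutes: u64) {
        let cutoff = now_secs.saturating_sub(retention_minutes.saturating_mul(60));
        self.least_visible_secs = self.least_visible_secs.max(cutoff);
    }

    /// Rejects a commit whose transaction started before the purge cutoff.
    fn try_commit(&self, started_at_secs: u64) -> Result<(), CommitError> {
        if started_at_secs < self.least_visible_secs {
            return Err(CommitError::StaleTransaction {
                started_at_secs,
                least_visible_secs: self.least_visible_secs,
            });
        }
        Ok(())
    }
}

fn main() {
    let mut table = TableState { least_visible_secs: 0 };
    // A purge at t = 10_000 with a 60-minute retention sets the cutoff to 6_400.
    table.advance_least_visible(10_000, 60);
    println!("{:?}", table.try_commit(9_000)); // started inside the window
    println!("{:?}", table.try_commit(5_000)); // started before the cutoff
}
```

In this sketch a longer retention period only shrinks the set of transactions that can be rejected; it is the commit-time check that turns silent corruption into an explicit error.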
Why not save the settings as table options rather than as a dynamic global setting?
create table t (c int)
row_per_block = 100000
block_per_segment = 1000
data_retention_ttl_minutes = 600 --- this could be respected by vacuum command
recluster_schedule_interval = ..
...
Why not save the settings into table option rather than a dynamic global setting. ....
Good idea, at least `data_retention_time_in_days` should be adjustable at the table level (or inherited from the db or account).
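One way to read the inheritance idea is as a most-specific-wins lookup. The sketch below assumes that order (table, then database, then account, then a global default); the `RetentionOverrides` type and `effective_retention_days` function are hypothetical and do not exist in Databend.

```rust
/// Hypothetical per-level overrides, in days; `None` means "not set at this level".
struct RetentionOverrides {
    table: Option<u64>,
    database: Option<u64>,
    account: Option<u64>,
}

/// Picks the most specific value that is set, falling back to the global default.
fn effective_retention_days(overrides: &RetentionOverrides, global_default: u64) -> u64 {
    overrides
        .table
        .or(overrides.database)
        .or(overrides.account)
        .unwrap_or(global_default)
}

fn main() {
    // The table has no override, so the database-level value applies.
    let overrides = RetentionOverrides { table: None, database: Some(7), account: Some(30) };
    println!("{}", effective_retention_days(&overrides, 1));
}
```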